Last time, I built an AR model to predict the future of a time series.
A key element to determine was the order of the model. We picked 7, as that was the “period” of our series.
Anyhow, in the last part of that post, we started to get the feeling that another order might work better.
For this reason, I’ve tried to explore other ways to select the right order.
The first option is to build multiple models with different orders and compute their AIC/BIC values. The order that gives the lowest AIC/BIC value is a good candidate.
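For reference, here are the usual textbook definitions of the two criteria. The symbols are my own notation, not something taken from the code: $k$ is the number of estimated parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood. The exact parameter count and normalization can differ between libraries, so treat this as the general form rather than exactly what statsmodels computes.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Both reward a better fit (higher likelihood) and penalize extra lags. The BIC penalty grows with the sample size, so it tends to favor smaller orders than AIC.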
This is the code to compute AIC/BIC for multiple orders:
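import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.ar_model import AutoReg

# `series` is the time series from the previous post
plt.clf()
plt.close()
max_order = 10
aic_values = []
bic_values = []
# Fit one AR(p) model per candidate order and record its AIC and BIC
for p in range(1, max_order + 1):
    model = AutoReg(series, lags=p)
    result = model.fit()
    aic_values.append(result.aic)
    bic_values.append(result.bic)
order = range(1, max_order + 1)
plt.plot(order, aic_values, marker='o', label='AIC')
plt.plot(order, bic_values, marker='o', label='BIC')
plt.xlabel('Order (p)')
plt.ylabel('Information Criterion Value')
plt.title('Comparison of AIC and BIC')
plt.legend()  # show which curve is AIC and which is BIC
plt.savefig('aic_bic_ipps.png')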
We can visualize the result:
As we can see, the lowest value seems to be at order 8. This is confirmed by taking the index of the minimum in the AIC/BIC lists (+1, since list indices start at 0 while orders start at 1):
>>> aic_values.index(min(np.array(aic_values)))+1
8
>>> bic_values.index(min(np.array(bic_values)))+1
8
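Another way is to use the ar_select_order function from statsmodels, which searches lags up to a given maximum (13 here) and, by default, selects them by BIC: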
>>> sel = ar_select_order(series, 13)
>>> sel.ar_lags
array([1, 2, 3, 4, 5, 6, 7, 8])
>>> sel.ar_lags[-1]
8
Yet another method is to leverage the Ljung-Box test (and the Box-Pierce one):
from statsmodels.stats.diagnostic import acorr_ljungbox

plt.clf()
plt.close()
max_order = 10
lbp_values = []
bpp_values = []
for p in range(1, max_order + 1):
    model = AutoReg(series, lags=p)
    # Test the residuals of the AR(p) fit for autocorrelation up to lag p
    result = acorr_ljungbox(model.fit().resid, lags=[p], return_df=True, boxpierce=True)
    lbp_value = result.iloc[0, 1]  # Ljung-Box p-value (column lb_pvalue)
    lbp_values.append(lbp_value)
    bpp_value = result.iloc[0, 3]  # Box-Pierce p-value (column bp_pvalue)
    bpp_values.append(bpp_value)
threshold = 0.05
# Pick the order with the smallest p-value (+1 because list indices start at 0)
lb_best_order = lbp_values.index(min(np.array(lbp_values))) + 1
bp_best_order = bpp_values.index(min(np.array(bpp_values))) + 1
print("LB P Values: ", lbp_values)
print("LB Selected Order (p):", lb_best_order)
print("LB Valid order: " + str(min(np.array(lbp_values)) < threshold))
print("BP P Values: ", bpp_values)
print("BP Selected Order (p):", bp_best_order)
print("BP Valid order: " + str(min(np.array(bpp_values)) < threshold))
Again, we compute the p-values for multiple orders and pick the order with the lowest one.
The result is:
LB Selected Order (p): 6
LB Valid order: True
BP Selected Order (p): 6
BP Valid order: True
The result is 6 this time!
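One caveat on reading these p-values: the null hypothesis of both tests is that the residuals show no autocorrelation up to the tested lag. A p-value below the threshold therefore points to leftover structure in the residuals rather than to a clean fit. A common alternative reading is to prefer the smallest order whose p-value stays above the threshold. Below is a minimal sketch of that rule. The helper name `smallest_adequate_order` and the p-values in the usage lines are made up for illustration; they are not the values computed in this post.

```python
def smallest_adequate_order(p_values, threshold=0.05):
    """Return the smallest order whose residual p-value exceeds the threshold.

    p_values[i] is assumed to hold the Ljung-Box (or Box-Pierce) p-value
    of the residuals of the AR model of order i + 1.
    Returns None when every order still shows autocorrelated residuals.
    """
    for order, p_value in enumerate(p_values, start=1):
        if p_value > threshold:
            return order
    return None


# Hypothetical p-values for orders 1..5 (illustrative only, not from this post's series)
example_p_values = [0.001, 0.02, 0.04, 0.30, 0.45]
chosen_order = smallest_adequate_order(example_p_values)
print("Smallest order with uncorrelated residuals:", chosen_order)
```

With the lists built in the loop above, this would be called as `smallest_adequate_order(lbp_values)`. It may well pick a different order than the minimum-p-value rule does.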
This tells us something pretty important. Different methods give us different results. There is no correct order “no matter what”!
We can use all these techniques to get an idea of “good orders”. If the same order keeps appearing, that is a good indication it is the right one.
Otherwise, when different values come up as potentially “good orders”, the only thing to do is try them all and see which model fits reality best.
At the end of the day, we are trying to forecast the future, and what do we normally do in our everyday lives? Trial and error. This is not so different 🙂
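The “try them all” step can be sketched as a hold-out comparison: fit each candidate order on the first part of the series, forecast the held-out tail one step at a time, and keep the order with the lowest error. The sketch below is an illustration under assumptions, not the method used in this post. It fits the AR models by plain least squares with numpy instead of AutoReg, the helper names `fit_ar_least_squares` and `holdout_mse` are hypothetical, and the series is synthetic.

```python
import numpy as np


def fit_ar_least_squares(values, p):
    """Fit an AR(p) model with intercept by ordinary least squares."""
    # Each row holds [1, y[t-1], ..., y[t-p]] and is paired with the target y[t].
    rows = [np.concatenate(([1.0], values[t - p:t][::-1])) for t in range(p, len(values))]
    coefs, *_ = np.linalg.lstsq(np.array(rows), values[p:], rcond=None)
    return coefs


def holdout_mse(values, p, n_test):
    """Mean squared one-step-ahead error of an AR(p) model on the last n_test points."""
    train = values[:-n_test]
    coefs = fit_ar_least_squares(train, p)
    errors = []
    for t in range(len(values) - n_test, len(values)):
        lagged = values[t - p:t][::-1]  # true past values y[t-1], ..., y[t-p]
        prediction = coefs[0] + coefs[1:] @ lagged
        errors.append(values[t] - prediction)
    return float(np.mean(np.square(errors)))


# Synthetic AR(2) series, used only to make the sketch runnable.
rng = np.random.default_rng(0)
demo = np.zeros(600)
for t in range(2, len(demo)):
    demo[t] = 0.5 * demo[t - 1] - 0.7 * demo[t - 2] + rng.normal()

candidate_orders = [1, 2, 6, 8]
scores = {p: holdout_mse(demo, p, n_test=200) for p in candidate_orders}
best_order = min(scores, key=scores.get)
print(scores)
print("Order with the lowest hold-out error:", best_order)
```

On a real series, `demo` would be replaced by the actual data and `candidate_orders` by the orders suggested by the methods above (for example 6, 7 and 8).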
Ciao
IoSonoUmberto