• idriss tsafack

ARMA models with R: the ultimate practical guide with Bitcoin data

Updated: Dec 25, 2020

Hey there ! Welcome back to my blog! Hope you are doing great!

On my previous blog post, titled "Time Series Analysis and Forecasting with R, Beginners' Lessons" , I discussed on a simple introduction on time series with R for beginners. Basically, I presented a simple code that could inspire newbies to start manipulating time series with R. In this new blog post I will dig slightly deeper on the concept of time series. More precisely, I will talk about ARMA models.

What is an ARMA model? What are the main characteristics of this model? How to estimate an ARMA model? How to analyze ARMA models with R? How to specify correctly the ARMA model ?

Before we jump in understanding these concepts, let me first discuss some preliminary points about statistical or machine learning models.

The archetype of machine learning or statistical models

One of the common concerns of statistical or machine learning models is that the usage of random noises is some times important in order to guarantee the goodness of fit of a model. These random noises could come from the transformation of the original data and therefore introduced for modeling purpose as explanatory variables. Also, when we consider the error term of a regression model to be a random noise, it is basically a starting point for considering desirable properties to get a good statistical modeling result.

MA(q) and AR(p) models

Let us consider the bitcoin price (quoted in US Dollars) dynamics at the daily frequency called as Yt. In some real data applications, one could be interested in modeling a considered variable Yt as a function of unknowns variables. In that sense, the variable Yt is considered as a linear combination of independent and identically distributed random noises. This procedure can be considered as a filtering procedure. In the context of time series analysis, this is called the Moving Average model or MA model. We can write Y(t) as follows:

Y(t) = b0 + b1e(t-1) + b2e(t-2) + ... + bqe(t-q) + v(t),

e(t-s) is nothing else than the random noise process at the time period t-s, b0,...bq are the coefficients to be estimated and v(t) the error term from the modelling procedure. q represents the number of lags of the series.

On the same token, the modelling task could be to identify the existing relationship between the variable Yt and its past values. This is called an Autoregressive or AR(p) model of a given order lag p.

Therefore, we can write

Y(t) = a0 + a1Y(t-1) + a2Y(t-2) + ... + apY(t-p) + u(t)

where Y(t-s) is nothing else than the value of Y at the time period t-s, a0,...ap are the coefficients to be estimated and u(t) the error term from the modelling procedure. p is the number of lags considered for the modeling procedure in order to get a good fit. For

Based on these models setting, in order to get a good fit, one should get the error term to be a random white noise. Indeed, White noise time series is of great interest because if the stochastic behavior of all time series could be explained in terms of the white noise model, then classical statistical methods would suffice.

What is an ARMA model ?

So far we have presented some prior concepts about time series. All this material is supported by a solid mathematical background. I will not go in the algebra but I will focus on the practical side. Therefore, based on what we already have, we can now define an ARMA model.

We say that a time series Yt follows an ARMA(p,q) if and only if it can be represented as a combination of an Autoregressive model AR(p) and a Moving Average model of order q, MA(q). Yt can be written as follows

Y(t) = c0 + a1Y(t-1) + a2Y(t-2) + ... + apY(t-p) + b1e(t-1) + b2e(t-2) + ... + bqe(t-q) + e(t)

where c0, a1,...,ap, b1, ... , bq are the coefficients to be estimated and w(t) is a random white noise. p is the number of lags considered for the autoregressive pattern and q is the number of lags for the moving average part.

The ARIMA model : a useful variant of the ARMA model

As the title says, a variant of the ARMA model is what we call the ARIMA. This specification is useful when the time series is nonstationary and displays a stochastic trend. Therefore, we should first differentiate the time series before applying the ARMA model on the differentiated time series. Differentiating a time series means that we substract the first lag value (or more accordingly). If you want to know more about this, just check the following blog post on time series for beginners with R.

For example an ARIMA model has 3 parameters, and is noted ARIMA(p,r,q), where p is the number of lags for the autoregressive part, q the number of lags of the Moving average part and r is the number of time we should differentiate in order to obtain a stationary ARMA model. For more details about the stationarity conditions of an ARMA model, check the book by James Hamilton(1985). Hence, if a time series displays a stochastic trend, we should always consider the ARIMA model instead of the simple ARMA model

How to estimate an ARMA model with R ?

In practice, estimating an ARMA model is nothing else than estimating a multilinear regression model where the related coefficients c0, a1,...,ap, b1, ... , bq, and the response variable is Y(t). Indeed, the common used function with R software is the function called "arma()". Here is the standard code.


  1. X : is the univariate time series or data to use for estimation. This correspond to our variable Yt in our case.

  2. ordrer = c(1,1) : is a vector of 2 elements defining the lag order of the AR(p) and MA(q). In order words, p is the first element of the vector and q the second. To transform ARMA(p,q) into an AR(p) ( an MA(q)) model, we just have to set q=0 (p=0).

  3. lag = c(p,q) : this option can be used as an alternative to the option "order" in order to define le values of p and q. So, it is a list defining the values p and q for the AR(p) and MA(q) respectively.

  4. coef : it a vector of the coefficients of an ARMA obtained from an initial estimation.

  5. include.intercept : A boolean value saying if we should include the intercept in our estimation or not.

  6. series : is the name of the series

  7. qr.tol : is the stopping value of the iteration from the estimation procedure of the standard error of the coefficients.

Usually in R we should use the arima() function instead the arma, but the argument remain almost the same with order = c(p,r,q).

Example of an estimation with the Bitcoin data

In this section I will run a simple estimation of an ARMA(p,q) model using the bitcoin monthly data from January 2015 to November 2020. I downloaded the data from Yahoo finance. Here is the code to follow-up for importing the data frame from your computer and transform it into a monthly time series. Just to precise that for this example I use the Adjusted close price as it is cleanest version of the close price we can use for computations.

After plotting the results of the time series of monthly adjusted close price of BTCUSD, I obtain the following graph. As you can see the BTCUSD displays a deterministic and a stochastic trend. The deterministic trend is directly observed via the red trend line, while the stochastic trend is observed by the variability of the time series changing with time.

Let us assume now that the monthly bitcoin data can be fitted by an ARIMA(2,1,2) model, meaning that it has an autoregressive pattern with 2 lags and a moving average with 2 lags. Let then try to estimate the coefficients of this model specification and see what we obtain. Here is the code.

Therefore the results of the estimations are presented on the following image. We obtain the table of coefficients of the AR(2) part of BTCUSD and MA(2) considered in the model.

The first line of the table represent the estimated values of theirs related coefficients and the second row represents their corresponding asymptotic standard errors. In addition we have the log-likelihood of the estimation and the AIC. These values are nothing else than the criteria used to select the optimal number of the parameters p, q and r in order to get the best fit. We will talk about that later.

As the estimation is done, we can plot the original data and the predicted one all together on the same figure in order to check how well we have done so far. Here is the code to follow-up

Therefore, we obtain the following figure. As you can see, we are doing pretty well with a certain lag. The original series is the solid black line while the predicted one is the red line.

We are not done with this time series. we can now try to forecast the series for one month or more ahead. This is the purpose of the next section.

Forecasting using ARMA model

As we have already estimated the model, let's try to forecast it. For such purpose, we can use the function Predict(). It is worth nothing that the forecasted value of Bitcoin for the next month is evaluated as the conditional expectation of the true value, given its past value. Here is the code for the prediction.

I run first a one-month ahead forecast and then a 5-months ahead forecast. Hence, the results of the code are presented as follows. Also we have the related standard error. According to this model, the BTCUSD has the potential to reach the value of $20301.96 in December 2020, and may reach the value of $20879.90 in April 2021.

Interval Forecasting using ARMA model

In the previous section we have presented how to forecast a single point. But in terms of forecasting, one could be interested in the band in which the price of BTCUSD could variate. That's why the interval forecast are useful. Here is the related code.

And we obtain the following plot of the 5 months ahead forecast. The figure of the forecasts is not that clear because the standard error is big. This induce that the confidence intervals become more large.

How to choose the optimal number of lag for the ARMA models?

One of the crucial part of the estimation procedure is the choice of the parameters p, q and r in the model. This is very important as it contributes to get the best potential forecast of the time series. There are 3 different categories of strategies proposed to choose these parameters:

  1. The graphical approach

  2. The empirical approach with the usage of minimization criteria

  3. Inference approach: with Hypothesis testing

The graphical approach is the one using the plot of ACF and PACF. The following table gives you the decision rules in terms of the geometric configuration of ACF and PACF

The second approach used for choosing q and q are AIC (Akaike Information Criterion), AICc (corrected AIC) and BIC (Bayesian Information Criterion). The idea is to run multiple different regression with different values of p and q. Then, choose (p,q) in such a way that the AIC or BIC is minimized. Usually the BIC is preferred to AIC criterion as it is more robust to the increase of the number of lags.

The inference approach consist in using the hypothesis testing by Box-Jenkins or L-Jung Box. These test are nothing else than Khi-squared test close to ANOVA test in order to validate or not the global significance of the estimated coefficients given a certain number of lags. We start with no lag and add progressively the number lags, then test that number in order to see if we have a global significance. Here is a kind of guideline to test and select the optimal lag.

With R software, you can read directly the AIC and BIC from the model estimation or get it by applying the function AIC() or BIC() on the estimation object. In the example, we called the estimation object AR. So we do AIC(AR) or BIC(AR) in order to get their respective values.

So that's all for this post. There are many other features that we can learn, but we prefer to prepare it for another post. I hope you liked it. If you liked the post, please share it with friends and your community of machine learning and data science.

See you on the next post.

Department of Economics,

 University of California Irvine


  • LinkedIn Social Icône
  • Facebook Social Icône
  • Twitter Icône sociale