Time series machine learning python

Machine learning models for time series analysis

maxim5/time-series-machine-learning

Time Series Prediction with Machine Learning

A collection of machine learning models for time series prediction, specifically the market price for a given currency chart and target.

[Charts: BTC_ETH, BTC_LTC]

Required dependency: numpy. Other dependencies are optional, but to diversify the final ensemble of models, it’s recommended to install these packages: tensorflow, xgboost.

Tested with Python versions: 2.7.14, 3.6.0.

There is one built-in data provider, which fetches the data from the Poloniex exchange. Currently, all models have been tested with crypto-currency charts.

The fetched data uses the standard OHLC format for securities trading: date, high, low, open, close, volume, quoteVolume, weightedAverage. The models are agnostic of the particular time series features and can be trained with a subset or superset of these features.

To fetch the data, run the run_fetch.py script from the root directory:

    # Fetches the default tickers: BTC_ETH, BTC_LTC, BTC_XRP, BTC_ZEC for all time periods.
    $ ./run_fetch.py

By default, the data is fetched for all time periods available in Poloniex (day, 4h, 2h, 30m, 15m, 5m) and is stored in the _data directory. One can specify the tickers and periods via command-line arguments.

    # Fetches just BTC_ETH ticker data for only 3 time periods.
    $ ./run_fetch.py BTC_ETH --period=2h,4h,day

Note: the second and subsequent runs won’t fetch all charts from scratch, only the updates since the last run.

To start training, run the run_train.py script from the root directory:

    # Trains all models until stopped.
    # The defaults:
    # - tickers: BTC_ETH, BTC_LTC, BTC_XRP, BTC_ZEC
    # - period: day
    # - target: high
    $ ./run_train.py

    # Trains the models for specified parameters.
    $ ./run_train.py --period=4h --target=low BTC_BCH

By default, the script trains all available methods (see below) with random hyper-parameters, cross-validates each model and saves the resulting weights if the performance is better than the current average (the limit can be configured).
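The training loop described above can be pictured with a minimal sketch. It is not the repository's code: `random_search`, `sample_params`, and `train_and_score` are hypothetical names, the score is assumed to be a validation error (lower is better), and "better than the current average" is assumed to mean the running average of all scores seen so far.

```python
import random


def random_search(train_and_score, sample_params, iterations, seed=0):
    """Try random hyper-parameters and keep models that beat the running average."""
    rng = random.Random(seed)
    scores = []  # every score seen so far
    saved = []   # (score, params) pairs that would be written to disk
    for _ in range(iterations):
        params = sample_params(rng)
        score = train_and_score(params)  # assumed: cross-validated error, lower is better
        # The first model is always kept; later ones must beat the average so far.
        if not scores or score < sum(scores) / len(scores):
            saved.append((score, params))
        scores.append(score)
    return saved


def sample_params(rng):
    # Hypothetical search space: window size and learning rate.
    return {"k": rng.randint(5, 30), "learning_rate": 10 ** rng.uniform(-4, -1)}


def toy_score(params):
    # Stand-in for real training: pretends that k=12 is the best window size.
    return abs(params["k"] - 12)


saved_models = random_search(toy_score, sample_params, iterations=20)
```

Because the threshold is a moving average, models saved early may be weaker than those saved later.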

All models are placed in the _zoo directory. Note: models saved early may perform much worse than later ones, so you’re welcome to clean up the models you’re definitely not interested in, because they can only spoil the final ensemble.

Note 1: specifying multiple periods and targets will force the script to train all combinations of those. Currently, the models do not reuse weights for different targets. In other words, if you set --target=low,high, it will train separate models for low and for high.
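As an illustration of what "all combinations" means, here is a small sketch that enumerates the training jobs for one ticker, two periods, and two targets. The `jobs` list and the tuple layout are assumptions made for this example, not the script's internal representation.

```python
from itertools import product

tickers = ["BTC_ETH"]
periods = ["4h", "day"]
targets = ["low", "high"]

# One independent training job per (ticker, period, target) combination.
jobs = list(product(tickers, periods, targets))

for ticker, period, target in jobs:
    print("train", ticker, "period=" + period, "target=" + target)
```

With these inputs there are 1 * 2 * 2 = 4 jobs, each producing its own models.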

Note 2: under the hood, the models work with transformed data. In particular, high, low, open, close and volume are transformed to percent changes. Hence, the predictions for these columns are also percent changes.
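A minimal sketch of such a transform with numpy follows. It assumes the change is expressed as a fraction of the previous value (0.1 meaning +10%); the repository may scale or compute it differently, and the function names are hypothetical.

```python
import numpy as np


def to_percent_changes(values):
    """Convert a 1-D series of raw values into step-to-step relative changes."""
    values = np.asarray(values, dtype=float)
    # change[i] = (values[i + 1] - values[i]) / values[i]; one element shorter than the input.
    return np.diff(values) / values[:-1]


def from_percent_changes(start, changes):
    """Invert the transform: rebuild raw values from a starting value and the changes."""
    changes = np.asarray(changes, dtype=float)
    return start * np.cumprod(np.concatenate(([1.0], 1.0 + changes)))


close = [100.0, 110.0, 99.0]
changes = to_percent_changes(close)                  # +10%, then -10%
restored = from_percent_changes(close[0], changes)   # back to the raw prices
```

The inverse is what turns a predicted change into a price: multiply the last known value by one plus the prediction.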

Currently supported methods:

  • Ordinary linear model. Even though it’s very simple, linear regression turns out to show pretty good results and complements the more complex models in the final ensemble.
  • Gradient boosting (using the xgboost implementation).
  • Deep neural network (in tensorflow).
  • Recurrent neural network: LSTM, GRU, single- or multi-layered (in tensorflow as well).
  • Convolutional neural network for 1-dimensional data (in tensorflow as well).

All models take as input a window of a certain size (named k) and predict a single target value for the next time step. Example: window size k=10 means that the model accepts the array (x[t-10], x[t-9], ..., x[t-1]) to predict x[t].target. Each x[i] includes a number of features (open, close, volume, etc.). Thus, the model takes 10 * features values in and outputs a single value: the percent change for the target column.
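The windowing scheme can be sketched with numpy as follows. This is an illustration under stated assumptions, not the repository's implementation: `make_windows` is a hypothetical helper, the input is assumed to be an already transformed array of shape (time steps, features), and the least-squares fit at the end stands in for the ordinary linear model.

```python
import numpy as np


def make_windows(data, target_col, k):
    """Build (inputs, labels) pairs: k past rows flattened -> target value of the next row."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # For each t in [k, n): input is rows t-k .. t-1, label is data[t, target_col].
    x = np.stack([data[t - k:t].ravel() for t in range(k, n)])
    y = data[k:, target_col]
    return x, y


# Synthetic stand-in for transformed chart data: 200 time steps, 3 features.
series = np.random.RandomState(0).normal(size=(200, 3))
x, y = make_windows(series, target_col=0, k=10)

# Ordinary least squares on the flattened windows, with a bias column appended.
x_bias = np.hstack([x, np.ones((x.shape[0], 1))])
weights = np.linalg.lstsq(x_bias, y, rcond=None)[0]
prediction = x_bias[-1].dot(weights)  # predicted target for the last window
```

Here each input row holds 10 * 3 = 30 values, matching the "10 * features" count in the text, and the first k time steps yield no training example because they lack a full window.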

Saved models consist of the following files:
