Deep Learning Systems for Bitcoin – Part 1

Since December, bitcoins can be traded not only at more or less dubious exchanges, but also as futures at the CME and CBOE. Several trading systems for bitcoin and other cryptocurrencies have already popped up. None of them can claim big success, with one exception. There is a strategy that easily surpasses all other bitcoin systems and probably also all known historical trading systems. Its name: Buy and Hold. In light of the extreme success of that particular bitcoin strategy, do we really need any other trading system for cryptos?

Bitcoin – hodl?

A buy and hold strategy works extremely well while a price bubble grows, and extremely badly when it bursts. And indeed, apparently all finance and economy gurus (well, all but John McAfee) tell you that the cryptocurrency market, and especially bitcoin, is a bubble, even a “scam with no substantial worth”, and will soon experience a crash “worse than the 17th century tulip mania” or the “18th century South Sea Company fraud”.

Bubble or not?

By definition, a bubble is a price largely above the ‘real value’ or ‘fair value’ of an asset, and it bursts when people realize that. So what is the fair value of a bitcoin? Obviously not zero, since blockchain-based currencies have (aside from their disadvantages) several advantages over traditional currencies, on the level of the economy as well as on the private level:

  • They break the link of money and debt. Cryptocurrencies don’t require the bank credit mechanism for money creation. 
  • They can be used where normal money would be impractical, such as fee transfers between machines or trading in multiplayer games.
  • They allow anonymous money transactions. At least in theory.
  • They replace banks for storing and mattresses for stashing money.

I’m ready to believe that blockchain is the future of money transfer and storage. But that does not mean an ever-rising bitcoin price. Blockchain is just a relatively simple algorithm. Hundreds of cryptocurrencies came out in the last year, and any programmer can add a new one anytime. Few will survive. Countries or big companies might sooner or later issue their own crypto tokens, as Venezuela is already attempting. The release of an official blockchain Dollar, Yuan, or Euro would leave the old bitcoin hanging in the air. Thus, when investing in bitcoin, we should not hope for a rosy distant future, but look for its present ‘real value’.

Due to its extreme volatility and uncertain future, bitcoin is not yet ready to replace bank vaults. However, anonymity can be a substantial motive to own it. When you need a hacker to delete your drunk driving record, pay her in bitcoin. But how big is the online market for illegal hacker jobs, kill contracts, money laundering, drugs, weapons, or pro-Trump facebook advertisements? No one knows, but when we compare it with cash, another form of anonymous payment, we get interesting results.

The cash currently in circulation in the US is approximately $1.5 trillion. The current bitcoin supply, about 17 million bitcoins, represents a total value of about $250 billion. This means that you could already replace about 17% of all US cash with bitcoin! Not to mention all the other cryptos. I fear that this supply already exceeds the demand for anonymous online payment, today and in the near future.
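As a quick plausibility check of these rounded figures, the short R snippet below divides the quoted bitcoin value by the quoted cash volume and by the coin supply. The variable names are made up for illustration, and the inputs are only the approximate numbers from the text, not fresh market data.

```r
# Rounded figures as quoted in the text (assumptions, not live data)
us_cash_usd    <- 1.5e12   # US cash in circulation, in dollars
btc_supply     <- 17e6     # number of bitcoins in existence
btc_market_cap <- 250e9    # total value of all bitcoins, in dollars

# Price per coin implied by value and supply
implied_price <- btc_market_cap / btc_supply

# Share of US cash that the total bitcoin value could replace
cash_share <- btc_market_cap / us_cash_usd

cat(sprintf("Implied price per bitcoin: $%.0f\n", implied_price))
cat(sprintf("Bitcoin value as share of US cash: %.1f%%\n", 100 * cash_share))
```

The share comes out at roughly one sixth, and the implied price per coin lies in the $15,000 area mentioned further below.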

For those reasons, a bitcoin “hodl” system, despite its extreme historical performance, is high risk. The same can be said about all the long-term cryptocurrency trading or portfolio systems that recently came out. We don’t know when and how the bubble will burst – maybe bitcoin will go up to $100,000 first – but we have some reason to suspect that sooner or later the bitcoin price might drop like a stone down to its ‘real value’. That value is unknown, but for practical purposes it is probably not in the $15,000 area, but more like $15.

So we need some other method to tackle the cryptocurrency trading problem. The first question: Has the crypto market already developed price curve inefficiencies that can be exploited in a trading system? In (1) we see some tests with basic bitcoin strategies. Our own tests arrived at the same results. Momentum-based strategies work of course, but that’s not surprising due to the trend bias in the historical data. Other conventional model-based strategies don’t work well with cryptos, at least in the current market situation.

Therefore our proposed system shall be a fast-trading, trend-agnostic strategy. That means it holds positions for only a few minutes and is not exposed to the bubble risk. I can already tell that short-term mean reversion – even with a more sophisticated system as in (1) – produces no good results with cryptos. So only a few possibilities remain. One of them is exploiting short-term price patterns. This is the strategy that we will develop, and I can already tell that it works. But for this we’ll need a deep machine learning system for detecting the patterns and determining their rules.

Selecting a machine learning library

The basic structure of such a machine learning system is described here. Due to the low signal-to-noise ratio and ever-changing market conditions, analyzing price series is one of the most ambitious tasks for machine learning. Compared with other AI algorithms, deep learning systems have the highest success rate. Since we can connect any Zorro-based trading script to the data analysis software R, we’ll use an R-based deep learning package. Many are available by now. Here’s the choice:

  • Deepnet, a lightweight and straightforward neural net library with a stacked autoencoder and a Boltzmann machine. Produces good results when the feature set is not too complex. The basic train and predict functions for using a deepnet autoencoder in a Zorro strategy:

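    library('deepnet', quietly = TRUE)

    # XY: training matrix, signals in all columns but the last, target in the last
    neural.train = function(model,XY) 
    {
      XY <- as.matrix(XY)
      X <- XY[,-ncol(XY)]
      Y <- XY[,ncol(XY)]
      Y <- ifelse(Y > 0,1,0)   # binary target: 1 for a positive outcome
      Models[[model]] <<- sae.dnn.train(X,Y,
          hidden = c(30), 
          learningrate = 0.5, 
          momentum = 0.5, 
          learningrate_scale = 1.0, 
          output = "sigm", 
          sae_output = "linear", 
          numepochs = 100, 
          batchsize = 100)
    }

    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- t(X)   # single sample -> one-row matrix
      # nn.predict (deepnet) returns the network output for each row of X
      return(nn.predict(Models[[model]],X))
    }
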
  • H2O, an open-source software package with the ability to run on distributed computer systems. Coded in Java, so the latest version of the JDK is required. Aside from deep autoencoders, many other machine learning algorithms are supported, such as random forests. Features can be preselected, and ensembles can be created. Disadvantage: While batch training is fast, predicting a single sample, as usually needed in a trading strategy, is relatively slow due to the server/client concept. The basic H2O train and predict functions for Zorro:

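    # also install the Java JDK
    library('h2o', quietly = TRUE)
    h2o.init()   # start or connect to the local H2O server
    neural.train = function(model,XY) 
    {
      XY <- as.h2o(XY)
      # signals in all columns but the last, target in the last column
      Models[[model]] <<- h2o.deeplearning(
        x = 1:(ncol(XY)-1), y = ncol(XY), training_frame = XY,
        hidden = c(30),  seed = 365)
    }
    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- as.h2o(as.data.frame(t(X)))
      else X <- as.h2o(X)
      Y <- h2o.predict(Models[[model]],X)
      return(as.vector(Y))
    }
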
  • Tensorflow in its Keras incarnation, a neural network kit by Google. Supports CPU and GPU and comes with all needed modules for tensor arithmetic, activation and loss functions, convolution kernels, and backpropagation algorithms. So you can build your own neural net structure. Keras offers a simple interface for that. The Keras train and predict functions for Zorro:

    #needs Python 3.6 and Anaconda
    #call install_keras() after installing the package
    library('keras', quietly = TRUE)
    neural.train = function(model,XY) 
    {
      X <- data.matrix(XY[,-ncol(XY)])
      Y <- XY[,ncol(XY)]
      Y <- ifelse(Y > 0,1,0)   # binary target: 1 for a positive outcome
      Model <- keras_model_sequential() 
      Model %>% 
        layer_dense(units=30,activation='relu',input_shape = c(ncol(X))) %>% 
        layer_dropout(rate = 0.2) %>% 
        layer_dense(units = 1, activation = 'sigmoid')
      Model %>% compile(
        loss = 'binary_crossentropy',
        optimizer = optimizer_rmsprop(),
        metrics = c('accuracy'))
      Model %>% fit(X, Y, 
        epochs = 20, batch_size = 20, 
        validation_split = 0, shuffle = FALSE)
      Models[[model]] <<- Model
    }
    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- t(X)   # single sample -> one-row matrix
      X <- as.matrix(X)
      # predict() returns the sigmoid output, i.e. the probability of class 1
      Y <- Models[[model]] %>% predict(X)
      return(ifelse(Y > 0.5,1,0))
    }

  • MxNet, Amazon’s answer to Google’s Tensorflow. It also offers tensor arithmetic and neural net building blocks on CPU and GPU, as well as high level network functions similar to Keras (the next Keras version will also support MxNet). Just as with Tensorflow, CUDA is supported, but not (yet) OpenCL, so you’ll need an Nvidia graphics card to enjoy GPU support. In direct comparison (2), MxNet turns out less resource hungry and a bit faster than Tensorflow. The standard train and predict functions:

    # how to install the CPU version:
    #cran <- getOption("repos")
    #cran["dmlc"] <- ""
    #options(repos = cran)
    #install.packages('mxnet')
    library('mxnet', quietly = TRUE)
    neural.train = function(model,XY) 
    {
      X <- data.matrix(XY[,-ncol(XY)])
      Y <- XY[,ncol(XY)]
      Y <- ifelse(Y > 0,1,0)   # binary target: 1 for a positive outcome
      Models[[model]] <<- mx.mlp(X,Y,
           hidden_node = c(30), 
           out_node = 2, 
           activation = "sigmoid",
           out_activation = "softmax",
           num.round = 20,
           array.batch.size = 20,
           learning.rate = 0.05,
           momentum = 0.9,
           eval.metric = mx.metric.accuracy)
    }
    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- t(X)   # single sample -> one-row matrix
      X <- data.matrix(X)
      Y <- predict(Models[[model]],X)
      # Y has one row per class and one column per sample
      return(ifelse(Y[1,] > Y[2,],0,1))
    }
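
The last line of the MxNet predict function may look cryptic at first. With two softmax output nodes, the prediction is a matrix with one row per class and one column per sample, and the line simply picks the class with the higher probability. A minimal sketch in plain R, with a made-up probability matrix instead of a real model output:

```r
# Hypothetical softmax output for three samples: one column per sample,
# row 1 = probability of class 0, row 2 = probability of class 1.
Y <- matrix(c(0.8, 0.2,    # sample 1: class 0 more likely
              0.3, 0.7,    # sample 2: class 1 more likely
              0.5, 0.5),   # sample 3: tie
            nrow = 2)

# Same decoding as in the predict function: 0 if row 1 wins, otherwise 1
decoded <- ifelse(Y[1,] > Y[2,], 0, 1)
print(decoded)
```

Note that with this rule a tie between both classes is decoded as 1.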

By replacing the neural.train and neural.predict functions, and other functions for saving and loading models that are not listed here, you can run the same strategy with different deep learning packages and compare. We’re currently using MxNet for most machine learning strategies, and I’ll also use it for the short-term bitcoin trading system presented in the upcoming 2nd part of this article. There is no bitcoin futures data available yet, so tick based price data from several bitcoin exchanges will have to do for the backtest. 
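
To illustrate how the two functions can be swapped, here is a minimal sketch that fills the same interface with a plain logistic regression from base R. This is only a stand-in for experimenting without any deep learning package installed, not one of the systems discussed here. The column names, the synthetic data, and the global Models list set up at the top are assumptions for this example.

```r
# Global model store, assumed to match the one used by the listed functions
Models <- vector("list", 1)

# Stand-in learner: logistic regression instead of a neural net
neural.train <- function(model, XY) {
  XY <- as.data.frame(XY)
  names(XY) <- c(paste0("Sig", seq_len(ncol(XY) - 1)), "Target")
  XY$Target <- ifelse(XY$Target > 0, 1, 0)   # same binarization as above
  Models[[model]] <<- glm(Target ~ ., data = XY, family = binomial())
}

neural.predict <- function(model, X) {
  if (is.vector(X)) X <- t(X)                # single sample -> one-row matrix
  X <- as.data.frame(X)
  names(X) <- paste0("Sig", seq_len(ncol(X)))
  P <- predict(Models[[model]], newdata = X, type = "response")
  return(ifelse(unname(P) > 0.5, 1, 0))
}

# Synthetic data: 3 random signals, target = first signal plus noise
set.seed(1)
X  <- matrix(rnorm(600), ncol = 3)
XY <- cbind(X, X[, 1] + rnorm(200, sd = 0.5))

# Train on the first 150 rows, predict the remaining 50
neural.train(1, XY[1:150, ])
pred <- neural.predict(1, XY[151:200, 1:3])
hits <- mean(pred == ifelse(XY[151:200, 4] > 0, 1, 0))
cat(sprintf("Out-of-sample accuracy: %.2f\n", hits))
```

Because the calling side only sees neural.train and neural.predict, the same test harness runs unchanged when the two functions are replaced by one of the package versions listed above.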

I’ve uploaded the interface scripts for Deepnet, H2O, Tensorflow/Keras, and MxNet to the 2018 script repository, so you can run your own deep learning experiments and compare the packages. Here’s a Zorro script for downloading bitcoin prices from Quandl – EOD only, though, since the exchanges demand dear payment for their tick data.

void main()

Further reading

(1) Nicolas Rabener, Quant Strategies in the Cryptocurrency Space

(2) Julien Simon, Tensorflow vs MxNet

(3) Zachary Lipton et al, MxNet – The Straight Dope
(Good introduction to deep learning with MxNet / Gluon examples)

16 thoughts on “Deep Learning Systems for Bitcoin – Part 1”

  1. TensorFlow now has an experimental feature that allows compiling your model to a binary or to C++ source code:

    So you could potentially build your model in R, save it to a file, and later make fast predictions straight from Zorro, if you are able to bind the TF runtime/C++ with Zorro.

    On the other hand, for trivial models as in your article, why not add simple dense layer functionality to Zorro, since you already made PERCEPTRON? There is a lot of C++ source code for deep net implementations available. Also don’t forget about OpenBLAS, and your prediction engine would be blazing fast.

  2. That’s possible, but it would have no substantial speed advantage. Prediction would be about 50% faster, but the bottleneck is training. Since we normally have no large feature set in trading systems, prediction is just a few matrix multiplications, and is often faster anyway than many standard indicators with large lookback periods.

  3. Nice one Johann! Very interested to hear your ideas about trading cryptos, particularly now that we can throw the futures contract into the mix. Its trading volume wasn’t exactly spectacular leading up to the Christmas break, but no doubt there are many watching with a lot of interest.

    As a nice coincidence, I also just launched a blog series about using deep learning in trading systems. I’ll be using Keras, and of course Zorro.

    Thanks for sharing your work.

  4. Sounds promising – I’m looking forward to the rest of your blog series. And don’t work too much on holidays!

  5. From your post, it’s not clear how often you retrain your model and which time frame you trade. For FX you previously suggested a 1H timeframe and a 25 day retrain period, so there is no speed bottleneck for any R deep learning framework at all. What about the crypto market? Which timeframe do you use, and how often do you retrain your model?

  6. The timeframe is one minute, retraining every 2 weeks. All this will be covered in the second part of the article.

  7. Every one of your posts is worth a hundred posts by other authors. You are always a source of trading wisdom for me, thanks for sharing.
    Waiting for part 2 impatiently!
    But I still can’t figure out why you point to training time as the bottleneck, if you retrain only every 2 weeks?

  8. Because the time consuming part is the testing, not live trading, where retraining happens in the background anyway. But in walk forward tests the system is training many times, maybe thousands of times when you also do preselection or optimization. That’s where you need multiple cores, GPU support, and any processing power that you can get.

  9. Also can’t wait for part 2 of this article 🙂 Trying to design a trading bot myself, so I find this blog very interesting.
    However, I think you should read some more about crypto. For example Bitcoin isn’t very anonymous, unlike Monero for example. Also you underestimate the true value of Bitcoin, based on its supply cap 😉
    Potential problems with bit-euro or bit-yuan would be same as fiat – if you can print/issue unlimited amounts, it’s not a very good store of value.
