Deep Learning Systems for Bitcoin – Part 1

Since December, bitcoins can be traded not only at more or less dubious exchanges, but also as futures at the CME and CBOE. Several trading systems have already popped up for bitcoin and other cryptocurrencies. None of them can claim big success, with one exception. There is a strategy that easily surpasses all other bitcoin systems and probably also all known historical trading systems. Its name: Buy and Hold. In light of the extreme success of that particular bitcoin strategy, do we really need any other trading system for cryptos?

Bitcoin – hodl?

A buy and hold strategy works extremely well when a price bubble grows, and extremely badly when it bursts. And indeed, apparently all finance and economics gurus (well, all but John McAfee) tell you that the cryptocurrency market, and especially bitcoin, is a bubble, even a “scam with no substantial worth”, and will soon experience a crash “worse than the 17th century tulip mania” or the “18th century South Sea Company fraud”.

Bubble or not?

By definition, a bubble is a price largely above the ‘real value’ or ‘fair value’ of an asset, and it bursts when people realize that. So what is the fair value of a bitcoin? Obviously not zero, since blockchain based currencies have (aside from their disadvantages) several advantages over traditional currencies, at the economic level as well as at the private level, such as:

  • They break the link of money and debt. Cryptocurrencies don’t require the bank credit mechanism for money creation. 
  • They can be used in areas where normal money would be impractical, for instance for trading in multiplayer online games.
  • They allow anonymous money transactions. At least in theory.
  • They replace banks for storing and mattresses for stashing money.

I readily believe that blockchain is the future of money. But investing today’s money in a far future can be a recipe for failure. Hundreds of cryptocurrencies came out in the last year, and any good programmer can add a new one anytime. Few will survive. Countries or big companies might sooner or later issue their own cryptos, as Venezuela is already attempting. All this has the potential to squash the old bitcoin. So we cannot hope for the future – we must look for its present true value. 

Anonymity can be a substantial motive to own bitcoin. Want a hacker to delete your drunk driving record? Pay her in bitcoin. But how big is the online market for illegal hacker jobs, kill contracts, money laundering, drugs, weapons, or pro-Trump facebook advertisements? No one knows, but when we compare it with cash, another form of anonymous payment, we get interesting results.

The cash currently in circulation in the US is approximately $1.5 trillion. The current bitcoin supply, about 17 million bitcoins, represents a total value of about $250 billion. This means that bitcoin could already replace roughly one sixth of all US cash, not to mention all the other cryptos. I fear that this supply already exceeds the demand for anonymous online payment, today and in the near future.
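
As a rough check of that comparison, the ratio can be computed directly from the rounded figures quoted above. This is only a sketch; the variable names are illustrative and the inputs are the approximate numbers from the text, not exact statistics.

```r
# Rounded figures from the text above, in US dollars
us_cash_in_circulation <- 1.5e12   # approx. $1.5 trillion
bitcoin_market_value   <- 250e9    # approx. $250 billion
bitcoin_supply         <- 17e6     # approx. 17 million coins

# Share of US cash that the bitcoin supply could nominally replace
cash_share <- bitcoin_market_value / us_cash_in_circulation

# Price per coin implied by total value and supply
implied_price <- bitcoin_market_value / bitcoin_supply

cat(sprintf("Share of US cash: %.1f%%\n", 100 * cash_share))
cat(sprintf("Implied price per bitcoin: $%.0f\n", implied_price))
```

With these inputs the share comes out at about one sixth, and the implied price lies just below $15,000.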

For those reasons, a bitcoin “hodl” system, despite its extreme historical performance, is high risk in today’s market. The same can be said about all long-term trading or portfolio rebalancing cryptocurrency systems. We don’t know when the bubble will burst – maybe bitcoin will go up to $100,000 before – but we have some reason to suspect that at some point sooner or later the bitcoin price might drop like a stone to its true value. Which is unknown, but for practical purposes is probably not in the $15,000 area, but more like $15.

So we need some other method to tackle the cryptocurrency trading problem. In (1) we can see some tests with bitcoin strategies. Our own tests came to the same results. Momentum based strategies work of course, but that’s not surprising due to the trend bias in the historical data. Other conventional model-based strategies don’t work well with cryptos, at least in the current market situation. Our proposed system should be a fast-trading, trend-agnostic strategy. That means it holds positions for only a few minutes, and is thus not exposed to the bubble risk. I can already tell that short-term mean reversion – even with a more sophisticated system as in (1) – produces no good result with cryptos. So only a few possibilities remain. One of them is exploiting short-term price patterns. This is the strategy that we will develop. For this we’ll need a deep machine learning system for detecting the patterns and determining their rules. 

Selecting a machine learning library

The basic structure of such a machine learning system is described here. Due to the low signal-to-noise ratio and to ever-changing market conditions, analyzing price series is one of the most ambitious tasks for machine learning. Compared with other AI algorithms, deep learning systems have the highest success rate. Since we can connect any Zorro based trading script to the data analysis software R, we’ll use an R-based deep learning package. Many are available by now. Here’s the choice:

  • Deepnet, a lightweight and straightforward neural net library with a stacked autoencoder and a Boltzmann machine. Produces good results when the feature set is not too complex. The basic train and predict functions for using a deepnet autoencoder in a Zorro strategy:

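    library(deepnet)

    # Models is a global list that is created elsewhere (init code not listed)
    neural.train = function(model,XY) 
    {
      XY <- as.matrix(XY)
      X <- XY[,-ncol(XY)]       # features: all columns but the last
      Y <- XY[,ncol(XY)]        # target: last column
      Y <- ifelse(Y > 0,1,0)    # convert the target to a 0/1 class
      Models[[model]] <<- sae.dnn.train(X,Y,
          hidden = c(30), 
          learningrate = 0.5, 
          momentum = 0.5, 
          learningrate_scale = 1.0, 
          output = "sigm", 
          sae_output = "linear", 
          numepochs = 100, 
          batchsize = 100)
    }

    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- t(X)   # single sample: vector to 1-row matrix
      # the listing breaks off here in the source; returning the output
      # of deepnet's nn.predict is an assumed completion
      return(nn.predict(Models[[model]],X))
    }
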
  • H2O, an open-source software package with the ability to run on distributed computer systems. Coded in Java, so the latest version of the JDK is required. Aside from deep autoencoders, many other machine learning algorithms are supported, such as random forests. Features can be preselected, and ensembles can be created. Disadvantage: While batch training is fast, predicting a single sample, as usually needed in a trading strategy, is relatively slow due to the server/client concept. The basic H2O train and predict functions for Zorro:

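    library(h2o)   # also install the Java JDK
    h2o.init()     # start or connect to the local H2O server

    neural.train = function(model,XY) 
    {
      XY <- as.h2o(XY)
      # x, y, training_frame are missing in the source; assumed: last column = target
      Models[[model]] <<- h2o.deeplearning(
        x = 1:(ncol(XY)-1), y = ncol(XY), training_frame = XY,
        hidden = c(30),  seed = 365)
    }

    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- as.h2o(as.data.frame(t(X)))   # assumed completion
      else X <- as.h2o(X)
      Y <- h2o.predict(Models[[model]],X)
      return(as.vector(Y))   # assumed: the source ends before the return
    }
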
  • Tensorflow in its Keras incarnation, a neural network kit by Google. Supports CPU and GPU and comes with all needed modules for tensor arithmetic, activation and loss functions, convolution kernels, and backpropagation algorithms. So you can build your own neural net structure. Keras offers a simple interface for that. The Keras train and predict functions for Zorro:

    library(keras)
    #call install_keras() after installing the package

    neural.train = function(model,XY) 
    {
      X <- data.matrix(XY[,-ncol(XY)])   # features: all columns but the last
      Y <- XY[,ncol(XY)]                 # target: last column
      Y <- ifelse(Y > 0,1,0)             # convert the target to a 0/1 class
      Model <- keras_model_sequential() 
      Model %>% 
        layer_dense(units=30,activation='relu',input_shape = c(ncol(X))) %>% 
        layer_dropout(rate = 0.2) %>% 
        layer_dense(units = 1, activation = 'sigmoid')
      Model %>% compile(
        loss = 'binary_crossentropy',
        optimizer = optimizer_rmsprop(),
        metrics = c('accuracy'))
      Model %>% fit(X, Y, 
        epochs = 20, batch_size = 20, 
        validation_split = 0, shuffle = FALSE)
      Models[[model]] <<- Model
    }

    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- t(X)   # single sample: vector to 1-row matrix
      X <- as.matrix(X)
      # predict() returns the sigmoid output, i.e. the class probability;
      # predict_proba() is no longer available in current Keras releases
      Y <- Models[[model]] %>% predict(X)
      return(ifelse(Y > 0.5,1,0))
    }

  • MxNet, Amazon’s answer to Google’s Tensorflow. Also offers tensor arithmetic and neural net building blocks on CPU and GPU, as well as high level network functions similar to Keras (the next Keras version will also support MxNet). In direct comparison to Tensorflow (2), MxNet turns out to be less resource hungry and a bit faster. The standard train and predict functions:

    # how to install the CPU version:
    #cran <- getOption("repos")
    #cran["dmlc"] <- ""
    #options(repos = cran)
    library(mxnet)

    neural.train = function(model,XY) 
    {
      X <- data.matrix(XY[,-ncol(XY)])   # features: all columns but the last
      Y <- XY[,ncol(XY)]                 # target: last column
      Y <- ifelse(Y > 0,1,0)             # convert the target to a 0/1 class
      Models[[model]] <<- mx.mlp(X,Y,
           hidden_node = c(30), 
           out_node = 2, 
           activation = "sigmoid",
           out_activation = "softmax",
           num.round = 20,
           array.batch.size = 20,
           learning.rate = 0.05,
           momentum = 0.9,
           eval.metric = mx.metric.accuracy)
    }

    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- t(X)   # single sample: vector to 1-row matrix
      X <- data.matrix(X)
      Y <- predict(Models[[model]],X)
      # one row per class: row 1 = class 0, row 2 = class 1
      return(ifelse(Y[1,] > Y[2,],0,1))
    }
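
The train functions listed for the packages above all rely on the same input convention: a table whose last column is the target and whose other columns are the features, with a positive target converted to class 1. The predict functions must also accept a single sample passed as a plain vector. Here is a minimal base-R sketch of that convention with made-up data; the helper names `split_xy` and `as_sample_matrix` are hypothetical and not part of any of the packages.

```r
# Hypothetical helper: split an XY table into feature matrix and 0/1 target
split_xy <- function(XY) {
  XY <- as.matrix(XY)
  X <- XY[, -ncol(XY), drop = FALSE]      # all columns but the last: features
  Y <- ifelse(XY[, ncol(XY)] > 0, 1, 0)   # last column: positive result -> 1
  list(X = X, Y = Y)
}

# Hypothetical helper: a single sample arrives as a vector; convert it to
# a 1-row matrix so it has the same column layout as the training data
as_sample_matrix <- function(X) {
  if (is.vector(X)) X <- t(X)
  as.matrix(X)
}

# Made-up example data: 3 features, last column is a signed trade result
set.seed(365)
XY <- data.frame(F1 = rnorm(5), F2 = rnorm(5), F3 = rnorm(5),
                 Result = c(1.2, -0.4, 0.0, 3.1, -2.2))
parts <- split_xy(XY)
print(dim(parts$X))                               # 5 rows, 3 feature columns
print(parts$Y)                                    # classes 1 0 0 1 0
print(dim(as_sample_matrix(c(0.1, 0.2, 0.3))))    # 1 row, 3 columns
```

Note that a result of exactly zero ends up in class 0, since only strictly positive targets count as class 1.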

By replacing the neural.train and neural.predict functions, and other functions for saving and loading models that are not listed here, you can run the same strategy with different deep learning packages and compare. We’re currently using MxNet for most machine learning strategies, and I’ll also use it for the short-term bitcoin trading system presented in the upcoming 2nd part of this article. There is no bitcoin futures data available yet, so tick based price data from several bitcoin exchanges will have to do for the backtest. 
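
To illustrate how a backend can be swapped, here is a minimal sketch of the same interface built on base R's logistic regression (`glm`) instead of a neural net. It is only a stand-in for testing the plumbing, not one of the four packages above. The functions `neural.init`, `neural.save` and `neural.load` and their bodies are assumptions about what the unlisted helper functions might look like, and the data is synthetic.

```r
# Assumed helper: create the global model list used by train and predict
neural.init <- function() {
  set.seed(365)
  Models <<- vector("list")
}

# Stand-in trainer: logistic regression on the features, 0/1 target
neural.train <- function(model, XY) {
  XY <- as.data.frame(XY)
  names(XY) <- c(paste0("F", seq_len(ncol(XY) - 1)), "Y")
  XY$Y <- ifelse(XY$Y > 0, 1, 0)
  Models[[model]] <<- glm(Y ~ ., data = XY, family = binomial())
}

# Stand-in predictor: returns class 1 or 0 for every sample
neural.predict <- function(model, X) {
  if (is.vector(X)) X <- t(X)      # single sample: vector to 1-row matrix
  X <- as.data.frame(X)
  names(X) <- paste0("F", seq_len(ncol(X)))
  P <- predict(Models[[model]], newdata = X, type = "response")
  as.vector(ifelse(P > 0.5, 1, 0))
}

# Assumed helpers: store and restore all trained models in one file
neural.save <- function(name) save(Models, file = name)
neural.load <- function(name) load(name, envir = .GlobalEnv)

# Synthetic data: two features, noisy signed result in the last column
neural.init()
n <- 500
F1 <- rnorm(n)
F2 <- rnorm(n)
XY <- data.frame(F1, F2, Result = F1 - 0.5 * F2 + rnorm(n, sd = 0.5))

# Train on the first 400 samples, test on the remaining 100
neural.train(1, XY[1:400, ])
pred <- neural.predict(1, XY[401:500, 1:2])
accuracy <- mean(pred == ifelse(XY$Result[401:500] > 0, 1, 0))
cat(sprintf("Out-of-sample accuracy: %.2f\n", accuracy))
```

Because every backend exposes the same function names and the same XY layout, the calling strategy does not need to change when one backend is replaced by another.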

I’ve uploaded the interface scripts for Deepnet, H2O, Tensorflow/Keras, and MxNet to the 2018 script repository, so you can run your own deep learning experiments and compare the packages. Here’s a Zorro script for downloading bitcoin prices from Quandl – EOD only, though, since the exchanges demand dear payment for their tick data.

void main()

Further reading

(1) Nicolas Rabener, Quant Strategies in the Cryptocurrency Space

(2) Julien Simon, Tensorflow vs MxNet

(3) Zachary Lipton et al, MxNet – The Straight Dope
(Good introduction to deep learning with MxNet / Gluon examples)

9 thoughts on “Deep Learning Systems for Bitcoin – Part 1”

  1. TensorFlow now has an experimental feature that allows compiling your model to a binary or to C++ source code:

    So you can potentially deploy your model in R, save it to a file, and later make fast predictions straight from Zorro, if you are able to bind the TF runtime/C++ with Zorro.

    On the other hand, for trivial models as in your article, why not add simple dense layer functionality to Zorro, since you already made PERCEPTRON? There is a lot of C++ source code for deep net implementations available. Also don’t forget about OpenBLAS, and your prediction engine would be blazing fast.

  2. That’s possible, but it would have no substantial speed advantage. Prediction would be about 50% faster, but the bottleneck is training. Since we normally have no large feature set in trading systems, prediction is just a few matrix multiplications, and is often faster anyway than many standard indicators with large lookback periods.

  3. Nice one Johann! Very interested to hear your ideas about trading cryptos, particularly now that we can throw the futures contract into the mix. Its trading volume wasn’t exactly spectacular leading up to the Christmas break, but no doubt there are many watching with a lot of interest.

    As a nice coincidence, I also just launched a blog series about using deep learning in trading systems. I’ll be using Keras, and of course Zorro.

    Thanks for sharing your work.

  4. Sounds promising – I’m looking forward to the rest of your blog series. And don’t work too much on holidays!
