Open in Colab: https://colab.research.google.com/github/ryanraba/stocksml/blob/master/docs/data.ipynb


Market Data

StocksML uses stock market price data as the basis for training models to learn market trading strategies. A small set of demonstration data is included in the StocksML package, but generally users will need to download or otherwise supply their own price data.

Download from IEX Cloud

The FetchData function in StocksML downloads data from IEX Cloud. An IEX account (free or paid tier) is needed to obtain an API token from the console screen. Copy the token and pass it as the apikey parameter, along with a list of ticker symbols and a start/stop date range. The downloaded data is stored as CSV files in the directory given by path.

Note that each download counts toward your monthly quota on IEX.

Here we download a small sample of Google and Exxon price data.

[11]:
!pip install stocksml >/dev/null
!mkdir -p data
from stocksml import FetchData

FetchData(['GOOG', 'XOM'], apikey='xxxxxxxxxxxxxxxx', start='2020-08-01', stop='2020-12-31', path='./data')
fetching GOOG data... 106 days
fetching XOM data... 106 days
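Because every download counts toward the IEX quota, it can be useful to skip symbols that already have a CSV file on disk. The following is a minimal sketch using only the standard library. The helper name missing_symbols is hypothetical and not part of StocksML, and it only checks that a file exists, not that the file covers the requested date range.

```python
import os

def missing_symbols(symbols, path="./data"):
    """Return the symbols that have no <SYMBOL>.csv file in `path` yet."""
    return [s for s in symbols
            if not os.path.exists(os.path.join(path, f"{s}.csv"))]

# Hypothetical usage: only request data that is not already on disk.
# todo = missing_symbols(['GOOG', 'XOM'], path='./data')
# if todo:
#     FetchData(todo, apikey='xxxxxxxxxxxxxxxx', start='2020-08-01',
#               stop='2020-12-31', path='./data')
```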

Each ticker symbol is stored in a separate CSV file containing a date column in yyyy-mm-dd format followed by daily open, high, low, close and volume columns.

[16]:
!ls data/
GOOG.csv  XOM.csv
[17]:
!head data/GOOG.csv
date,open,high,low,close,volume
2020-08-03,1486.64,1490.47,1465.64,1474.45,2331514
2020-08-04,1476.57,1485.56,1458.65,1464.97,1903489
2020-08-05,1469.3,1482.41,1463.46,1473.61,1979957
2020-08-06,1471.75,1502.39,1466.0,1500.1,1995368
2020-08-07,1500.0,1516.845,1481.64,1494.49,1577826
2020-08-10,1487.18,1504.075,1473.08,1496.1,1289530
2020-08-11,1492.44,1510.0,1478.0,1480.32,1454365
2020-08-12,1485.58,1512.3859,1485.25,1506.62,1437655
2020-08-13,1510.34,1537.25,1508.005,1518.45,1455208

Data from any other source may be used instead of IEX Cloud, provided it is written in this same format.
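As a rough sketch of what that could look like, the function below writes daily bars from an arbitrary source into the CSV layout shown above using only the standard library. The name write_symbol_csv and the sample values are made up for illustration, and the upper-case file name simply mirrors the files listed earlier.

```python
import csv
import os
from datetime import date

COLUMNS = ["date", "open", "high", "low", "close", "volume"]

def write_symbol_csv(symbol, rows, path="."):
    """Write daily bars to <path>/<SYMBOL>.csv.

    `rows` is an iterable of (date, open, high, low, close, volume) tuples,
    where the first element is a datetime.date.
    """
    filename = os.path.join(path, f"{symbol.upper()}.csv")
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        # Sort by date so the file is in chronological order.
        for day, o, h, l, c, v in sorted(rows, key=lambda r: r[0]):
            writer.writerow([day.isoformat(), o, h, l, c, v])
    return filename

# Made-up sample bars for a fictional ticker "ABC".
sample_bars = [
    (date(2021, 1, 5), 10.5, 11.0, 10.25, 10.75, 1200),
    (date(2021, 1, 4), 10.0, 10.6, 9.9, 10.5, 1500),
]
# write_symbol_csv("ABC", sample_bars, path="./data")
```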

Load Symbol DataFrame

Appropriately named and formatted CSV files can be loaded into a single Symbol DataFrame (sdf) using LoadData. The sdf provides a convenient single location for all market data needed later on for model training and trading strategy simulation.

All files in the specified directory can be loaded by leaving the symbols parameter as None.

[18]:
from stocksml import LoadData

sdf, symbols = LoadData(symbols=None, path='./data')

sdf.head()
[18]:
            xom_open  xom_high  xom_low  xom_close  xom_volume  goog_open  goog_high  goog_low  goog_close  goog_volume
date
2020-08-03     42.05     42.50    41.47      42.25    23040541    1486.64   1490.470   1465.64     1474.45      2331514
2020-08-04     42.34     43.60    42.24      43.47    17724024    1476.57   1485.560   1458.65     1464.97      1903489
2020-08-05     44.15     44.31    43.53      43.85    17445784    1469.30   1482.410   1463.46     1473.61      1979957
2020-08-06     43.40     43.90    43.25      43.64    14434935    1471.75   1502.390   1466.00     1500.10      1995368
2020-08-07     43.23     43.52    42.81      43.44    18757929    1500.00   1516.845   1481.64     1494.49      1577826
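The layout above, one date-indexed frame with each column prefixed by the lower-case symbol, can be approximated with plain pandas. This is only a sketch of the resulting structure, not the LoadData implementation. The helper combine_symbols is hypothetical, and keeping only dates present for every symbol is an assumption.

```python
import io
import pandas as pd

def combine_symbols(frames):
    """Join per-symbol price frames into one date-indexed frame.

    `frames` maps a ticker symbol to a DataFrame indexed by date. Each
    column is prefixed with the lower-cased symbol, e.g. 'goog_open'.
    """
    renamed = [df.add_prefix(f"{sym.lower()}_") for sym, df in frames.items()]
    # Assumption: keep only the dates that every symbol has in common.
    return pd.concat(renamed, axis=1, join="inner").sort_index()

# First two rows of each symbol, taken from the output shown above.
goog_csv = """date,open,high,low,close,volume
2020-08-03,1486.64,1490.47,1465.64,1474.45,2331514
2020-08-04,1476.57,1485.56,1458.65,1464.97,1903489
"""
xom_csv = """date,open,high,low,close,volume
2020-08-03,42.05,42.50,41.47,42.25,23040541
2020-08-04,42.34,43.60,42.24,43.47,17724024
"""
frames = {
    "XOM": pd.read_csv(io.StringIO(xom_csv), index_col="date"),
    "GOOG": pd.read_csv(io.StringIO(goog_csv), index_col="date"),
}
combined = combine_symbols(frames)
print(combined.columns.tolist())
```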

Build Feature DataFrame

The raw price data is not used directly by the models to learn a market strategy. Instead, a set of training features must first be created to represent the data in a way that is more conducive to model learning. These are held in a Feature DataFrame (fdf).

These features are currently fixed within the BuildData function. They are a work in progress and are likely to be expanded in the future. They may be made user-configurable at a later date.
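To make the idea of a derived feature concrete, the sketch below turns raw prices into day-over-day fractional changes, a common way to put prices of very different magnitude on a comparable scale. It is purely illustrative and is not the transformation BuildData applies. The function name pct_change_features and the sample values are made up.

```python
import pandas as pd

def pct_change_features(sdf):
    """Illustrative only: day-over-day fractional change of every column."""
    # The first row has no previous day to compare with, so it is dropped.
    return sdf.pct_change().dropna()

# Made-up prices for a fictional ticker "abc".
prices = pd.DataFrame(
    {"abc_close": [100.0, 110.0, 99.0], "abc_volume": [1000, 1500, 1500]},
    index=["2021-01-04", "2021-01-05", "2021-01-06"],
)
features = pct_change_features(prices)
print(features)
```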

For now, all that is required to build an fdf is to pass the sdf to BuildData.

[19]:
from stocksml import BuildData

fdf = BuildData(sdf)

fdf.head()
building GOOG data...
building XOM data...
[19]:
                goog0      goog1      goog2      goog3      goog4       xom0       xom1       xom2       xom3       xom4
date
2020-08-03  -0.014814  -0.017526  -0.010784  -0.015605  -0.300029  -0.000670  -0.000945  -0.001423  -0.000581   0.159583
2020-08-04  -0.043365  -0.066009  -0.054701  -0.071722  -0.266908   0.150895   0.120418   0.042882   0.161771   0.510642
2020-08-05  -0.033192   0.015996  -0.042706   0.035871   0.097373   0.094690   0.198671   0.273211   0.048568  -0.159542
2020-08-06   0.101999   0.000117   0.000026   0.141292   0.402525  -0.054855  -0.042989  -0.110557  -0.027508   0.256103
2020-08-07   0.068573   0.090926   0.113664  -0.048245  -0.115027  -0.051360  -0.067442  -0.026588  -0.026349   0.215602

Now we are ready to build a model that can learn a market strategy from this data.