Open in Colab: https://colab.research.google.com/github/ryanraba/stocksml/blob/master/docs/data.ipynb
Market Data
StocksML uses stock market price data as the basis for training models to learn market trading strategies. A small set of demonstration data is included in the StocksML package, but generally users will need to download or otherwise supply their own price data.
Download from IEX Cloud
The FetchData function in StocksML can be used to download data from IEX Cloud. An IEX account (free or paid tier) is needed to retrieve an API token from the console screen. Copy the token and pass it as the apikey parameter. Supply a list of the desired ticker symbols along with a start and stop date for the range to download. The data for each symbol is stored as a CSV file in the location given by path.
Note that this will count towards your monthly quota on IEX.
Here we download a small sample of Google and Exxon price data.
[11]:
!pip install stocksml >/dev/null
!mkdir -p data
from stocksml import FetchData
FetchData(['GOOG', 'XOM'], apikey='xxxxxxxxxxxxxxxx', start='2020-08-01', stop='2020-12-31', path='./data')
fetching GOOG data... 106 days
fetching XOM data... 106 days
Each ticker symbol is stored in a separate CSV file named after the symbol. Each file contains a date column in yyyy-mm-dd format followed by daily open, high, low, close and volume columns.
[16]:
!ls data/
GOOG.csv XOM.csv
[17]:
!head data/GOOG.csv
date,open,high,low,close,volume
2020-08-03,1486.64,1490.47,1465.64,1474.45,2331514
2020-08-04,1476.57,1485.56,1458.65,1464.97,1903489
2020-08-05,1469.3,1482.41,1463.46,1473.61,1979957
2020-08-06,1471.75,1502.39,1466.0,1500.1,1995368
2020-08-07,1500.0,1516.845,1481.64,1494.49,1577826
2020-08-10,1487.18,1504.075,1473.08,1496.1,1289530
2020-08-11,1492.44,1510.0,1478.0,1480.32,1454365
2020-08-12,1485.58,1512.3859,1485.25,1506.62,1437655
2020-08-13,1510.34,1537.25,1508.005,1518.45,1455208
Data from any other source may be used instead of IEX Cloud, provided it is stored in this same format.
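As a rough sketch of what that could look like, the snippet below writes price records from some other source into a CSV file with the same header and date format as the GOOG.csv sample. The helper name write_price_csv, the output directory, and the sample record are hypothetical and not part of StocksML; only the column layout and the SYMBOL.csv file naming are taken from the example output.

```python
import csv
import os
from datetime import date

# Column order copied from the GOOG.csv sample
COLUMNS = ['date', 'open', 'high', 'low', 'close', 'volume']

def write_price_csv(symbol, records, path='./data'):
    """Hypothetical helper (not part of StocksML): write daily price
    records to <path>/<SYMBOL>.csv using the layout of the sample file."""
    os.makedirs(path, exist_ok=True)
    filename = os.path.join(path, symbol.upper() + '.csv')
    with open(filename, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        # Sort by date so rows are written in chronological order
        for rec in sorted(records, key=lambda r: r['date']):
            row = dict(rec)
            # date objects become yyyy-mm-dd strings
            row['date'] = rec['date'].isoformat()
            writer.writerow(row)
    return filename

# Made-up record purely for illustration, not real market data
records = [
    {'date': date(2021, 1, 4), 'open': 10.0, 'high': 10.5,
     'low': 9.8, 'close': 10.2, 'volume': 1000},
]
out_file = write_price_csv('demo', records, path='./data_example')
print(out_file)
```

Keeping one file per symbol, named after the ticker, matches the directory listing shown for the downloaded data.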
Load Symbol DataFrame
Appropriately named and formatted CSV files can be loaded into a single Symbol DataFrame (sdf) using LoadData. The sdf provides a convenient single location for all market data needed later for model training and trading strategy simulation.
All files in the specified directory can be loaded by leaving the symbols parameter as None.
[18]:
from stocksml import LoadData
sdf, symbols = LoadData(symbols=None, path='./data')
sdf.head()
[18]:
date | xom_open | xom_high | xom_low | xom_close | xom_volume | goog_open | goog_high | goog_low | goog_close | goog_volume
---|---|---|---|---|---|---|---|---|---|---
2020-08-03 | 42.05 | 42.50 | 41.47 | 42.25 | 23040541 | 1486.64 | 1490.470 | 1465.64 | 1474.45 | 2331514
2020-08-04 | 42.34 | 43.60 | 42.24 | 43.47 | 17724024 | 1476.57 | 1485.560 | 1458.65 | 1464.97 | 1903489
2020-08-05 | 44.15 | 44.31 | 43.53 | 43.85 | 17445784 | 1469.30 | 1482.410 | 1463.46 | 1473.61 | 1979957
2020-08-06 | 43.40 | 43.90 | 43.25 | 43.64 | 14434935 | 1471.75 | 1502.390 | 1466.00 | 1500.10 | 1995368
2020-08-07 | 43.23 | 43.52 | 42.81 | 43.44 | 18757929 | 1500.00 | 1516.845 | 1481.64 | 1494.49 | 1577826
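The column naming in the sdf (lowercase symbol, underscore, field name) makes it straightforward to pull out one symbol or one field with ordinary pandas operations. The sketch below assumes the sdf is a regular pandas DataFrame indexed by date, as the output suggests. It rebuilds a two-row stand-in, here called sdf_demo, from the close and volume values in the sample output so that it runs without the CSV files.

```python
import pandas as pd

# Two-row stand-in for the sdf, using close and volume values from the sample output
sdf_demo = pd.DataFrame(
    {
        'xom_close': [42.25, 43.47],
        'xom_volume': [23040541, 17724024],
        'goog_close': [1474.45, 1464.97],
        'goog_volume': [2331514, 1903489],
    },
    index=pd.Index(['2020-08-03', '2020-08-04'], name='date'),
)

# All columns belonging to one symbol
goog = sdf_demo[[c for c in sdf_demo.columns if c.startswith('goog_')]]

# The same field across every symbol
closes = sdf_demo[[c for c in sdf_demo.columns if c.endswith('_close')]]

print(goog)
print(closes)
```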
Build Feature DataFrame
The raw price data is not used directly by the models to learn a market strategy. Instead, a set of training features must first be created to represent the data in a way that is more conducive to model learning. These are held in a Feature DataFrame (fdf).
These features are currently fixed within the BuildData function. They are a work in progress and are likely to be expanded in the future, and they may be made user-configurable at a later date.
For now, all that is required to build an fdf is to pass the sdf to BuildData.
[19]:
from stocksml import BuildData
fdf = BuildData(sdf)
fdf.head()
building GOOG data...
building XOM data...
[19]:
date | goog0 | goog1 | goog2 | goog3 | goog4 | xom0 | xom1 | xom2 | xom3 | xom4
---|---|---|---|---|---|---|---|---|---|---
2020-08-03 | -0.014814 | -0.017526 | -0.010784 | -0.015605 | -0.300029 | -0.000670 | -0.000945 | -0.001423 | -0.000581 | 0.159583
2020-08-04 | -0.043365 | -0.066009 | -0.054701 | -0.071722 | -0.266908 | 0.150895 | 0.120418 | 0.042882 | 0.161771 | 0.510642
2020-08-05 | -0.033192 | 0.015996 | -0.042706 | 0.035871 | 0.097373 | 0.094690 | 0.198671 | 0.273211 | 0.048568 | -0.159542
2020-08-06 | 0.101999 | 0.000117 | 0.000026 | 0.141292 | 0.402525 | -0.054855 | -0.042989 | -0.110557 | -0.027508 | 0.256103
2020-08-07 | 0.068573 | 0.090926 | 0.113664 | -0.048245 | -0.115027 | -0.051360 | -0.067442 | -0.026588 | -0.026349 | 0.215602
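To give a sense of what a feature transformation might look like in general, here is a minimal sketch that converts raw closing prices into day-over-day fractional changes, one common way of making price series comparable across symbols with very different price levels. This is only an illustration of the general idea. It is not the calculation BuildData performs, and the function name pct_change is an assumption introduced here. The input values are the first five GOOG closing prices from the sample data.

```python
def pct_change(prices):
    """Fractional change of each price relative to the previous one.
    Illustrative only; not taken from BuildData."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]

# GOOG closing prices for 2020-08-03 through 2020-08-07 from the sample data
goog_close = [1474.45, 1464.97, 1473.61, 1500.10, 1494.49]

changes = pct_change(goog_close)
for c in changes:
    print(f'{c:+.4f}')
```

Note that a change series has one fewer entry than the price series, since the first day has no previous value to compare against.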
Now we are ready to build a model that can learn a market strategy from this data.