Sunday, March 30, 2008

Storing EOD Market Data

I've been working on extending my own market scanner, and I've revamped the way I store massive amounts of "end of day" historical stock data.

Previously I was using an organized hierarchy of compressed binary files, each containing a symbol's quotes. Since I'm starting to work solely with Python/SciPy/NumPy, I decided to look at PyTables. PyTables is a hierarchical database package built on HDF5 for efficiently managing large amounts of data. HDF5 is basically just a nice file format for storing a lot of scientific data.

I'm currently maintaining a separate PyTables file for each of the NASDAQ, NYSE, AMEX, and CME. Within each file I'm grouping stocks by sector, and creating an individual PyTables table in each group for each stock. The setup has been working extremely well, and I would highly recommend it.
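
To make the layout concrete, here is a minimal sketch of one exchange file with a group per sector and a table per stock. It is not my actual code: the column layout, the `append_quotes` helper, the sector name, the symbol `EXMPL`, and the two sample rows are all made up for illustration. It also assumes the snake_case API of current PyTables releases (`open_file`, `create_group`, `create_table`); older releases spelled these in camelCase.

```python
import os
import tempfile

import tables


class DailyQuote(tables.IsDescription):
    """One end-of-day bar. The column layout is an assumption."""
    date = tables.Int32Col(pos=0)          # trading day encoded as YYYYMMDD
    open_price = tables.Float64Col(pos=1)
    high_price = tables.Float64Col(pos=2)
    low_price = tables.Float64Col(pos=3)
    close_price = tables.Float64Col(pos=4)
    volume = tables.Int64Col(pos=5)


def append_quotes(h5file, sector, symbol, rows):
    """Append rows to /<sector>/<symbol>, creating the group and table if needed."""
    sector_path = "/" + sector
    if sector_path not in h5file:
        h5file.create_group("/", sector)
    table_path = sector_path + "/" + symbol
    if table_path in h5file:
        table = h5file.get_node(table_path)
    else:
        table = h5file.create_table(sector_path, symbol, DailyQuote)
    table.append(rows)   # rows are tuples in column order (see pos above)
    table.flush()
    return table


def demo():
    """Write two made-up bars, then query the days that closed above the open."""
    path = os.path.join(tempfile.mkdtemp(), "nasdaq.h5")
    with tables.open_file(path, mode="w", title="NASDAQ EOD quotes") as h5:
        append_quotes(h5, "technology", "EXMPL", [
            (20080327, 10.0, 10.5, 9.8, 10.2, 1000),
            (20080328, 10.2, 10.4, 9.9, 10.0, 1500),
        ])
    with tables.open_file(path, mode="r") as h5:
        table = h5.get_node("/technology/EXMPL")
        up_days = table.read_where("close_price > open_price")
        return [int(d) for d in up_days["date"]]


if __name__ == "__main__":
    print(demo())
```

One file per exchange keeps each HDF5 file to a manageable size, and a table per stock means a scan over a single symbol only reads that symbol's rows.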

2 comments:

C. Bess said...
This comment has been removed by the author.
C. Bess said...

PyTables seems great, almost like Zope DB. But PyTables seems to provide lower-level querying abilities, which is good and bad, mostly bad (in comparison to Zope DB catalog querying).

In addition to EOD data, I store intra-day data. I use Postgres and C#. I am moving to Postgres with SQLAlchemy once I upgrade my server.

SQLAlchemy will give you similar OOP interfaces to the database, and Postgres is a little more universal, in that you can read from it using another language or tool (if you need to later down the road).