Time Series Database Review: RayforceDB
This article reviews RayforceDB, a game changer for ultra-fast time series analysis in Python.
RayforceDB is a recently open-sourced time series database that offers blazingly fast performance.
It is built with inspiration from kdb+, which is also known for its fast performance and minimal application size.
RayforceDB offers similar benefits, being written in pure C and having a binary size of less than 1 MB.
Another benefit of RayforceDB is that it offers Python bindings with minimal overhead and no additional dependencies. Hence, it is easy to include in any application.
RayforceDB can be used not only for storing data on disk but also for working with data tables directly in memory. Therefore, it can be used as a direct substitute for Pandas and Polars, offering significantly faster processing than both.
RayforceDB is also suitable for working with and recording live-streamed data that can subsequently be stored on disk.
How I discovered RayforceDB
I discovered RayforceDB coincidentally through a LinkedIn post.
This news came at a very convenient time, because I had been struggling to get SQL databases to work for an intraday investment data application for almost a year.
Subsequently, we have discovered that some of the workloads we wanted to perform are thousands of times faster using the Python bindings of RayforceDB compared to SQL implementations, despite our significant efforts to optimize the SQL database and associated Python code.
Some workloads were also so slow in SQL that processing them with this technology was practically infeasible. We stopped them without knowing exactly how long they would take, while RayforceDB finished the computation in less than 30 seconds.
My experience with RayforceDB
I have taken part in the development of an intraday database and analytics technology using RayforceDB, and in particular rayforce-py, over the last four months.
During this period, we have shared feedback with the RayforceDB developers regarding rayforce-py, helping eliminate minor bugs and providing general user-experience feedback to help improve the documentation.
While rayforce-py is still in beta, we have not experienced any significant issues since v0.6.2. However, if any future issues appear, our experience is that the RayforceDB developers fix them quickly when they are reported as a GitHub issue, so please help them continue to improve the software if you spot something.
Some tips for a smooth transition from SQL
If you come from a SQL-like database, you will have to adjust some fundamental habits. Although this might take some time, you will probably come to appreciate most of these adjustments and the possibilities that they offer.
One of the first things you will probably notice is the lack of foreign keys. Hence, there is no explicit normalization or “on delete cascade”. While this perhaps introduces some additional work for you to keep the database consistent, RayforceDB has some nice implicit “normalization” functionality. For example, you do not have to maintain a table of string identifiers mapped to integers. RayforceDB will automatically create enums for your strings, so you can store and look up instruments based on their string identifier without any table joins.
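To make the idea of enums concrete, here is a minimal sketch of the general technique (often called dictionary encoding) in plain Python. It is not RayforceDB's API or internal implementation; the function names `encode_symbols` and `rows_for` and the tickers are hypothetical and only illustrate how strings can be stored as small integers and looked up without a join.

```python
# Conceptual sketch of enum (dictionary) encoding; NOT RayforceDB's API.

def encode_symbols(values, symbols=None):
    """Map each string to an integer code, extending `symbols` as needed."""
    symbols = [] if symbols is None else symbols
    index = {s: i for i, s in enumerate(symbols)}
    codes = []
    for value in values:
        if value not in index:
            # First time we see this string: assign the next free code.
            index[value] = len(symbols)
            symbols.append(value)
        codes.append(index[value])
    return codes, symbols


def rows_for(symbol, codes, symbols):
    """Return row positions for one instrument by comparing integers only."""
    if symbol not in symbols:
        return []
    code = symbols.index(symbol)
    return [row for row, c in enumerate(codes) if c == code]


# Hypothetical tickers for two trading days that share one symbol list.
day1_codes, symbols = encode_symbols(["AAPL", "MSFT", "AAPL"])
day2_codes, symbols = encode_symbols(["MSFT", "GOOG"], symbols)
```

Because the second day reuses the symbol list built on the first day, the same instrument keeps the same integer code on both days, which is the property that makes a shared symbol file useful.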
Building on the enum functionality, RayforceDB creates symfiles, which cleverly reuse enum columns across days, for example, in the typical case where you track the price of the same instrument over multiple days.
Symfiles affect how you effectively store time series data over time. Here is an example from the RayforceDB documentation:
From the structure above, it is important to notice that RayforceDB stores data on disk as simple vector files. This has many practical advantages when it comes to backups and moving data in general. It does not require any database server; you just copy and move the data as you would any other ordinary files. Another benefit is that you can easily update or insert data in large parted tables by adjusting just one day.
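The practical consequences of storing a table as plain per-day column files can be sketched with the Python standard library alone. The directory layout, the date format in the folder names, and the helpers `write_partition` and `read_column` are assumptions made for illustration; they do not reproduce RayforceDB's actual on-disk format.

```python
# Conceptual sketch of a date-partitioned table stored as flat column files.
# Layout and file names are hypothetical, NOT RayforceDB's on-disk format.
import shutil
import tempfile
from array import array
from pathlib import Path


def write_partition(root, day, table, columns):
    """Write each column of one day as its own binary file of doubles."""
    part = Path(root) / day / table
    part.mkdir(parents=True, exist_ok=True)
    for name, values in columns.items():
        with open(part / name, "wb") as fh:
            array("d", values).tofile(fh)
    return part


def read_column(root, day, table, name):
    """Read one column file back into a list of floats."""
    data = array("d")
    with open(Path(root) / day / table / name, "rb") as fh:
        data.frombytes(fh.read())
    return data.tolist()


root = Path(tempfile.mkdtemp())
write_partition(root, "2024.01.02", "trades", {"price": [10.0, 10.5]})
write_partition(root, "2024.01.03", "trades", {"price": [11.0]})

# Correcting one day rewrites only that day's directory.
write_partition(root, "2024.01.03", "trades", {"price": [11.0, 11.2]})

# A backup is an ordinary recursive file copy; no server is involved.
backup = Path(tempfile.mkdtemp()) / "backup"
shutil.copytree(root, backup)
```

In this sketch, updating 2024.01.03 never touches the files for 2024.01.02, and the backup is complete as soon as the copy finishes.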
Following the above structure for your time series data, RayforceDB will automatically detect the timestamp column and create a new virtual date column that can be used for fast queries over time in large parted tables.
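Why a date derived from the timestamp makes range queries fast can be illustrated with a small plain-Python sketch. It assumes nothing about how RayforceDB implements its virtual date column; `partition_by_date` and `query_range` are hypothetical helpers that only show the general idea of skipping whole days that fall outside the requested range.

```python
# Conceptual sketch of partition pruning with a date derived from timestamps.
# This is NOT RayforceDB's API; all names here are hypothetical.
from collections import defaultdict
from datetime import date, datetime


def partition_by_date(rows):
    """Group (timestamp, price) rows by the calendar date of the timestamp."""
    parts = defaultdict(list)
    for ts, price in rows:
        parts[ts.date()].append((ts, price))
    return dict(parts)


def query_range(parts, start, end):
    """Return rows with start <= date <= end and the partitions that were read."""
    touched = [day for day in sorted(parts) if start <= day <= end]
    rows = [row for day in touched for row in parts[day]]
    return rows, touched


ticks = [
    (datetime(2024, 1, 2, 9, 30), 10.0),
    (datetime(2024, 1, 2, 15, 59), 10.5),
    (datetime(2024, 1, 3, 9, 30), 11.0),
    (datetime(2024, 1, 4, 9, 30), 11.4),
]
parts = partition_by_date(ticks)

# Only the partitions for 3 and 4 January are read; 2 January is skipped.
rows, touched = query_range(parts, date(2024, 1, 3), date(2024, 1, 4))
```

The filter runs over the list of days rather than over every row, so the cost of selecting a date range grows with the number of partitions touched, not with the total size of the table.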
Since RayforceDB’s Python bindings have minimal overhead and offer essentially native performance, you can start analyzing the data very quickly. The Python bindings allow you to query the data in a chainable, Pythonic way, and they require no additional dependencies, so they are easy to include in any Python project.
Disclaimer
I have no economic interest in the RayforceDB project. This positive review is purely driven by my appreciation of the excellent software engineering required to build such a powerful time series technology. Naturally, I wish for the project to continue and be maintained, but this is a benefit for all of us and not just me.
Conclusion
RayforceDB is a brand-new open-source database management system that can be used for managing time series tables stored both on disk and in memory. It offers blazingly fast performance. The initial investment of learning the technology and transitioning from SQL, Polars or Pandas will probably be well worth your time. Compared to other time series alternatives, RayforceDB lives up to its promise.



