Explore time-series database processing
Analyzing time-series data for the financial markets is one case where speed matters: a faster answer means beating your competition to the market. If you use kdb+/q, you already understand this need and how much difference the right tool can make in the speed at which queries are answered. There has long been an active community of kdb+ users, but until now that community has never had a place to call its own.
That situation has changed thanks to kxcommunity.com. The new Kx Community site has been made available by Kx Systems, Inc. to assist and support kdb+ users. From the main page, you are just a click away from current blogs, kdb+ events, job listings, local MeetUps, a monthly update signup, free code, and a real-time Twitter feed with kdb+ conversations. There is plenty more, and the site has a lot to offer both new and experienced kdb+ users.
If you are not familiar with kdb+ and want to learn more about column-oriented databases and time series analysis, then you may want to read the following two sections; otherwise you can skip down to see what Kx Community has to offer.
Using Columns Instead of Rows
A traditional database consists of tables with data ordered by row (row-oriented). As an example, consider Table One below. Each row usually has two or more columns (or fields) that hold similar data. When data are organized by rows, look-up can be done quickly by using a key value to identify the row and then its associated data. For many operations this scheme works well, but it works less well for problems where operations on entire columns of data are important. When data are organized by column (column-oriented), operations on columns are extremely fast because the values of a column are stored together and no key-value look-up is needed.
SQL databases are usually row-oriented and have some advantages over column-oriented databases. Namely, inserting a new row anywhere in a table is a simple operation because data are indexed by a key value. Inserting rows at arbitrary positions in a column-oriented database is slow because of the amount of data movement required. There are, however, certain types of data that never require such inserts and only append new data to the end of the table. Time-series data similar to that shown in Table One below fit this criterion: there is never a case where new data is created in the past! Thus, column-oriented databases are a natural choice for analyzing time-series data.
Another way to understand the difference between column- and row-oriented databases is to consider how the user operates on the data. In a row-oriented RDBMS the user queries the database using SQL, which is based on relational algebra and set theory. Column-oriented databases, on the other hand, treat each column as a vector (an ordered list of values). This abstraction allows easy computation over an entire column.
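The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration, not how kdb+ or any SQL engine actually stores data; the sample trades and the names `rows`, `columns`, `average_price_rows`, `average_price_columns`, and `append_trade` are invented for the example.

```python
# Row-oriented layout: one record per trade (hypothetical sample data).
rows = [
    {"date": "2024-01-02", "sym": "AAA", "price": 10.0, "size": 100},
    {"date": "2024-01-03", "sym": "AAA", "price": 10.5, "size": 200},
    {"date": "2024-01-04", "sym": "AAA", "price": 11.0, "size": 150},
]

# Column-oriented layout: one list per field, all lists kept in the same order.
columns = {
    "date": [r["date"] for r in rows],
    "sym": [r["sym"] for r in rows],
    "price": [r["price"] for r in rows],
    "size": [r["size"] for r in rows],
}


def average_price_rows(table):
    # Visits every record and picks the price field out of each one.
    return sum(r["price"] for r in table) / len(table)


def average_price_columns(table):
    # Reads a single list; the other fields are never touched.
    return sum(table["price"]) / len(table["price"])


def append_trade(table, date, sym, price, size):
    # New time-series data arrives in time order, so every column
    # only ever grows at its end; no existing values are moved.
    table["date"].append(date)
    table["sym"].append(sym)
    table["price"].append(price)
    table["size"].append(size)
```

Both functions return the same average, but the column version only has to read the one list it needs. Appending a new trade touches the end of each column, which is the access pattern that time-series data naturally produces.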
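As a rough sketch of what computing over an entire column looks like, the Python below expresses two common calculations as operations on whole lists rather than on individual rows. Python's built-ins stand in for a real vector language here; the data and the function names `vwap` and `price_changes` are assumptions made for the example.

```python
# Two hypothetical columns from a trade table, in time order.
prices = [10.0, 10.5, 11.0, 10.75]
sizes = [100, 200, 150, 50]


def vwap(price_col, size_col):
    # Volume-weighted average price computed from two whole columns.
    traded_value = sum(p * s for p, s in zip(price_col, size_col))
    return traded_value / sum(size_col)


def price_changes(col):
    # Difference between each value and the one before it.
    return [later - earlier for earlier, later in zip(col, col[1:])]
```

Each function takes columns as its arguments and returns a result for the column as a whole, so the caller never writes an explicit loop over rows.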
Table One: Example of time series data for daily stock trades.
The kdb+ Time-series Database
kdb+, together with its q query language, is among the most widely used time-series databases. It is used throughout the financial industry to analyze time-series data (e.g., trades on any type of stock or commodity exchange). kdb+/q has its origins in APL, an obscure academic language. Though powerful, APL was somewhat difficult to use and even required a special non-standard keyboard. Arthur Whitney, working at various financial firms, refined the APL approach and created a more user-friendly variant called kdb+ (k database) for time-series analysis. To interact with kdb+, Whitney developed a user-friendly "q" language interface. q is an interpreted, vector-based, dynamically typed language built for speed and expressiveness. Its vector commands eliminate the need for virtually all looping structures (for/while) in a standard q program. Whitney also co-founded Kx Systems to further develop and support the kdb+/q technology that is used today.
kdb+ has been refined over the years to produce a powerful and expressive time-series database. Some of its important features are:
- q, a SQL-like general-purpose programming language.
- One platform for streaming, in-memory, and historical data, with no data transfer between systems.
- An analytic database engine that integrates seamlessly with WebSockets, Java, C/C++, C#, R, Python, MATLAB, and others through APIs.
- Columnar design optimized for time-series analysis.
- Scales to multi-petabyte systems.
- Runs on commodity hardware clusters.
- Built-in MapReduce.
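To illustrate the MapReduce idea in the last item, here is a minimal Python sketch of an average computed across data split into date partitions. It shows the general technique only and makes no claim about how kdb+ implements it; the partition layout, the sample prices, and the function names are all assumptions for the example.

```python
# Hypothetical historical data split by date: one price column per day.
partitions = {
    "2024-01-02": [10.0, 11.0],
    "2024-01-03": [12.0],
    "2024-01-04": [10.0, 10.0, 10.0],
}


def map_partition(prices):
    # Map step: summarize one partition as a (sum, count) pair.
    return sum(prices), len(prices)


def reduce_partials(partials):
    # Reduce step: combine the per-partition pairs into one overall average.
    total = 0.0
    count = 0
    for partial_sum, partial_count in partials:
        total += partial_sum
        count += partial_count
    return total / count


def average_price(parts):
    # Each partition is summarized independently, then the results are merged.
    return reduce_partials(map_partition(p) for p in parts.values())
```

The map step returns a sum and a count rather than a per-day average, because averaging the daily averages would weight a day with one trade the same as a day with three.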