Working Down the Column: The Kdb+ Community

Written by Douglas Eadline. Published: 09 September 2014.

Explore time-series database processing

Analyzing time-series data for the financial markets is one case where speed matters. A faster answer means beating your competition to the market. If you use kdb+/q, you already understand this need and how much difference the right tool can make in the speed at which queries get answered. As you may have assumed, there has always been an active community of kdb+ users, but until now that community has never had a place to call its own.

That situation has changed thanks to kxcommunity.com. The new Kx Community site has been made available by Kx Systems, Inc. to assist and support kdb+ users. From the main page, you are just a click away from current blogs, kdb+ events, job listings, local MeetUps, a monthly update signup, free code, and a real-time Twitter feed with kdb+ conversations. There is plenty more, and the site has a lot to offer both new and experienced kdb+ users.

If you are not familiar with kdb+ and want to learn more about column-oriented databases and time series analysis, then you may want to read the following two sections; otherwise you can skip down to see what Kx Community has to offer.

Using Columns Instead of Rows

A traditional database consists of tables with data ordered by row (row-oriented). As an example, consider Table One below. Each row usually has two or more columns (or fields), and each column holds the same kind of data for every row. When data are organized by rows, data look-up can be done quickly by using a key-value to identify the row and then its associated data. For many operations this scheme works well, but it works poorly for problems where operations on entire columns of data are important. When data are organized by column (column-oriented), operations on columns are extremely fast because the values of a column are stored together and can be scanned directly, with no key-value look-up for each row.
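The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trade records, the field names, and the helper `to_columns` are hypothetical (they are not taken from Table One), and the sketch does not show how kdb+ or any SQL database stores data internally.

```python
# Hypothetical trade records; every value here is made up for illustration.

# Row-oriented: one record per trade, and each record carries every field.
rows = [
    {"date": "2014-09-02", "sym": "AAA", "price": 10.0, "volume": 100},
    {"date": "2014-09-03", "sym": "AAA", "price": 11.0, "volume": 300},
    {"date": "2014-09-04", "sym": "AAA", "price": 12.0, "volume": 200},
]


def to_columns(records):
    """Turn a list of row records into one ordered list per field."""
    return {name: [r[name] for r in records] for name in records[0]}


# Column-oriented: one list per field; position i in every list is trade i.
columns = to_columns(rows)

# Summing a column in the row layout visits every record to pull out one field.
row_total = sum(r["volume"] for r in rows)

# In the column layout the same sum is a scan over a single list.
col_total = sum(columns["volume"])

print(row_total, col_total)
```

Both totals are the same; what changes is how much unrelated data has to be touched to compute them.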

SQL databases are usually row-oriented and have some advantages over column-oriented databases. Namely, inserting a new row anywhere in a table is a simple operation because the data are indexed using a key-value. Inserting rows at arbitrary positions in a column-oriented database is slow because every column must move data to make room. There are, however, certain types of data that never require such inserts and only append new data to the end of the table. Time-series data similar to that shown in Table One below fit this criterion, because new data are never created in the past. Thus, column-oriented databases are a natural choice for analyzing time-series data.
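A minimal Python sketch of the append-versus-insert trade-off follows. It assumes a column-oriented table modeled as a dictionary of lists; the helpers `append_row` and `insert_row` and all of the values are hypothetical and exist only for this illustration.

```python
def append_row(table, row):
    """Append a row at the end: one cheap append per column."""
    for name, value in row.items():
        table[name].append(value)


def insert_row(table, index, row):
    """Insert a row at an arbitrary position.

    Every column has to shift all of its later values by one slot,
    so the cost grows with the length of the table.
    """
    for name, value in row.items():
        table[name].insert(index, value)


# Hypothetical column-oriented table, ordered by date.
table = {"date": ["2014-09-02", "2014-09-03"], "price": [10.0, 11.0]}

# The time-series case: the new observation is always the latest one.
append_row(table, {"date": "2014-09-04", "price": 12.0})

# The case time-series data avoids: a row that belongs in the past.
late = {name: list(col) for name, col in table.items()}
insert_row(late, 0, {"date": "2014-09-01", "price": 9.5})

print(table["date"])
print(late["date"])
```

With only appends, each column grows at its end and existing values never move, which is the access pattern that time-series data naturally produce.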

Another way to understand the difference between column- and row-oriented databases is to consider how the user operates on the data. In a row-oriented RDBMS the user queries the database using SQL, which is based on relational algebra and set theory. Column-oriented databases, on the other hand, are based on vectors (ordered lists). This abstraction makes it easy to compute over an entire column at once.

Table One: Example of time series data for daily stock trades.

The kdb+ Time-series Database

The most popular time-series database is kdb+, together with its q query language. kdb+ is widely used by financial institutions to analyze time-series data (e.g., data from any type of stock or commodity exchange). The origins of kdb+/q lie in an obscure academic language called APL. Though powerful, APL was somewhat difficult to use and even required a special non-standard keyboard. Arthur Whitney, working at various financial firms, refined the APL approach and created kdb+ (k database), a more user-friendly variant aimed at time-series analysis. To interact with kdb+, Whitney developed the user-friendly "q" language interface. q is an interpreted, vector-based, dynamically typed language built for speed and expressiveness. Because q commands operate on whole vectors, a standard q program needs virtually no looping structures (for/while). Whitney also co-founded Kx Systems to further develop and support the kdb+/q technology that is used today.
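The loop-free, vector-oriented style can be approximated in Python, as sketched below. This is not q code and makes no claim about how q is implemented; the price and volume columns are hypothetical values, and the volume-weighted average price is just one convenient example of a whole-column computation.

```python
from operator import mul

# Hypothetical columns; the values are made up for illustration.
price = [10.0, 11.0, 12.0]
volume = [100, 300, 200]

# Loop style: index into each column and accumulate by hand.
total = 0.0
for i in range(len(price)):
    total += price[i] * volume[i]
vwap_loop = total / sum(volume)

# Vector style: multiply the two columns element by element, then sum.
# No explicit index or loop variable appears in the program text.
vwap_vector = sum(map(mul, price, volume)) / sum(volume)

print(vwap_loop, vwap_vector)
```

The second form states what is computed over the columns rather than how to step through them, which is the spirit of the vector approach described above.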

kdb+ has been refined over the years to produce a powerful and expressive time-series database. Some of its important features are:

SQL-like general purpose programming language, q.
One platform, no data transfer, use of kdb+ for streaming, in-memory, and historical data.
kdb+ is an analytic database engine that integrates seamlessly with WebSockets, Java, C/C++, C#, R, Python, Matlab, and others through APIs.
Columnar design optimized for time-series analysis.
Scales to multi-petabyte systems.
Runs on commodity hardware clusters.
Built-in MapReduce.
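The MapReduce idea in the last item can be sketched in Python under some assumptions: the data are split into date partitions, each partition is summarized independently (map), and the partial results are combined (reduce). The partition layout, the values, and the helpers `map_partition` and `combine` are hypothetical and are not the kdb+ implementation.

```python
from functools import reduce

# Hypothetical date partitions, each holding that day's prices.
partitions = {
    "2014-09-02": [10.0, 10.5],
    "2014-09-03": [11.0],
    "2014-09-04": [12.0, 12.5, 13.0],
}


def map_partition(prices):
    """Map step: reduce one partition to a (sum, count) pair."""
    return (sum(prices), len(prices))


def combine(a, b):
    """Reduce step: merge two (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])


# An average cannot be combined directly across partitions,
# but sums and counts can, so the average is derived at the end.
total, count = reduce(combine, map(map_partition, partitions.values()))
average = total / count

print(average)
```

Because each partition is summarized on its own, the map step could run on separate processes or machines before the small partial results are merged.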

Joining the Growing Community

If you are a kdb+ user or are intrigued by the kdb+ approach, there are resources waiting for you. First, there is a free version of kdb+ (32-bit) available for download. The free version has all the functionality of the 64-bit version and can be used for commercial, non-commercial, or educational purposes. Second, the Kx Community site provides a two-page Getting Started Guide, a developer's tutorial, a kdb+ community wiki, information on using R with kdb+, and a white paper that explains the "q" query language. In addition, there are links to contributed interfaces for Java and C#, a Python interface (from DEVnet as part of exxeleron), kdb+ production system frameworks (TorQ by AquaQ Analytics and Enterprise Components from DEVnet), support for WebSockets, and a How-To for pivoting tables using the q language.

All In One Place

Kx Community has also collected all the important kdb+/q websites in one location. There are links to kdb+ developer blogs, background information, white papers, an FAQ site, Reddit topics, and Stack Overflow questions and discussions. kxcommunity.com is the best place to start searching for kdb+/q information. There is also a Google Group (Kdb+ Personal Developers) where the community gathers to talk and ask questions about all versions of kdb+, including the free 32-bit version.

Community Blogs

Rounding out the Kx Community page are kdb+ Blogs that cover a range of topics, including: The Nature of Ticker Plant Log Files, Open Source Building Blocks for kdb+, WebSockets, HTML5 and kdb+, and more.

This work is licensed under CC BY-NC-SA 4.0

©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.