The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Consuming real-time market data ticks

I am writing an application (in C++) that will basically process a humongous amount of real time market data ticks from a vendor.  What is the best way to go about doing this?  I have so far come up with this:

The vendor API calls my application back on a separate thread with a pointer to an API-defined class object -- this is what they call a 'tick'.  I drop this tick into a std::queue (part of my problem is also deciding on the right data structure).

A reader thread constantly monitors the queue, pulls out each tick, and sends it to another dispatcher component that will take care of processing it.  The dispatcher could also run a pool of threads or something like that so that the reader thread doesn't stay blocked waiting for the dispatcher to complete its entire processing.

Does this sound like a reasonable approach?  Consider that I could end up processing literally a few thousand ticks *per second*!
Dilip Send private email
Friday, January 26, 2007
 
 
To handle high volume, you want to touch each item as few times as possible. I don't know exactly what you do with each item but your proposal strikes me as "too many moving parts": callbacks, queue monitors, dispatchers... The most efficient would be to process the item directly when it's delivered. Unless you're running on a multi-processor machine, using lots of threads is actually going to slow it down.  Just something to think about. To validate the design, run some performance tests on some alternate skeleton implementations.
Mike S Send private email
Friday, January 26, 2007
 
 
You might be able to optimize things if your app only really needs to know the most recent price, which would be the case for many simple quote display apps. A great many of those ticks will likely be at the same price as the preceding one, and you could bypass a lot of processing if you can quickly determine that the new quote is the same as the old one, and the only thing that's changed is volume. On the other hand, maybe your app cares about volume, in which case you'd obviously need to process everything.
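A minimal sketch of that shortcut -- the Tick struct and its field names here are stand-ins, not the vendor's actual API:

```cpp
// Stand-in tick type; the real vendor object will differ.
struct Tick {
    double price;
    long   volume;
};

// True when the incoming tick carries no new price information, so the
// expensive downstream work can be skipped (it's a volume-only update).
bool priceUnchanged(const Tick& last, const Tick& incoming) {
    return incoming.price == last.price;
}
```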
Greg Send private email
Friday, January 26, 2007
 
 
You might want to look into using kdb+ and q instead of C++.
kdb+ consultant Send private email
Friday, January 26, 2007
 
 
That's a lot of ticks per second. :)  By your desire to use worker threads, I'm guessing processing each tick is I/O-bound (e.g. they get dumped into a DB). Is there concern that the I/O may become saturated if you process each tick's data right away?

If you don't need immediate processing, maybe you can batch it - just copy the tick data over to a cache, and have a maintenance thread wake up every n seconds to deal with them en masse.
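That batching idea could look something like this -- a sketch only, using C++11 locking for brevity, with Tick as a stand-in type. The callback appends under a lock; the maintenance thread swaps the whole batch out in O(1):

```cpp
#include <mutex>
#include <vector>

struct Tick { double price; };           // stand-in for the vendor's tick

std::mutex        cacheMutex;
std::vector<Tick> cache;

// Called from the vendor's callback thread: just copy and return.
void onTick(const Tick& t) {
    std::lock_guard<std::mutex> lock(cacheMutex);
    cache.push_back(t);
}

// Called by the maintenance thread every n seconds: the swap keeps the
// lock held only briefly, then the batch is processed without contention.
std::vector<Tick> drainCache() {
    std::vector<Tick> batch;
    std::lock_guard<std::mutex> lock(cacheMutex);
    batch.swap(cache);
    return batch;
}
```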

(Also, one thing to watch out for would be the lifetime of the tick data - can you safely save the pointer you got during the callback, or is its lifetime bounded by the callback function itself?)
R
Friday, January 26, 2007
 
 
Actually, I was just reading about a similar system this afternoon:

http://www.infoq.com/news/Esper--ESP-CEP

It's written in Java, but the principles probably still apply.
BenjiSmith Send private email
Friday, January 26, 2007
 
 
"That's a lot of ticks per second. :)  By your desire to use worker threads, I'm guessing processing each tick is I/O-bound (e.g. they get dumped into a DB). Is there concern that the I/O may become saturated if you process each tick's data right away?"

That's just it.  This may sound crazy but I am simply going to maintain all the tick-related information in memory!  I know it doesn't sound durable, not to mention the gigabytes of space it might end up occupying, but management seems to think memory is the least of their concerns.  Who am I to object? :-)

My main problem is to read the ticks from the socket as and when they appear to keep pace with the market data vendor's broadcaster infrastructure.  If I don't, the vendor will start dropping ticks and I might lose valuable pricing information.

So the idea is, read quickly, dump it into a queue, return from the callback func.  On the other end, a reader thread reads from the queue and dispatches it to some kind of controller module.  The important thing is to have a mechanism where 2 pieces of code have only one purpose for their existence.  One reads the ticks and keeps dumping it into a queue and the other just keeps pulling stuff out of the queue.  That's all -- nothing more, nothing less.

Does that sound reasonable?

"If you don't need immediate processing, maybe you can batch it - just copy the tick data over to a cache, and have a maintenance thread wake up every n seconds to deal with them en masse."

This is probably what I might end up doing.

"(Also, one thing to watch out for would be the lifetime of the tick data - can you safely save the pointer you got during the callback, or is its lifetime bounded by the callback function itself?)"

Great point.  The API gives a way to hold the lifetime of a message pointer until you are done with it with a COM style AddRef()/Release() approach.
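With that kind of API, a small RAII holder keeps the refcounting honest while the pointer sits in the queue. VendorTick here is hypothetical -- only the AddRef()/Release() convention comes from the description above:

```cpp
// Hypothetical vendor message; only the AddRef()/Release() convention is
// from the API description. (A real version would need atomic increments
// if the count is touched from multiple threads.)
struct VendorTick {
    int refs = 1;                        // callback hands it over with one ref
    void AddRef()  { ++refs; }
    void Release() { if (--refs == 0) delete this; }
};

// RAII holder: taking a reference on construction and releasing on
// destruction means a queued pointer can't outlive its refcount by accident.
class TickHandle {
public:
    explicit TickHandle(VendorTick* p) : p_(p) { if (p_) p_->AddRef(); }
    TickHandle(const TickHandle& o) : p_(o.p_) { if (p_) p_->AddRef(); }
    ~TickHandle() { if (p_) p_->Release(); }
    VendorTick* get() const { return p_; }
private:
    TickHandle& operator=(const TickHandle&);  // non-assignable for brevity
    VendorTick* p_;
};
```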
Dilip
Saturday, January 27, 2007
 
 
If you can't consume ticks fast enough, then the queue will grow. If the queue gets really big the application will start to be using virtual memory, and so slow even more. When it eventually runs out of virtual memory it will probably crash. So perhaps you want some fixed-maximum-length queue, like a ring buffer.
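A fixed-capacity ring buffer along those lines might be sketched like this (the names and template parameters are illustrative only):

```cpp
#include <cstddef>

// Fixed-capacity ring buffer: push() fails when full, so the application --
// not the vendor -- gets to decide which ticks to drop.
template <typename T, std::size_t N>
class RingBuffer {
public:
    bool push(const T& v) {
        if (count_ == N) return false;   // full: caller applies its drop policy
        buf_[tail_] = v;
        tail_ = (tail_ + 1) % N;
        ++count_;
        return true;
    }
    bool pop(T& out) {
        if (count_ == 0) return false;   // empty
        out = buf_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }
    std::size_t size() const { return count_; }
private:
    T buf_[N];
    std::size_t head_ = 0, tail_ = 0, count_ = 0;
};
```

Note that this single-counter version needs external locking if the reader and writer run on different threads.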
Christopher Wells Send private email
Saturday, January 27, 2007
 
 
In these sorts of apps it is normal to maintain all the data in memory and NOT to persist it to a DB. It is usually not gigabytes of memory because you are normally only interested in the most recent quote.
Greg Send private email
Saturday, January 27, 2007
 
 
"So the idea is, read quickly, dump it into a queue, return from the callback func.  On the other end, a reader thread reads from the queue and dispatches it to some kind of controller module.  The important thing is to have a mechanism where 2 pieces of code have only one purpose for their existence.  One reads the ticks and keeps dumping it into a queue and the other just keeps pulling stuff out of the queue."

Ah, I think I see - you're trying to minimize the processing time on the callback, so that the data provider perceives it as very responsive, and doesn't try to throttle the updates. That's cool!

The queue sounds good, and there are several ways to make it fast. If you only have one reader thread and one writer thread, a full locking data structure may be overkill - I'd suggest investigating lockless queues instead. Also, if you can pre-allocate the queue and its member elements to minimize new/deletes, there's another obvious optimization. :)
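A single-producer/single-consumer queue of the sort described above can be sketched as below -- shown with C++11 atomics for brevity (in 2007 the same idea would be built on platform memory barriers). One slot is left empty to distinguish full from empty:

```cpp
#include <atomic>
#include <cstddef>

// Lock-free SPSC queue sketch: exactly one producer thread calls push(),
// exactly one consumer thread calls pop(). Holds N-1 elements.
template <typename T, std::size_t N>
class SpscQueue {
public:
    bool push(const T& v) {              // producer thread only
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;                // full
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {                   // consumer thread only
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return false;                // empty
        out = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release);
        return true;
    }
private:
    T buf_[N];
    std::atomic<std::size_t> head_{0};   // consumer reads from head_
    std::atomic<std::size_t> tail_{0};   // producer writes at tail_
};
```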

If you're keeping it all in memory, one thing to watch out for, depending on your production environment, is the amount of memory one can access. For example, 32-bit XP by default limits each process to only the lower 2GB of the virtual address space; it's possible to address more memory, but requires tricks.
R
Sunday, January 28, 2007
 
 
> Also, if you can pre-allocate the queue and its member elements to minimize new/deletes, there's another obvious optimization. :)

New/delete seems to be 300 nanoseconds on my machine, so that optimization may not be necessary for only a few thousand transactions per second. I don't know whether performance degrades over time (due to heap fragmentation).

Similarly, an uncontested lock acquisition can be extremely cheap.
Anon.
Sunday, January 28, 2007
 
 
This might sound obvious, but make sure you're passing pointers around so you're not having to execute copy constructors/etc each time something enters and leaves the queue.

Market data rates are increasing at an astounding pace (and many vendors are having a hard time keeping up).  I've been privy to some stats that say (if I'm remembering them right) that 26,000 updates per second was where we were at last year.

It's definitely a fun problem to work on!
Doug
Sunday, January 28, 2007
 
 
Christopher Wells:
"If you can't consume ticks fast enough, then the queue will grow. If the queue gets really big the application will start to be using virtual memory, and so slow even more. When it eventually runs out of virtual memory it will probably crash. So perhaps you want some fixed-maximum-length queue, like a ring buffer."

I thought about this.  In the straight queue approach, I thought of limiting the size of the queue (or the # of elements) to a certain really huge preset limit and essentially have it pre-allocated (haven't investigated if std::queue is suitable for my purpose).  That way if I can't consume fast enough, I will have to start dropping ticks.  At least it will be the application's decision, rather than the vendor's.

Incidentally wouldn't a ring buffer suffer from similar problems?  There may be instances when the write pointer wraps around so fast that there is a danger of it overlapping the read pointer, right?
Dilip Send private email
Monday, January 29, 2007
 
 
Mr. R!
"Ah, I think I see - you're trying to minimize the processing time on the callback, so that the data provider perceives it as very responsive, and doesn't try to throttle the updates. That's cool!"

That's exactly what I am hoping to accomplish.

"The queue sounds good, and there are several ways to make it fast. If you only have one reader thread and one writer thread, a full locking data structure may be overkill - I'd suggest investigating lockless queues instead. Also, if you can pre-allocate the queue and its member elements to minimize new/deletes, there's another obvious optimization. :)"

On this account I haven't researched std::queue yet.  I have had problems with std::map's insert/update operations in a multithreaded environment in the past.  I have to see how std::queue fares in this regard.

"If you're keeping it all in memory, one thing to watch out for, depending on your production environment, is the amount of memory one can access. For example, 32-bit XP by default limits each process to only the lower 2GB of the virtual address space; it's possible to address more memory, but requires tricks."

Yeah -- I know about this limitation.  I **think** the deployment machine might end up being 64-bit.

I think we continue to be on the same page.  Are you my alter-ego or something like that :-)
Dilip Send private email
Monday, January 29, 2007
 
 
Doug:
"Market data rates are increasing at an astounding pace (and many vendors are having a hard time keeping up).  I've been privy to some stats that say (if I'm remembering them right) that 26,000 updates per second was where we were at last year."

That is exactly what I am worried about too.  In fact the OPRA market (Options Processing Regulatory Authority) can really congest the network with an insane amount of ticks.
Dilip Send private email
Monday, January 29, 2007
 
 
> I thought of limiting the size of the queue (or the # of elements) to a certain really huge preset limit

Like perhaps a few 100,000 elements to buffer up to a minute's-worth of ticks?

> Incidentally wouldn't a ring buffer suffer from similar problems?  There may be instances when the write pointer wraps around so fast that there is a danger of it overlapping the read pointer, right?

Right: using the ring buffer won't magically enable you to keep up if the processing is too slow elsewhere. Using a fixed-size ring buffer will make (has made) you think about how to manage if it's too slow: if the buffer becomes full, for example, you might choose to discard either the oldest or the newest ticks (i.e. let the performance degrade gracefully, rather than catastrophically as it might if you start to use virtual memory).
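That drop-the-oldest policy could be sketched like so (std::deque and the names here are illustrative; a production version would build it into the pre-allocated ring buffer itself):

```cpp
#include <cstddef>
#include <deque>

// When full, discard the oldest tick (the stalest price) in favour of the
// newest, and count the drops so the condition shows up in monitoring.
template <typename T>
class DropOldestQueue {
public:
    explicit DropOldestQueue(std::size_t cap) : cap_(cap), dropped_(0) {}
    void push(const T& v) {
        if (q_.size() == cap_) {
            q_.pop_front();              // graceful degradation, by policy
            ++dropped_;
        }
        q_.push_back(v);
    }
    std::size_t dropped() const { return dropped_; }
    std::size_t size() const { return q_.size(); }
private:
    std::deque<T> q_;
    std::size_t   cap_;
    std::size_t   dropped_;
};
```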
Christopher Wells Send private email
Monday, January 29, 2007
 
 
I've programmed on similar systems, here's my advice from what I've seen work.

1) Put updates on a queue so the processing that must occur for any given tick update event is as quick as possible. This way you can deal with bursts of incoming data without losing anything. You can also have multiple reader threads picking off the queue, which will let you make use of quad-core processors etc. for extra processing speed.

2) Have an explicit policy for what happens when (not if, it will happen at some point) you can't process ticks fast enough. When do you start dropping? What is dropped? Document this explicitly so people know what to expect and add tests that it works as advertised. This will start going wrong when you already have other problems to deal with and you want to be able to rely on this behaviour at least.

3) Make it easy to run a flood test and see how much you can handle before crapping out. Make it easy to assess limits on your candidate hardware so people don't suddenly bump heads on the ceiling in a production environment.
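Point 1 above -- several reader threads draining one shared queue -- might be sketched like this, using C++11 threads for brevity. Tick and process() are stand-ins, and a real version would block on a condition variable rather than quit when the queue is empty:

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Tick { double price; };           // stand-in for the vendor's tick

std::mutex       qm;
std::queue<Tick> work;
std::atomic<int> processed(0);

void process(const Tick&) { ++processed; }  // placeholder for real work

void worker() {
    for (;;) {
        Tick t;
        {
            std::lock_guard<std::mutex> lock(qm);
            if (work.empty()) return;    // sketch only: stop when drained
            t = work.front();
            work.pop();
        }
        process(t);                      // heavy work done outside the lock
    }
}

void drainWithPool(int nThreads) {
    std::vector<std::thread> pool;
    for (int i = 0; i < nThreads; ++i)
        pool.push_back(std::thread(worker));
    for (std::size_t i = 0; i < pool.size(); ++i)
        pool[i].join();
}
```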
Arkestra
Monday, January 29, 2007
 
 
Arkestra!
Thanks for your suggestions.  I will keep all of them in mind.

Christopher Wells:
"Like perhaps a few 100,000 elements to buffer up to a minute's-worth of ticks?"

Aren't you assuming that the reader thread cannot keep pace with the writer thread?  If you have 2 independent threads whose job is only to stuff and extract/dispatch from the queue respectively (*without* doing *any* kind of processing), wouldn't the situation be mitigated somewhat?  A simple push operation on one side and pop out on the other.

I get your point though.
Dilip Send private email
Monday, January 29, 2007
 
 
> Aren't you assuming that the reader thread cannot keep pace with the writer thread?

I'm assuming it's a possibility (pesky hardware, and bursty load) that the software needs to cope with gracefully. Or you could just say that that scenario is "not supported".

> If you have 2 independent threads whose job is only to stuff and extract/dispatch from the queue respectively (*without* doing *any* kind of processing), wouldn't the situation be mitigated somewhat?

I wouldn't see the point in extracting something from the queue without processing it! So, so far as I know, the read-from-queue might (at least temporarily, and I don't know for how long) be slower than the write-to-queue.
Christopher Wells Send private email
Monday, January 29, 2007
 
 
"I wouldn't see the point in extracting something from the queue without processing it!"

The idea is to separate the extraction from the processing. 

void queuereader::readqueue()
{
    for (;;)
    {
        if (queue.IsNotEmpty())
        {
            MyTick* tick = queue.Pop();
            // DispatchTick hands the tick off and returns immediately
            AsyncController::DispatchTick(tick);
        }
    }
}

DispatchTick can leisurely worry about processing the tick as it isn't blocking queuereader::readqueue anyway.
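One thing worth noting about the loop above: when no tick is waiting it spins and burns a core. A blocking variant, sketched with a condition variable (C++11 syntax for brevity -- the 2007-era equivalent would be a Win32 event or a pthreads condvar; MyTick and the names are stand-ins):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

struct MyTick { double price; };         // stand-in tick type

std::mutex              m;
std::condition_variable cv;
std::queue<MyTick*>     q;
bool                    shuttingDown = false;

// Writer (vendor callback) side: enqueue and wake the reader.
void enqueue(MyTick* t) {
    { std::lock_guard<std::mutex> lock(m); q.push(t); }
    cv.notify_one();
}

// Reader side: sleeps until a tick arrives; returns 0 on shutdown.
MyTick* waitAndPop() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return !q.empty() || shuttingDown; });
    if (q.empty()) return 0;
    MyTick* t = q.front();
    q.pop();
    return t;
}
```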
Dilip Send private email
Monday, January 29, 2007
 
 
So AsyncController::DispatchTick contains *another* queue. You know, having an infinite number of queues, with dedicated threads that do nothing but move ticks from one queue to another, does nothing to solve the problem: if your input is ever persistently faster than your system's processing bandwidth, then you need to:

* Flow-control the input
* Or, discard some input
* Or, have an infinite amount of storage.
Christopher Wells Send private email
Monday, January 29, 2007
 
 
"So AsyncController::DispatchTick contains *another* queue."

I get your point now.  Initially I thought DispatchTick would be the moral equivalent of a thread pool that grows and shrinks based on the # of ticks being delivered.  I just realized that unless the processing is blindingly fast, even a self-adjusting thread pool will simply end up creating a new thread for every arriving tick.  I had the Win32 API QueueUserWorkItem in mind but it's not really customizable.
Dilip Send private email
Monday, January 29, 2007
 
 
I've worked on several market data handlers in the past, some of the lessons I have learned include:

1. If you do not need all the data, and they offer a subscription-style interface, use it.
2. Use several concurrently running feed-handlers on different machines.  Partition data between the different handlers so each one has the smallest workload possible.
3. Try to utilize built-in low-level features over language-level features like thread-safe queues.  For example, your OS buffers network traffic at a very low level; depending on the OS you can increase the buffer size a bit in order to help handle large bursts of traffic.
4. Sometimes in order to keep the business side of processes going you need to just let it go.  If you get too far behind start dropping info until you catch up (this of course must be in a well-defined manner).
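Point 3 can be as simple as asking the OS for a bigger receive buffer so the kernel absorbs bursts before user code falls behind. A POSIX sketch -- the size passed in is arbitrary, and the kernel may silently clamp it to a system-wide maximum:

```cpp
#include <sys/socket.h>

// Ask the OS for a larger SO_RCVBUF on an already-open socket descriptor.
// Returns true on success; the kernel may clamp the value it actually uses.
bool enlargeRecvBuffer(int fd, int bytes) {
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                      &bytes, sizeof(bytes)) == 0;
}
```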
UsedToWriteThisStuff Send private email
Tuesday, January 30, 2007
 
 
OPRA = Options Price Reporting Authority
Greg Send private email
Tuesday, January 30, 2007
 
 
Yes, and you can tell how long someone has been in the business by how they pronounce it:

- talk-show host == newbie
- Italian music == greybeard
BillT Send private email
Tuesday, January 30, 2007
 
 

This topic is archived. No further replies will be accepted.
