A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.
I need to implement a system to persist incoming messages (about 1500/second) for a short duration (they expire in 2 minutes). The size of the messages may vary from 1kb to 150kb. A message can be removed either when it expires or when a client fetches it by message id/correlation id. In my scenario, these messages are sent to my application by several web services. Typically, at any given time there are about 0-20 incoming messages for each client, and there can be a few hundred clients trying to fetch messages concurrently. I am wondering what would be the best way to implement such a system.
So far I have thought about two possible implementations:
1. Put all messages on a JMS queue; when a client requests a message, create a message selector to pick that message from the queue.
2. Put all messages in a database; when a client requests a message, do a database lookup based on the id.
Which of the above do you think is better? Can you suggest a more elegant approach? My primary concern is performance. Scalability is a secondary concern. With JMS I am concerned that creating a new message selector for each message ID may cause too much overhead. Also, I am not sure whether there are any JMS techniques other than message selectors that can be used in this scenario. With a database, I am not sure what the performance will be with so many inserts, selects, and deletes per second. I can use WebLogic JMS and an MS SQL database. I look forward to your suggestions.
(round (/ (* 1500 60 2 150) (* 1024 1024))) -> 26.0
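So your peak scenario is about 26 GB of data at a time (two minutes of traffic at 1500 messages/second, every message at the 150kb maximum). That rules out keeping it in memory. The next-best option, performance-wise, would be going to the filesystem.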
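Spelled out, the estimate looks like this. It is only a back-of-the-envelope sketch using the figures from the original question; the function name `retained_gib` is made up for illustration, and the lower bound assumes every message is at the 1kb minimum.

```python
# Figures taken from the original question.
RATE_PER_SECOND = 1500      # incoming messages per second
LIFETIME_SECONDS = 2 * 60   # messages expire after 2 minutes
MIN_MESSAGE_KB = 1
MAX_MESSAGE_KB = 150


def retained_gib(message_kb: float) -> float:
    """Data alive at once, in GiB, if every message is message_kb KB
    and no message is fetched before it expires."""
    live_messages = RATE_PER_SECOND * LIFETIME_SECONDS
    return live_messages * message_kb / (1024 * 1024)


print(f"live messages: {RATE_PER_SECOND * LIFETIME_SECONDS}")
print(f"upper bound: {retained_gib(MAX_MESSAGE_KB):.1f} GiB")
print(f"lower bound: {retained_gib(MIN_MESSAGE_KB):.2f} GiB")
```

The real figure sits somewhere between the two bounds, depending on the actual size distribution and on how quickly clients fetch their messages.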
You haven't said how messages are routed from provider to consumer, but I assume they contain a recipient-id, which identifies the consumer. If that's the case, you could create a folder per recipient-id and drop the messages as files in that folder. Use the filesystem's locking capabilities when writing, and let the consumer delete files once they have been read.
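A minimal sketch of the folder-per-recipient idea, under some assumptions of mine: the function names and the `.msg` suffix are invented, ids are assumed to be safe to use as file names, and instead of explicit file locks it writes to a temporary file and renames it into place, so a reader never sees a half-written message. Expiry is handled by a sweep over file modification times.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def drop_message(root: Path, recipient_id: str, message_id: str, payload: bytes) -> Path:
    """Store one message as <root>/<recipient_id>/<message_id>.msg."""
    folder = root / recipient_id
    folder.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same folder, then rename it into place.
    fd, tmp_name = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    final = folder / f"{message_id}.msg"
    os.replace(tmp_name, final)
    return final


def fetch_message(root: Path, recipient_id: str, message_id: str) -> Optional[bytes]:
    """Return the payload and delete the file, or None if it is not there."""
    path = root / recipient_id / f"{message_id}.msg"
    try:
        payload = path.read_bytes()
    except FileNotFoundError:
        return None
    path.unlink(missing_ok=True)
    return payload


def purge_expired(root: Path, max_age_seconds: float = 120.0,
                  now: Optional[float] = None) -> int:
    """Delete message files older than max_age_seconds; return how many were removed."""
    now = time.time() if now is None else now
    removed = 0
    for path in root.glob("*/*.msg"):
        try:
            if now - path.stat().st_mtime > max_age_seconds:
                path.unlink()
                removed += 1
        except FileNotFoundError:
            pass  # a consumer fetched it in the meantime
    return removed
```

Something would have to call `purge_expired` periodically (a timer thread or a cron job), and two consumers fetching the same id at the same moment could both read the file before it is deleted, so this only fits if each message has a single reader.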
I manage a similar system based on JMS queues.
The biggest performance bottlenecks were:
- transaction support in JMS (it really slows down the whole system; throughput with transactions is about 1/8th of the throughput without them, so disable transactions if that's acceptable)
- individual message send/receive operations (if possible, use a batching technique: collect a number of messages in memory and send them to the queue as one big message, process them in batches, and receive the processed batches)
With these restrictions you can achieve about 500-1000 messages/second/node on a "normal" server without any extras. So I can say that with 4-5 nodes your goal is reachable.
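The batching idea can be sketched roughly as follows. This is not JMS code: `send_batch` is a hypothetical callable standing in for whatever puts one combined message on the queue, and the class name and batch size are made up for illustration.

```python
from typing import Callable, List


class Batcher:
    """Collects messages in memory and hands them on in groups."""

    def __init__(self, send_batch: Callable[[List[bytes]], None], max_batch: int = 100):
        self._send_batch = send_batch  # hypothetical: one queue send per call
        self._max_batch = max_batch
        self._pending: List[bytes] = []

    def add(self, message: bytes) -> None:
        """Queue one message; send the batch once it is full."""
        self._pending.append(message)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending, even if the batch is not full."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._send_batch(batch)
```

Since messages here expire after two minutes, a real version would also want to call `flush()` on a short timer, so that a half-full batch does not sit in memory while its messages age.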
On another, older system I manage, an Oracle database receives the messages, some PL/SQL scripts process them, and then a C program selects/deletes the output from a database table.
I could speed up that system with a similar technique: doing more than one insert per transaction helps a lot. The achievable throughput is a bit lower than with the JMS approach: about 300 messages/second/node on the same hardware.
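To illustrate the "more than one insert per transaction" point, here is a small sketch. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in (the system described above is Oracle), and the table and function names are invented.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def insert_batch(conn: sqlite3.Connection, rows: Iterable[Tuple[str, bytes]]) -> None:
    """Insert many messages in ONE transaction instead of one commit per row."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany("INSERT INTO messages (id, payload) VALUES (?, ?)", rows)


def take_message(conn: sqlite3.Connection, message_id: str) -> Optional[bytes]:
    """Fetch a message by id and delete it in the same transaction."""
    with conn:
        row = conn.execute(
            "SELECT payload FROM messages WHERE id = ?", (message_id,)
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM messages WHERE id = ?", (message_id,))
        return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id TEXT PRIMARY KEY, payload BLOB)")
insert_batch(conn, [(f"m{i}", b"x" * 1024) for i in range(500)])
print(conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0])
```

The gain comes from paying the commit cost once per batch rather than once per message; how large the batches should be is something to measure on the actual database.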
Hope this helps a bit.
Do you need to persist these messages at all?
How about a CEP/ESP solution like Esper that would allow you to process messages in real time?
Saturday, January 19, 2008
You could use one of the caching libraries, like EhCache or JBoss Cache. There you can configure expiration times, maximum entry counts, disk overflow, etc.
JBoss Cache can even run in a transactional context (JTA).
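To show what "expiration times" plus fetch-by-id amount to, here is a toy stand-in in plain Python. It is not the EhCache or JBoss Cache API; the class and method names are invented, it has no disk overflow, and it is not thread-safe.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringStore:
    """Keeps messages by id until they are taken or their time-to-live runs out."""

    def __init__(self, ttl_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: Dict[str, Tuple[float, bytes]] = {}

    def put(self, message_id: str, payload: bytes) -> None:
        self._items[message_id] = (self._clock() + self._ttl, payload)

    def take(self, message_id: str) -> Optional[bytes]:
        """Remove and return the message, or None if it is missing or expired."""
        entry = self._items.pop(message_id, None)
        if entry is None:
            return None
        expires_at, payload = entry
        return payload if self._clock() < expires_at else None

    def purge(self) -> int:
        """Drop all expired entries; return how many were removed."""
        now = self._clock()
        expired = [key for key, (expires_at, _) in self._items.items()
                   if expires_at <= now]
        for key in expired:
            del self._items[key]
        return len(expired)
```

A real caching library adds the parts that matter at this volume: eviction when memory fills up, overflow to disk, and safe concurrent access.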
Tuesday, February 05, 2008
If you've got the resources (beefy hardware and deep pockets), would a combination of a JMS queue and an in-memory cache (e.g. GigaSpaces) with write-behind persistence be an option?
Monday, February 11, 2008
This topic is archived. No further replies will be accepted.