The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Scalability Of J2EE Without EJB

I'm thinking of creating a J2EE Web application using Spring and Hibernate running on Tomcat, with MySQL as the database server. Spring and Hibernate seem to be at the friendlier end of the J2EE technology spectrum.

My question is, how scalable is this set up? Could it (for example) scale to support 100,000 users given appropriate hardware and application design? Or do those sorts of numbers mandate distributed processing using EJBs?

I really don't have a feel for this sort of thing, so would appreciate any advice.
Thursday, June 16, 2005
There's a recent article on The Server Side ( that highlights's transition to Java 1.5. They host a very large-scale site on Apache + Tomcat.

Thursday, June 16, 2005
No problems - ensure that the application is stateless, and use the database to store session data rather than in memory on the server. You can then use a hardware load balancer and as many application server machines as necessary to handle the load.

The bottleneck would be the database.
Rhys Keepence
Thursday, June 16, 2005
Scalability for J2EE is similar to scalability for any other type of setup.

If you can get a more stateless setup, where users aren't pinned to a specific server, then you scale easily by adding more servers behind a load balancer.

Next, the database becomes the bottleneck, and you can look into caching various things. You can cache at the page level, subpage level, specific query level, etc. Even better, set things up so you don't need a database.
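To make "cache at the specific query level" concrete, here is a minimal Java sketch of a bounded LRU cache sitting in front of a query. `QueryCache` is a hypothetical name, and the `loader` function stands in for the real database call. It does no invalidation when the underlying data changes, which a production cache (or Hibernate's second-level cache) would have to handle.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded LRU cache for query results, keyed by the query or its parameters.
// Assumption: cached results may be reused until evicted; nothing invalidates them on writes.
class QueryCache<K, V> {
    private final Map<K, V> entries;
    private int misses = 0;

    QueryCache(int capacity) {
        // accessOrder = true keeps iteration order least-recently-used first,
        // and removeEldestEntry evicts that entry once we exceed capacity.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached value, or runs the loader (the "database query") on a miss.
    // Note: a loader that returns null is simply re-run on every call.
    synchronized V get(K key, Function<K, V> loader) {
        V value = entries.get(key);
        if (value == null) {
            misses++;
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    synchronized int misses() { return misses; }

    synchronized int size() { return entries.size(); }
}
```

With a capacity of a few thousand entries and a loader that runs the SQL, repeated identical queries are answered from memory instead of the database; the trade-off is that cached results can go stale.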

The strategies for all of the above are independent of the use of EJB, they're true for any technology you'll use.

I'd stay away from EJB if you're going to use Spring for your transaction management and Hibernate for persistence; you'll get 80-90% of the benefit of EJB, plus the ability to stay on a Tomcat server (instead of JBoss, WebLogic, etc.).

As a longtime J2EE developer, I wouldn't say that EJBs actually help scalability much beyond standard techniques.

Hibernate has good caching options that help reduce load on your database, similar to what an EJB container might do with Entity beans, except for the clustering support.

Session beans are quite useful for transaction management, but if you're handling that via Spring, it's a wash. So really, unless you plan to have non-Web clients (Swing, etc.), there's not much to be gained.

Good luck.
Dave C
Thursday, June 16, 2005
"ensure that the application is stateless, and use the database to store session data rather than in memory on the server."

Thanks. I'm not quite sure of the design strategies to adopt in order to make the application stateless.

Let's say that I have a sequence of pages that comprise a registration process (for example) and that some pages need to display information that's been entered on previous pages. Does this mean that I have to store the information in the database at the end of each page and retrieve it again for the next page, rather than just keeping it in the HTTP session and committing it to the database at the end?
Thursday, June 16, 2005
You can keep it in the session. BUT. The session needs to be stored in the database. So essentially, it's both! Most common frameworks by default will store session information in memory/disk files on the server. That works fine, but obviously you now need to always talk to the same server.

Most languages will also allow you to override session storage, though. In PHP it's an option somewhere; in Ruby you can derive a new session class and override the storage methods, etc. I'm sure it can be done in pretty much any language, and it definitely can in Java, though the process will vary.

So there you go - override the way that session data is stored, and it should be transparent.
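A rough Java sketch of what overriding session storage buys you: the application code only talks to a small `SessionStore` interface, so two app server instances sharing one store can serve alternate requests from the same user, as in the multi-page registration example. All class names are hypothetical, and the `ConcurrentHashMap` is only a stand-in for a shared database table; a real version would go through JDBC or the servlet container's own session manager.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage abstraction: the application only sees load/save,
// so the backing store (memory, disk, database) can be swapped without code changes.
interface SessionStore {
    Map<String, String> load(String sessionId);

    void save(String sessionId, Map<String, String> data);
}

// Stand-in for a shared database table (session_id -> attributes).
// A real deployment would use JDBC or the container's persistent session manager here.
class SharedTableStore implements SessionStore {
    private final Map<String, Map<String, String>> table = new ConcurrentHashMap<>();

    public Map<String, String> load(String sessionId) {
        // Return a copy so callers cannot change stored state without calling save().
        return new HashMap<>(table.getOrDefault(sessionId, Collections.emptyMap()));
    }

    public void save(String sessionId, Map<String, String> data) {
        table.put(sessionId, new HashMap<>(data));
    }
}

// An app server that keeps no session state in its own memory between requests.
class AppServer {
    private final SessionStore store;

    AppServer(SessionStore store) { this.store = store; }

    // Handles one page of a multi-page form: records a field, returns all fields so far.
    Map<String, String> handle(String sessionId, String field, String value) {
        Map<String, String> session = store.load(sessionId);
        session.put(field, value);
        store.save(sessionId, session);
        return session;
    }
}
```

Because neither `AppServer` instance remembers anything between calls, a load balancer can send page 1 to one machine and page 2 to another and the second page still sees what was entered on the first.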

Caveat: sessions are used a lot. If you write your own session storage code, for example in PHP, this is one place where you really do want to spend your time optimising. Slow session reads/writes == very slow app.
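One cheap optimisation in that spirit, sketched in Java under the assumption that each request loads the session once at the start: remember what was loaded and skip the write entirely when nothing changed, so read-only page views cost one session read instead of a read plus a write. The class and method names are hypothetical, the `HashMap` stands in for the database, and the sketch is single-threaded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical session store that only writes back sessions that actually changed.
// The HashMap stands in for the database; writeCount stands in for database round-trips.
// Not thread-safe: this is a single-threaded sketch.
class DirtyCheckingSessionStore {
    private final Map<String, Map<String, String>> backing = new HashMap<>();
    private int writeCount = 0;

    // Returns a private copy of the stored session (empty if none exists yet).
    Map<String, String> load(String id) {
        return new HashMap<>(backing.getOrDefault(id, new HashMap<>()));
    }

    // Compares the session at the end of the request with what was loaded at the start.
    void saveIfChanged(String id, Map<String, String> original, Map<String, String> current) {
        if (!Objects.equals(original, current)) {
            backing.put(id, new HashMap<>(current));
            writeCount++;
        }
    }

    int writeCount() { return writeCount; }
}
```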
Andrew Cherry
Thursday, June 16, 2005
You should be fine with sessions if you handle session failover appropriately. Some systems have a specialised session backup server that receives session updates and returns session info to any server that sees an unfamiliar session ID. Other systems define a session failover server for each server, and users are routed to that backup server if the original falls over. A combination of the two methods avoids a lot of database traffic and meets the needs of most sites.
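The dedicated-backup-server variant might look roughly like this in Java. It's a hypothetical sketch, not any particular app server's implementation: each server keeps sessions in local memory, copies every change to a shared backup, and falls back to the backup when it sees a session ID it doesn't recognise (for example, after the load balancer reroutes a user from a crashed server).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical dedicated backup server: receives every session update.
class SessionBackupServer {
    private final Map<String, String> sessions = new HashMap<>();

    synchronized void update(String id, String state) { sessions.put(id, state); }

    synchronized Optional<String> fetch(String id) { return Optional.ofNullable(sessions.get(id)); }
}

// App server that keeps sessions in memory but replicates each change to the backup.
class ReplicatingAppServer {
    private final Map<String, String> local = new HashMap<>();
    private final SessionBackupServer backup;

    ReplicatingAppServer(SessionBackupServer backup) { this.backup = backup; }

    synchronized void write(String id, String state) {
        local.put(id, state);
        backup.update(id, state); // synchronous replication to the backup
    }

    // Unknown session ID (e.g. the user was rerouted after a crash): recover it from the backup.
    synchronized Optional<String> read(String id) {
        String state = local.get(id);
        if (state == null) {
            Optional<String> recovered = backup.fetch(id);
            recovered.ifPresent(s -> local.put(id, s)); // cache locally for later requests
            return recovered;
        }
        return Optional.of(state);
    }
}
```

Replicating synchronously, as here, keeps the backup current at the cost of one extra network hop per session write; an asynchronous version is faster but can lose the most recent update if a server dies.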

Large sites with volumes at or above what the OP mentioned do use sessions, and with a combination of session backup and intelligent load balancing they have no problem handling millions of sessions daily. When an app server falls over, its users are redirected either to a designated failover server or to a completely new one, and they're none the wiser.
Art
Thursday, June 16, 2005
Just a quick question for the original poster:

When you refer to 100,000 users, what are you actually referring to?

* A system with 100,000 registered users?

* A system that serves 100,000 page-views during some time interval (24 hours)?

* A system that can handle 100,000 concurrent users?

Those are three very different scenarios.

I've personally hosted a website that generates 30,000 page views per day (with peak usage of about 100 page views per minute), and I built it with PHP on a rinky-dink shared server for $15 per month.

A site with 100,000 concurrent users would far exceed anything I've ever worked on before, and I'd definitely seek advice from a more seasoned enterprise developer than myself.
BenjiSmith
Thursday, June 16, 2005
Excellent question, Benji.

Truth be told, I'm really pondering the fantasy scenario whereby my little micro-ISV developed app becomes phenomenally successful and gets bought out by a big player (think Paul Graham) and so ends up being accessed by a large number of concurrent users. I guess 100,000 concurrent users was a little on the high side, but the point is that I don't want to start out with an application architecture that's inherently unscalable and has to be redone if my user base grows significantly.
Thursday, June 16, 2005
BTW, when I said big player and mentioned Paul Graham, I was referring to his company ViaWeb getting bought by Yahoo, not to Paul Graham being a big player. Although if he wanted to toss a few $million my way I wouldn't complain...
Thursday, June 16, 2005
Try to express your load in terms of hits per second. If you have 100k concurrent users, each one is probably hitting a webserver every 20-30 seconds; with an Ajax-style UI there will be many more hits, though each one is smaller. Assume they're well distributed, and you're looking at around 5000 hits per second. If you have a single shared database, each hit has to read its session state from it, and some hits also have to write changed session state back, you're looking at up to 10000 database queries per second just for session state. You don't want that. Similarly, estimate queries per second, filesystem accesses per second, etc., and find a way to throw hardware at each of these.
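The same back-of-envelope arithmetic as a small Java helper, so you can plug in your own numbers. The think time and write fraction are assumptions, not measurements; the 10000 figure above corresponds to the worst case where every hit also writes its session back.

```java
// Back-of-envelope load estimate following the reasoning above.
// All inputs are assumptions to be replaced with your own measurements.
class LoadEstimate {
    // Requests per second if each user makes one request every thinkTimeSeconds.
    static double hitsPerSecond(long concurrentUsers, double thinkTimeSeconds) {
        return concurrentUsers / thinkTimeSeconds;
    }

    // Session-related DB queries per second: one read per hit,
    // plus a write on writeFraction of hits (0.0 = never, 1.0 = every hit).
    static double sessionQueriesPerSecond(double hitsPerSecond, double writeFraction) {
        return hitsPerSecond * (1.0 + writeFraction);
    }
}
```

`hitsPerSecond(100_000, 20)` gives 5000.0 and `sessionQueriesPerSecond(5000.0, 1.0)` gives 10000.0; with a 30-second think time the hit rate drops to about 3333 per second.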

If you can segregate your users into different hardware farms, that helps - it's easier to serve 10k concurrent users, and then repeat that 10 times, than to serve 100k concurrent users.  Plus, you get off the ground with a much smaller deployment, and can add to it as you grow.  I've seen JSP-based webservers handle 100-500 requests per second - just remember, if they're all sharing the same dataset, your database will still get hammered with volume.
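One way to do the segregation, sketched in Java: a small directory assigns each user to a farm (its own app servers plus its own database) and remembers the assignment. New users go to the least-loaded farm, so adding a farm as you grow doesn't move anyone who's already placed. `FarmDirectory` is a hypothetical name; a hash-based scheme such as consistent hashing is a common alternative when you don't want to store the mapping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical directory that partitions users across independent farms.
// Existing users keep their farm; new users go to the farm with the fewest users.
// Kept in memory and not thread-safe here; a real directory must live somewhere shared.
class FarmDirectory {
    private final Map<String, Integer> userToFarm = new HashMap<>();
    private final List<Integer> usersPerFarm = new ArrayList<>();

    FarmDirectory(int initialFarms) {
        for (int i = 0; i < initialFarms; i++) addFarm();
    }

    // Adds an empty farm and returns its index.
    int addFarm() {
        usersPerFarm.add(0);
        return usersPerFarm.size() - 1;
    }

    int farmFor(String userId) {
        Integer farm = userToFarm.get(userId);
        if (farm != null) return farm; // already placed: never moves
        int best = 0;
        for (int i = 1; i < usersPerFarm.size(); i++) {
            if (usersPerFarm.get(i) < usersPerFarm.get(best)) best = i; // ties keep the lower index
        }
        userToFarm.put(userId, best);
        usersPerFarm.set(best, usersPerFarm.get(best) + 1);
        return best;
    }
}
```

Each farm then only ever sees its own users' data, so its database load stays bounded as the total user count grows; the catch is that anything shared across all users still needs a home outside the farms.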

Beyond that, keep as little state as possible, transfer it as few times as possible, etc. Look into Berkeley DB, Prevayler, etc. for data storage. Try to find ways to make large sections of data read-only, or to let changes to some pieces of data become visible only after a delay.
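A sketch of the "make changes not immediately visible" idea in Java: readers always see an immutable snapshot with no locking, while writes are staged and become visible only when the next snapshot is published. How often you publish (say, every few seconds from a background job) is an assumption left out of the sketch, and `SnapshotCatalog` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical read-mostly catalog: reads hit an immutable snapshot,
// writes are batched and only become visible when publish() swaps in a new snapshot.
class SnapshotCatalog {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<Map<String, String>>(Collections.emptyMap());
    private final Map<String, String> pending = new HashMap<>();

    // Lock-free read against whichever snapshot is currently published.
    String lookup(String key) { return current.get().get(key); }

    // Staged writes are invisible to readers until the next publish().
    synchronized void stage(String key, String value) { pending.put(key, value); }

    // Builds a new read-only map from the old snapshot plus staged changes, then swaps it in.
    synchronized void publish() {
        Map<String, String> next = new HashMap<>(current.get());
        next.putAll(pending);
        pending.clear();
        current.set(Collections.unmodifiableMap(next));
    }
}
```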
schmoe
Thursday, June 16, 2005
I'm not sure whether Hibernate supports multiple servers. If not, you're going to want to look at a different persistence framework. If you're not going to be releasing your product right away, you may want to look at the EJB 3.0 spec, since it's very similar to Hibernate.
Vince
Friday, June 17, 2005
Don't stress over it at this stage, seriously.

If you become phenomenally successful, you can always bring in more hardware or software as needed to keep everything ticking over until you can redesign the slow bits of your app.

The single most important thing you have to do now is *ship* a product that has a chance of being that successful.

So worry about scaling when it's a problem and not before.
That is, don't totally forget about it, but don't choose an alternative design approach that might slow you down. Stick with what you know and can code effectively in for now; worst case, there is *always* a way of improving things if you throw enough hardware at them.
Jesus H Christ
Monday, June 20, 2005
Some great advice here, thanks a lot everyone.
Monday, June 20, 2005

This topic is archived. No further replies will be accepted.
