The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.


In a prior post, Katie Lucas directed me towards using the epoll interface on Linux in order to manage a large set of descriptors.  In doing research, I came up with another question that, since it's related purely to epoll, I thought deemed a new thread.

My question is best posed with an example.  Suppose a server creates a listener socket, and then spawns two threads to handle I/O.  The thread procedure might do the following:

1.  Use epoll_ctl() to associate the listener (and clients) socket to a set of descriptors.
2.  Call epoll_wait() on the descriptor set.
3.  If the listener is signaled, accept a connection and add it to the descriptor set.  If the a client is signaled, service the socket.

The problem is that if a descriptor (the listener) is part of two sets of descriptors monitored by epoll, BOTH threads are woken from epoll_wait() when a new client becomes available. 

Would one thread accept and the other error out on its call?  Doesn't it seem like a better idea to only wake one thread?  I'm fairly certain that this is what Windows does.

I'm not sure how I should approach this.  Can I:
1) Count on one thread's accept returning an error.
2) Create two listener sockets on the same address/port with SO_REUSEADDR so that they are actually different descriptors? (Would this even work)
Tom Send private email
Thursday, December 20, 2007
"Would one thread accept and the other error out on its call?"

Yes. You'll get a socket on one and probably an EAGAIN on the other.

"Doesn't it seem like a better idea to only wake one thread?  I'm fairly certain that this is what Windows does."

Well. Maybe. But it doesn't..

"I'm not sure how I should approach this.  Can I:
1) Count on one thread's accept returning an error."


These are called "racing accepts". If you've really got bursty high rate of connection attempts and you want to deal with them all, it will mean the connections are accepted in a reasonable space of time.

However, normally in this case what you do is use a set of threads, each of which is blocked on the accept call (rather than trying to do this using non-blocking code).

Then when a new connection arrives, exactly one thread will complete the call (the others will remain blocked), and return with a new socket. That way you can snatch the socket off the accept queue in the minimum of time (leaving space for a new incoming one).

Then it can do the connection setup (whatever else needs doing for it) and then passes the connections off to an epoll loop to do the read/write work with.

"2) Create two listener sockets on the same address/port with SO_REUSEADDR so that they are actually different descriptors? (Would this even work)"

No. That won't work. The second bind will return EADDRINUSE. A listening socket is uniquely identified by the ip address and the port number and that pair MUST be unique. Yes, that does mean you can bind to the same port on different network interfaces which may or may not be useful.

That's not what SO_REUSEADDR is for. Without it set, sockets are "blocked" after being closed (hey're in a state called TIME_WAIT). The intention of this state is that packets in flight somewhere addressed to the previous binder won't be erroneously delivered to you. It happens to comms sockets as well, but you don't notice because when you call "socket()" you get a new port/ip tuple, with the port part ticked up. If you ever wrap the port numbers in less that the wait time, you might hit TIME_WAITing sockets, but that's not that likely.

Some unixes (but not linux) have SO_REUSEPORT which DOES allow multiple processes/threads to have the same ip/port tuple bound provided they all have the flag set. What happens there is that the connections are delivered non-deterministically to one of the set of listeners.

But if you have a single app, you get almost the same results from multiple racing accept() calls. {The difference being that you only have one instead of several accept queues. But you can just lengthen that.}
Katie Lucas
Friday, December 21, 2007
I think I understand, but I'm confused on one point:

Can I use a blocking socket with epoll()?  Because it sounds like in order to avoid the accept race, I need to make it so.  However, if I can't use a blocking listener socket with epoll(), then I'm not sure how the thread could proceed to do other work on existing client sockets in the case where no accept occurs...

My Windows backround might be confusing me.  On Windows, you can't use blocking sockets with asynchronous I/O calls.
Tom Send private email
Friday, December 21, 2007
You'd do a division of labour. You have accepting threads, which are just running racing accepts on blocking bound sockets.

They then set up the connection, do the initialisation and hand them off to workers. They can (conveniently) just insert the connections directly into the epoll set while the worker is waiting on it.

Typically the data pointer in the event block points to an interface instance, so the worker just calls "doSomeWork()" on it or picks from "readData()" and "writeData()" and there's a "whatEventsDoYouWant()" which returns the events to watch for.

So your accepters are just doing;

  sock = accept(fd);
  interface = createConnInterface(fd);
  pick a worker
  add the fd and the interface into the worker's set

And your workers are just going;

  for each event record returned,
    if EPOLLIN then interface->readData();
    if EPOLLOUT then interface->writeData();
    change set events to interface->whatDoYouWant();
    if interface->areYouFinished()
      Take it out of the set
      delete interface
Katie Lucas
Friday, December 21, 2007
I wasn't sure if it was safe to add an fd to an epoll set while another thread is waiting on it.  Of course, as it occurred to me while in the shower this morning, the definitive answer is available in the Linux source code.

Rather than bugging you (Katie Lucas) do you know of a good reference on this topic?  I've found most of the documentation to be sparse.

As an aside, after posting my last question, I wrote a simple program that created a blocking listener socket and added it to an epoll set.  So perhaps it IS possible to have a blocking mode socket in an epoll set?  Again, some specific documentation (beyond the man pages) would be useful here.  I guess it's time to go Googling...
Tom Send private email
Saturday, December 22, 2007
Follow up:

I actually decided to walk through the linux kernel source code to see if my worries had merit.  I think that the answer is NO.

In fact, my original question "is it safe to add to an epoll set from one thread while another thread is calling epoll_wait" is sort of dumb in retrospect.

I would think that no syscall would *ever* let a user potentially hose a kernel data structure (like the table that is used to hold descriptors in an epoll set).  It would simply be a recipe for disaster.

It appears to me that modification of the epoll set in the implementation of the system call is protected by a spin lock of some kind.  I'm no great expert on the kernel, but it looks that way.

Thanks again for your help on this.
Tom Send private email
Saturday, December 22, 2007

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz