The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Satellite Latency and Java Client/Server App

Has anyone ever experienced problems with satellite connections and a Java client/server app?  We're using RMI and a custom server over TCP/IP for direct communication between the client and server. (We're not using a commercial app server of any kind.)  This is a mature production application that has been running for about 5 years.

The satellite customers are experiencing intermittent problems.  In some cases they can go a few hours without problems, and in others they start having random communication errors right away. In each case they report that they are not experiencing other internet problems: they can browse the web and check their email, but our app is failing. The clients on cable modems and dial-up are fine with the application and aren't experiencing any connection difficulties.

I know satellite users have problems with most types of streaming applications (VOIP, video conferencing, etc.) but not with packet-based communications.  As I understand it, latency is the problem, and 80 percent of the bandwidth is dedicated to download and 20 percent to upload.

We do not do any type of data streaming.  The data requests are for the most part user interactions (we do have a few polling-type request/responses running for messaging purposes, but those are only sent every minute or so).  I don't understand why we're failing when sending, for instance, a 30K message from the server to the client.  I'm not aware of any parameters in the URLConnection class that could affect this.

Has anyone experienced similar problems, or could anyone provide a hint as to why this is occurring?  Is there anything that could be done, or should I just tell the users with satellite that they're SOL, which I would hate to do?

Thanks for any responses.
Tim Y.
Wednesday, February 09, 2005
Drops in applications that aren't used to high latencies can often have a cascading negative effect. Short timeouts, for example, just make things worse.
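As a rough illustration of loosening timeouts, here is a minimal sketch using the URLConnection class mentioned in the original question. The class name PatientConnection, the URL, and the timeout values are made up for the example; setConnectTimeout and setReadTimeout are only available from Java 5 onward, and whether longer timeouts actually cure the failures described here is an assumption, not something established in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper: opens a URLConnection with generous timeouts
// for a high-latency link. Nothing is sent until connect() is called.
final class PatientConnection {
    static URLConnection open(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            // openConnection() only creates the object; it does not touch the network.
            URLConnection conn = URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs); // time allowed to establish the connection
            conn.setReadTimeout(readTimeoutMs);       // time allowed to wait for data on a read
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as PatientConnection.open("http://example.invalid/data", 15000, 60000) would give the link 15 seconds to connect and 60 seconds per read. Those numbers are placeholders to be tuned against the real link.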

RMI uses TCP, so data can take a long time to start sending again on a lossy, slow connection. TCP assumes drops are due to network congestion, which isn't necessarily the case over a satellite link. I found this:

      The characteristics of satellite channels, such as large latencies, path asymmetries, and occasionally high error rates, provide Satellite TCP operation with a challenging environment. Satellite TCP performance enhancement is required mainly due to two problems:

    * The regular TCP version needs a long time (generally more than three seconds) to reach the full recommended window size in satellite links, which can waste satellite bandwidth in the slow-start phase of TCP evolution

    * The steady-state behavior of regular TCP cannot fully utilize the bandwidth provided by T1 satellite channels

I don't know what to do other than try a different transport or see if you can fiddle with your TCP network configuration.
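To put rough numbers on the quoted points, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: a 600 ms round-trip time, a 1460-byte segment size, an initial window of 2 segments, and an idealized slow start that doubles the window every round trip with no loss. The class and method names are invented for the example.

```java
// Idealized arithmetic for TCP on a long-delay link; not a model of any real stack.
final class SatelliteMath {
    /** Bytes that must be in flight to keep the link full: bandwidth * RTT / 8. */
    static long bandwidthDelayBytes(long bitsPerSecond, double rttSeconds) {
        return Math.round(bitsPerSecond * rttSeconds / 8.0);
    }

    /** Throughput ceiling in bits/s for a fixed window: window * 8 / RTT. */
    static double maxThroughputBps(long windowBytes, double rttSeconds) {
        return windowBytes * 8.0 / rttSeconds;
    }

    /** Round trips needed to deliver a payload if the window doubles each round. */
    static int slowStartRounds(int payloadBytes, int segmentBytes, int initialSegments) {
        int segments = (payloadBytes + segmentBytes - 1) / segmentBytes; // ceiling division
        int window = initialSegments;
        int sent = 0;
        int rounds = 0;
        while (sent < segments) {
            sent += window;   // one window's worth of segments per round trip
            window *= 2;      // idealized doubling, no loss
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        double rtt = 0.6; // assumed satellite round-trip time in seconds
        System.out.println(bandwidthDelayBytes(1_544_000L, rtt)); // T1 rate
        System.out.println(maxThroughputBps(65_535, rtt));        // classic 64K window
        System.out.println(slowStartRounds(30 * 1024, 1460, 2));  // the 30K message
    }
}
```

Under these assumptions a T1 link needs about 115,800 bytes in flight to stay full, more than a classic 64K window allows, and a 30K message takes 4 round trips after the handshake, so several seconds can pass before any loss occurs at all. If the buffers are the limit, Socket.setReceiveBufferSize and Socket.setSendBufferSize are the standard knobs on the Java side.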
son of parnas
Wednesday, February 09, 2005
Back in '98 I struggled with this problem.  We ended up rolling our own UDP-based protocol, so that missing chunks could be requested out of order.  We also used bigger buffers on both the client and server side, and sent duplicates as a matter of course.

That is the kind of expedient thing you do when you are inexperienced ;-) It worked.
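For anyone curious what the receiving side of such a scheme might look like, here is a minimal sketch. It is not the protocol described above, just an illustration of the idea: chunks arrive in any order, duplicates are ignored, and the receiver can list which chunk numbers to ask for again. The class name ChunkAssembler and its methods are invented, and the actual DatagramSocket I/O is left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiver-side bookkeeping for a chunked, UDP-style transfer.
final class ChunkAssembler {
    private final byte[][] chunks;
    private int received = 0;

    ChunkAssembler(int totalChunks) {
        this.chunks = new byte[totalChunks][];
    }

    /** Stores a chunk; returns false for duplicates or out-of-range indices. */
    boolean accept(int index, byte[] data) {
        if (index < 0 || index >= chunks.length || chunks[index] != null) {
            return false; // duplicate sends are expected, so silently drop them
        }
        chunks[index] = data.clone();
        received++;
        return true;
    }

    /** Chunk numbers not yet seen; these are what the client would re-request. */
    List<Integer> missing() {
        List<Integer> gaps = new ArrayList<>();
        for (int i = 0; i < chunks.length; i++) {
            if (chunks[i] == null) gaps.add(i);
        }
        return gaps;
    }

    boolean isComplete() {
        return received == chunks.length;
    }

    /** Concatenates the chunks in order once every one has arrived. */
    byte[] assemble() {
        if (!isComplete()) {
            throw new IllegalStateException("missing chunks: " + missing());
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : chunks) out.write(c, 0, c.length);
        return out.toByteArray();
    }
}
```

The point of the design is that one lost chunk does not stall everything behind it, which is what happens with in-order TCP delivery.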

Investigate "wireless TCP" - might give you a more general google result than searching specifically for the term "satellite".

On a certain satellite network, the packet size was 400 bytes or something, and the gateway was cutting up our packets transparently.  Before we worked this out and configured our servers and clients to send only 400-byte packets, things were much slower!
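Splitting a payload to fit a fixed packet size can be sketched as below. The 400-byte figure is the one recalled above; the class name Chunker is invented. Note the assumption that each chunk goes out as its own datagram: with plain TCP the size of an application write does not control the size of packets on the wire, so this only applies directly to a UDP-style protocol.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: cuts a payload into pieces no larger than maxChunkBytes.
final class Chunker {
    static List<byte[]> split(byte[] payload, int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        List<byte[]> out = new ArrayList<>();
        for (int off = 0; off < payload.length; off += maxChunkBytes) {
            int end = Math.min(off + maxChunkBytes, payload.length);
            out.add(Arrays.copyOfRange(payload, off, end)); // last piece may be shorter
        }
        return out;
    }
}
```

For a 30K (30,720-byte) message and a 400-byte limit this yields 77 chunks, the last one 320 bytes long.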

Are you getting your failures on the up or the down?
i like i
Thursday, February 10, 2005
Thanks for all the replies.  To answer your question, so far the failures are only occurring on the "downs", the data transfer from our server to the client.  It might also be happening in the reverse direction, but the clients who are having difficulties aren't able to get to the point in the program where they need to send a larger amount of data back to the server.  For example, the standard workflow is: client logs in and sends a request for data to the server, server responds (failure occurs here), client modifies the data and sends it back to the server.

From the replies I've read, it seems the only solutions are either changing the protocol or changing the TCP/IP packet size.  Changing the protocol is definitely out of the question at this stage of the program's life; modifying the packet size might be something we could try in the future.

After speaking with my supervisor, we've determined that it isn't in our best business interests to go further with it.  10 out of 1,000 customers are on satellite, and of those maybe 5 are having consistent failures.  He's decided to let the 5 go.

Thanks for the replies though. At least I have a good idea of why it's occurring and can give a solid answer to our clients, which is important to me.
Tim Y.
Thursday, February 10, 2005
If you want to really understand the problem, as opposed to guessing what the problem is, you can use one of the numerous systems that simulate network conditions.

For my current work I'm using the dummynet feature of FreeBSD, which does most of what I need with good enough accuracy but without an easy-to-use GUI. There are also similar systems for Linux: NIST Net and NetEm.

There are also commercial systems; try searching for "Network Simulator".
Baruch Even
Thursday, February 10, 2005
