The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Tradeoff: Text protocol using Base-64 or binary protocol?

I am interested in writing a TCP-based server that potentially will have to move a reasonable amount of binary data (such as document files, etc.) back and forth between the client and server.

On one hand, I'd like to heed the advice I've read elsewhere on this board and design a simple, text-based protocol.  The arguments for a text-based protocol are that it is easy to debug, and pretty easy to implement without fear of endianness issues.

The negative side is that with a text-based protocol, I need to perform encoding of binary data.  I figured on using Base-64 (much as email does for attachments sent over SMTP) and was wondering what people thought about the efficiency aspects here?  Obviously, it has worked out pretty well for mail.

However, there are those who might bristle at the idea of wasting bandwidth with Base-64 encoding, preferring a binary format instead.
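To put a rough number on the bandwidth concern, here is a small Python sketch that measures how much Base-64 inflates a payload. The helper name `base64_overhead` is made up for illustration, and the figure ignores any line wrapping a mail-style encoder would add on top.

```python
import base64

def base64_overhead(n_bytes: int) -> float:
    """Return encoded_size / raw_size for n_bytes of arbitrary data."""
    raw = bytes(n_bytes)              # the content does not affect the encoded length
    encoded = base64.b64encode(raw)   # unwrapped output; MIME-style line breaks would add more
    return len(encoded) / len(raw)

# Every 3 input bytes become 4 output characters, so the ratio approaches 4/3.
for size in (300, 102_322, 3_000_000):
    print(size, round(base64_overhead(size), 4))
```

In other words, the encoding costs roughly a third more bytes on the wire, plus the CPU time to encode and decode at each end.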

I'm siding with Base-64 and text, but would be interested in hearing what the many experienced people on this board have to say about it.

Thanks, and happy New Year
Anonymous
Wednesday, January 03, 2007
 
 
"...The arguments for a text-based protocol are that it is easy to debug, and pretty easy to implement without fear of endianess issues."

I think in the long run, people will follow whichever protocol you choose as long as it is well designed, well documented and relatively future proof. I mention this because I have no objection to you creating a binary protocol and I would not worry about the endianness as long as you provide a clear explanation of which bytes are intended to represent what.
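One way to make "which bytes represent what" unambiguous is to fix the byte order in the protocol description and let the library do the conversion. The sketch below uses Python's `struct` module; the header layout (a 1-byte message type followed by a 4-byte length) is an assumption made up for this example, not something proposed in the thread.

```python
import struct

# Hypothetical header layout, assumed for illustration only:
#   1 byte   message type
#   4 bytes  payload length, unsigned, big-endian ("network order")
# The "!" prefix forces network byte order and disables padding, so the
# bytes on the wire are the same on every platform.
HEADER = struct.Struct("!BI")

def pack_header(msg_type: int, length: int) -> bytes:
    """Serialize the header into exactly HEADER.size (5) bytes."""
    return HEADER.pack(msg_type, length)

def unpack_header(data: bytes):
    """Return (msg_type, length) parsed from the first 5 bytes of data."""
    return HEADER.unpack(data[:HEADER.size])

print(pack_header(1, 102322).hex())
```

Because the byte order is part of the format string rather than a property of the machine, a 32-bit Windows client and a 64-bit Linux server would produce identical headers.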

Having said that, I think the real decision as to which format to use should be based on the data (text files versus images, sound and video) and the target platforms. If you intend to provide a client and a server for Windows, you plan to pass around multimedia-rich Microsoft Word documents and you're not interested in supporting Linux or OS X, I actually would bite the bullet and use binary instead of Base-64.
TheDavid
Wednesday, January 03, 2007
 
 
Base-64 adds a lot of unnecessary overhead, both in the amount of data sent down the wire and in the processing at the end-points. I don't see the ability to view the stream as ASCII text as a big enough plus to use this approach. Your protocol info will be buried in the data and hard to see anyway.

I wrote a TCP/IP protocol a few months back and went binary. I actually send all data as variants, which has worked out very nicely and is easy to extend with new data types as needed.
Neville Franks
Wednesday, January 03, 2007
 
 
I second TheDavid's comment: it depends on how much data you are going to be moving around.

100KB - probably doesn't matter, maybe go with text/base-64 to ease your debugging efforts

100MB - probably best to use binary just to cut down the amount of data being moved around
A. Nonymous
Wednesday, January 03, 2007
 
 
"On one hand, I'd like to heed the advice I've read elsewhere on this board and design a simple, text-based protocol."

Is HTTP a text-based protocol?  It still sends binary data as binary.

There is no reason to Base64 encode anything over TCP unless you want to wrap it in XML.
Almost H. Anonymous
Wednesday, January 03, 2007
 
 
I once had to make this decision as well.  Since the first version of our software was written in C, and most of the data being transferred was in a binary format to begin with, it just made sense to use binary.

The protocol has evolved, taking ideas from ASN.1 and XDR and has proved to be very flexible and works between our components written in Java, C and Python.  I actually made a protocol description format in Python which then generates Python, C and Java code for the data marshalling.  I'm pretty happy with the solution.

We have had times where debugging some issues was hard.  What we did was write a protocol-aware proxy server that could more or less decode the protocol into a textual representation to help in finding errors.  And yes, time is sometimes spent decoding hex.  But this was mainly before we had the code generation tool described above.
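The core of such a debugging aid can be quite small. The following Python sketch turns one binary frame into a readable line; the frame layout (1-byte type, 4-byte big-endian length, then payload) and the message names are assumptions invented for this example, not the format described in the post.

```python
import struct

# Hypothetical frame layout: 1-byte type, 4-byte big-endian length, payload.
HEADER = struct.Struct("!BI")
# Hypothetical message names, used only to make the dump readable.
NAMES = {1: "UPLOAD", 2: "OK"}

def describe_frame(frame: bytes) -> str:
    """Render one binary frame as a single human-readable line."""
    msg_type, length = HEADER.unpack_from(frame, 0)
    payload = frame[HEADER.size:HEADER.size + length]
    name = NAMES.get(msg_type, f"UNKNOWN({msg_type})")
    # Show only the first 16 payload bytes in hex to keep log lines short.
    preview = " ".join(f"{b:02x}" for b in payload[:16])
    return f"{name} len={length} payload[:16]={preview}"

print(describe_frame(HEADER.pack(1, 3) + b"abc"))
```

A proxy would call something like this on each frame it relays and write the result to a log, which recovers much of the "easy to debug" benefit of a text protocol.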

If I were to do it again, I'd probably look at something like ICE from ZeroC.
Jason
Wednesday, January 03, 2007
 
 
That's a good answer.  To provide more detail, I'd like to be future proof enough to support Windows and Linux on both 32- and 64-bit platforms, and this is not necessarily a platform for storing video or audio files, although that would be possible.

Perhaps what I need is a hybrid protocol in which most of the commands are text-based, but in special cases, binary could be sent.

For example, suppose that the request to upload a file looked like this:

 Sender:  Request to upload file in binary, 102322 bytes
 Receiver: OK, go ahead
 Sender:  .......
 Receiver: OK

In this scenario, only part of the transaction is binary, and the rest is text.  It carries with it the cost of an additional round trip to the server, but that might be worth it if the file is large.
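A minimal Python sketch of that framing might look like the following. The `UPLOAD <length>` header line and both function names are assumptions made up for illustration, and the "OK" acknowledgements are left out to keep it short. The functions work on any binary file-like object, for example the one returned by `socket.makefile("rwb")`.

```python
import io

def send_upload(stream, data: bytes) -> None:
    """Write a text header line, then the raw bytes with no encoding step."""
    stream.write(f"UPLOAD {len(data)}\n".encode("ascii"))
    stream.write(data)

def receive_upload(stream) -> bytes:
    """Read the text header, then exactly the announced number of raw bytes."""
    header = stream.readline().decode("ascii").split()
    if len(header) != 2 or header[0] != "UPLOAD":
        raise ValueError(f"unexpected header: {header!r}")
    expected = int(header[1])
    data = stream.read(expected)
    if len(data) != expected:
        raise EOFError("stream ended before the full payload arrived")
    return data

# In-memory round trip; the payload deliberately contains newline bytes.
wire = io.BytesIO()
send_upload(wire, bytes(range(256)))
wire.seek(0)
print(len(receive_upload(wire)))
```

Because the receiver knows the length up front, the payload can contain any byte values, including newlines, without confusing the text parts of the protocol.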

Does anyone have a comment on this (or a different idea) in light of the new information I presented above?
Anonymous
Wednesday, January 03, 2007
 
 
Re: Variants

I'll have to say that I'm wary of any format that doesn't involve fairly well-defined data structures, whether they be raw data structures or structured text.

It has been my experience that excessive use of variants or arbitrary collections of name/value pairs leads to slop in designs.

I'm really waiting, though, to see what people have to say about my last post, where I describe a protocol that would allow both text and binary, the latter possibly being reserved for situations that would result in large transfers.

I knew I came to the right place to get a discussion going on these tradeoffs, though, thanks for the suggestions so far.
Anonymous
Wednesday, January 03, 2007
 
 
"It carries with it the cost of an additional round trip to the server"

In the above example, I doubt the additional round trip is necessary.
Almost H. Anonymous
Wednesday, January 03, 2007
 
 
Maybe it's more practical to think of the problem as encrypting the payload within the TCP/IP packet itself?

Offhand though, I can't remember how hard it is to reassemble packets into the correct order, or if one of the... what is it called, OSI layers(?) does it for you?
TheDavid
Wednesday, January 03, 2007
 
 
TheDavid:

Not sure what you mean, but TCP is a reliable stream protocol layered on top of IP (not on UDP, which is a separate, unreliable datagram protocol that also runs over IP).  The TCP/IP stack handles breaking up the stream into packets, sending them on the network, reordering them on the receiving side, etc.

Or, maybe you know that and I misunderstood you-- I was using TCP/IP in the strict sense-- some people seem to refer to any IP based protocols as being "TCP" but clearly that's not the case.
Anonymous
Wednesday, January 03, 2007
 
 
First question: why can't you do this with an existing standard protocol?

I mean, if you're using it to transfer files (binary or text), you could use FTP or HTTP.  If you're using it to transport messages, you could consider POP & SMTP.  If you're calling simple services over the web, go with SOAP.

Having worked with a range of proprietary & open protocols, I can say that unless you have a compelling need to do so - e.g. your competitive advantage is going to be functionality available only through your API - you should stick with something well-known & widely available.
Duncan Bayne
Wednesday, January 03, 2007
 
 
> potentially will have to move a reasonable amount of binary data

Run the data over a separate connection and then use text over the command channel. This is how ftp works.
son of parnas
Wednesday, January 03, 2007
 
 
Duncan:

That's an interesting idea, I'll have to look into it.  However, the example I gave of what I need to do is just ONE example.

son of parnas (a frequent poster here) has repeatedly made a similar point along the lines of "it's really easy to do something simple like HTTP", so it's definitely worth looking into.

That said, there are a few requirements that make me think HTTP is not my answer.  For example, I think that my protocol will require a persistent connection, and as I understand it, HTTP typically uses a connection for one request/response pair and then dumps it.

I'll say that I'm skeptical of the idea of adapting my application's requirements into an existing protocol designed for something else, but I'll look into it!
Anonymous
Wednesday, January 03, 2007
 
 
Sure - only choose an existing open protocol if one exists that meets your requirements. 

I'm not saying there's anything wrong with creating your own protocol per se, just that if you can use an existing one, you'll be taking advantage of years' worth of trial & error on the part of those who developed and adopted it before you.

Plus, you'll make it easier for people to interop with your platform (whether that's a good thing or not is up to you).

Finally, existing protocols are suit-friendly.  Let's say you choose SFTP.  Imagine a user going to a network admin and saying "I need the Secure FTP port opened please", versus "I need port 12345 for my app you've never heard of from a uISV you've never heard of."
Duncan Bayne
Wednesday, January 03, 2007
 
 
Have you heard of BEEP?

It's a framework for constructing network protocols, so that you don't have to start from scratch.

I've never used it myself, but I've heard good things about it.

http://www.beepcore.org/
BenjiSmith
Wednesday, January 03, 2007
 
 
Anonymous,

HTTP 1.1 supports persistent connections, though I don't know much more about it than that. 

One big issue is to consider where your end points will be.  I know of one big app that ended up failing because it had to work across the Internet, and too many corporate firewalls wouldn't allow direct Internet TCP/IP access, but they WOULD allow HTTP traffic through the firewall/proxy.  So using HTTP will almost guarantee your app can work anywhere.

Also, your 2nd post of text - binary - text is a really simplified explanation of HTTP.  Look into it--it's very flexible.
PA
Wednesday, January 03, 2007
 
 
"Easier for me to program and debug" isn't much of an argument in favor of sending binary as encoded text unless you're doing throwaway programs or the data volume is small.

If you did this and had to compete against a binary transfer product they'd eat your lunch.

There is much to be said for lowering development costs and for having fewer bugs.  However, programmers need to remember that doing it RIGHT the first time is effort that gets amortized over the life cycle of the software.  I.e. spend the effort on development once and the users reap the benefits time after time.

In this case you've proposed unnecessary overhead in terms of both network bandwidth and processing node cycles and memory.  And for what?  "It makes my job a little easier - I think."

The difference in effort is not that great, and I've been doing rip and replace of this sort of thing for a few years now.  The performance penalty is greater than you think.  We've been seeing elapsed time improvements of as much as 5 times on long running operations.  In most cases we've actually reduced code complexity by a substantial amount as well.

Bytes are just bytes, there is no reason to fear non-printable values in a reasonably homogeneous environment.  Most standard Internet applications were text based for one overriding reason: different character codesets on different platforms in the much more diverse computing ecosystem of the 1970s and 1980s.
Old Guy
Thursday, January 04, 2007
 
 
I'm surprised nobody mentioned zipped text.
All the advantages of plain text with a smaller data size than binary (unless you are shipping random numbers around).

You could even go fully buzzword compliant and use XML then compress it.
There are good free gzip libs for pretty much every language.
Martin
Thursday, January 04, 2007
 
 
Old Guy:

Thanks, I think you've helped curb my fears about a pure binary approach.  I thought about this more last night and realized that much of what I'll be pushing (in terms of raw volume) will be raw binary data, so encoding it will incur:

  1.  The cost of encoding
  2.  The cost of more data on the network
  3.  The cost of decoding

Someone else suggested that a simple debugging proxy server could be written to snoop on the messages and decode them in case problems came up. 

I will have to take a look into the frameworks suggested by others, since they might reduce the time it would take me to write that portion of the code.

Another great point made, though, was that existing protocols are "suit friendly" and additional ports might not need to be opened, etc.  I have found this to be a very real concern of IT staff, mostly because they are not enamored with the idea of opening ports on routers and stuff like that.

This is a tough tradeoff to make!

Martin:

I'm not sure, but I think I'd run into serious performance problems using gzip on the messages before pushing them up.  This *could* save bandwidth, but much of the data being pushed up might already be in a compressed format, which would negate the usefulness of gzip and instead just slow down the server. 

Also, I think that compression is much slower than decompression with gzip, which would mean that the server program would probably take a big performance hit.
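The point about already-compressed data is easy to check with a quick Python experiment. In this sketch, seeded pseudo-random bytes stand in for an already-compressed file (an assumption, since real compressed formats are only approximately random), and the helper name `gzip_ratio` is made up for illustration. It measures size only, not speed.

```python
import gzip
import random

def gzip_ratio(data: bytes) -> float:
    """Return compressed_size / original_size for the given bytes."""
    return len(gzip.compress(data)) / len(data)

# Highly repetitive text, the best case for gzip.
text = b"GET /documents/report.txt\n" * 4000

# Seeded pseudo-random bytes as a stand-in for already-compressed content.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

print("repetitive text:", round(gzip_ratio(text), 4))
print("random bytes:   ", round(gzip_ratio(noise), 4))
```

The repetitive text shrinks to a small fraction of its size, while the random bytes come out slightly larger than they went in, so compressing such payloads spends CPU time for no bandwidth gain.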

If you know of any other apps that do this, though, I'd be interested to hear about them.  I'm just in the thought stage at the moment, and it's too early for me to discount a scheme because of my beliefs about how the program will operate.
Anonymous
Thursday, January 04, 2007
 
 
+1 for bytes are bytes. 

You need to stop holding the notion of "text" being something special and sacred.  "Text" is nothing more than a well defined binary format.  Which "text" spec are you planning to use for this protocol of yours anyway? UTF-8?  UTF-16 (big or little endian)?  ASCII?  I hear EBCDIC is all the rage for 2007, you know.
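To make that concrete, here is a short Python sketch that encodes the same two-character string under several of the encodings named above. The helper `encodings_of` is made up for illustration, and `cp500` is used as one representative EBCDIC code page.

```python
def encodings_of(text: str) -> dict:
    """Encode the same text under several character encodings."""
    names = ("ascii", "utf-8", "utf-16-le", "utf-16-be", "cp500")
    return {name: text.encode(name) for name in names}

# Print the bytes each encoding puts on the wire for the reply "OK".
for name, raw in encodings_of("OK").items():
    print(f"{name:10} {raw.hex(' ')}")
```

ASCII and UTF-8 agree for this string, the two UTF-16 variants differ in byte order, and the EBCDIC bytes share nothing with the others, so a "text" protocol still has to specify its encoding just as a binary one specifies its layout.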

Expand your mind on this one. What is so special about 8-bit bytes anyway?  Old-school systems had 7-bit bytes; hell, I think there is a 7-bit-clean way to do Unicode.  Many PIC controllers have 12-bit counters, not 16-bit ones.

In the end, it's all binary.  Everything else, even 8-bit bytes, is a specification riding on top.
Cory R. King
Thursday, January 04, 2007
 
 
Lots of good posts here. I'd just add that there seem to be two parallel discussions going on: what communication protocol to use, and what data format to use for your application.

I'd add my two cents in along with many others here that if your data is generally a large amount of binary data, you're best off using its native binary format as the transport format.

As far as how to design the interaction between the clients and the server, you can either design your own protocol, or find a suitable standard that allows for transmitting binary data, and maybe supports some "plain-text" message meta-data for ease of debugging and future support. This sounds a lot like HTTP to me. As another poster mentioned, HTTP responses (and requests) are not limited to HTML text documents; after all that's how we get all these pretty pictures on our browser screens (well, not so many at joelonsoftware.com, but you get the idea).

Anyway, as others have mentioned, you may find it worth your while to dig a little deeper into the HTTP spec to see if this battle-tested protocol meets your needs.

Good luck!
Dov Wasserman
Monday, January 08, 2007
 
 

This topic is archived. No further replies will be accepted.
