The Joel on Software Discussion Group (CLOSED)

A place to discuss Joel on Software. Now closed.


Your hosts:
Albert D. Kallal
Li-Fan Chen
Stephen Jones

XML vs CSV

I guess I shouldn't, but let's start that topic again: the rant yesterday about binary XML included remarks along the lines of "give me csv any day, xml stinks".

Do people like that live in the same world as I do? How do you encode hierarchical data in csv? How do you nest more than one level deep in ini files (yeah, I know, look at that regedit4 format...)?

How do you encode a newline in csv? You think you know the answer? Think again! Because csv just isn't a defined format. Hell, Excel doesn't even open a csv file properly if your regional settings have the list separator set to anything other than ",".

What happens to funny characters? Do you live in America, where ASCII is sufficient for every single possible character ever to arise? Have you ever had to deal with umlauts?

How do you encode the list separator in csv?

What makes xml great is that it already exists, with tools to manipulate it. If all I need is a small (!) bit of info, well, do I *really* want to write a parser for it???
Daren Thomas
Friday, January 14, 2005
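
For the newline and list-separator questions above, here is a minimal sketch of one common convention: the default "excel" dialect of Python's standard csv module, which wraps a field in double quotes when it contains the delimiter, a quote, or a line break, and doubles any embedded quotes. The sample record is made up for illustration. This is one convention, not a universal definition; a producer or consumer using different rules will disagree, which is the poster's point. The umlauts also only survive if both sides agree on the file encoding, since CSV has no way to declare one.

```python
import csv
import io

# Hypothetical record: one field holds a comma, one a newline, one a quote.
row = ["Müller, Jörg", "line one\nline two", 'said "hi"']

# Write with the default "excel" dialect: fields are quoted only when needed.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(row)
encoded = buf.getvalue()

# Reading back with the same dialect recovers the original fields.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))

print(repr(encoded))
print(decoded == row)
```

Reading the same text with a naive split on "," or on line breaks would cut the first two fields apart, so the round trip depends on both ends using the same quoting rules.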
 
 
How do you validate csv?  There are no DTDs or schemas or datatypes etc.

How do you extend or change CSV files without breaking all the clients that supply data in CSV?  With XML you can just add a new node anywhere you like and it will not affect a single client.

If you have 200 separate customers supplying you data in CSV and 200 in XML with a schema I guarantee you'll have far fewer headaches receiving, maintaining and developing the XML than with CSV.

CSV is good for one thing, sending people data dumps from SQL to open in Excel.
Jupiter Moon
Friday, January 14, 2005
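
The claim about adding a node can be sketched as follows, using Python's standard xml.etree.ElementTree. The element names and documents are hypothetical. The sketch assumes the client looks elements up by name rather than by position, and that nothing validates against a schema that forbids unknown elements; a client that breaks either assumption could still be affected.

```python
import xml.etree.ElementTree as ET

# Hypothetical order documents; v2 adds a <currency> node in the middle.
v1 = "<order><id>17</id><total>9.50</total></order>"
v2 = "<order><id>17</id><currency>EUR</currency><total>9.50</total></order>"

def read_total(xml_text):
    """Client written against v1: finds <total> by name, not by position."""
    root = ET.fromstring(xml_text)
    return float(root.findtext("total"))

# The same client code reads both versions.
totals = [read_total(v1), read_total(v2)]
print(totals)
```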
 
 
XML sucks.

YAML is cool.
Gerald Hammock
Friday, January 14, 2005
 
 
I made a breakthrough: I discovered that different data formats are best for different types of data.

CSV files do have a use - they are a very easy way of accessing flat record type data. This is what most business data consists of.

Our product does data mining on very large data sets - the easiest way to read every record in a set with 10^9 entries is to use a separated text file.
Use tab or the ASCII unit separator to avoid embedded commas.

However the hierarchy of (possibly recursive) steps to process the data is stored in an XML file.

ps.
This reminds me of similar rows about objects 10 years ago - with people suggesting that you should encode video as a series of pixel objects contained in a line object contained in a field object contained in a frame object contained ....
Martin Beckett
Friday, January 14, 2005
 
 
> ps.
> This reminds me of similar rows about objects 10 years ago
> - with people suggesting that you should encode video as a
> series of pixel objects contained in a line object
> contained in a field object contained in a frame object
> contained ....

Funnily enough this happens to be the object model I am using.  Extremely useful for the kind of operations I am performing on my pixels in my frames.

Luckily my serialization code does keyframing and my compiler does packing and somehow everything ends up on disk in a much more compact format ;-)

Sorry, was I confusing logical models with physical models? ;-)
i like i
Friday, January 14, 2005
 
 
>> CSV files do have a use - they are a very easy way of accessing flat record type data. This is what most business data consists of.

Only because people make it flat. Think about the most basic piece of business paperwork, an invoice. An invoice contains customer details and item details; already the data isn't flat.

I've worked on quite a few projects where people have tried to get hierarchical data into flat files and it's always messy. Give me the supposed complexity of XML any day.
Tony Edgecombe
Friday, January 14, 2005
 
 
What is CSV? There is no CSV. "Comma-separated values" has no single definition you can sit down and implement. Some guy escapes commas one way, Microsoft does it another way, etc.

Does XML suck? Yup. I don't care, I get to use sexps, and when something needs XML, I convert it. Sexps are probably more powerful than XML, because they support data types other than text, and there's no impedance mismatch because I use them straight in my source code.

But I've also used meshugga formats like vCalendar and vCard. Which people just invent because they're so damn creative. I'd be a lot happier with XML. If size is too much, that's why God invented data compression and in-memory representations. All I know is the loser programmers before me didn't hook onto good ideas in a mature manner, so I'll just take the least-worst.

Now, if politics weakens the House of XML, that can be a good thing too. But right now in computing politics, XML is a shield as well as a cage.
Tayssir John Gabbour
Friday, January 14, 2005
 
 
In my experience (warning: I'm biased), firstly CSV is mainly used for the type of data exchange where you have big long lists of things like transaction records. In this case, you don't have hierarchies and you just get on with it and batch process the rows.

Secondly CSV is also used to interact with spreadsheet data. There are many sites that make statistical data available via CSV (even Google, shock horror!).

CSV is a kind of "format of last resort" - for example you can always export your contacts to CSV and get your new Email/Calendar/Social Software/Palmtop to import them.

But you won't catch me saying RSS should be CSV or anything silly like that. I'm a much bigger fan of XML than CSV, but I believe both have a place.

It's a case of horses for courses...
Richard Rodger
Friday, January 14, 2005
 
 
Yes, CSV is fine for encoding data that comes in columns.  Really good for this, actually.

To pretend that XML is a 'one-size-fits-all' solution takes us back to the "I have a Hammer, EVERYTHING's a NAIL!" philosophy.

Sure, if you need to encode hierarchy, CSV runs out of gas real quick.  For you 'CSV is my Hammer' people, of course it can be done, but XML may have the simpler solution there.
AllanL5
Friday, January 14, 2005
 
 
a question on csv:

how do you extend it?

do you use the first row as the header identifying all the columns?

if so, does your software really take notice of this?
mb
Friday, January 14, 2005
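
One way software can take notice of a header row, sketched with Python's standard csv.DictReader, which treats the first row as field names. The column names and data are hypothetical. Because fields are looked up by header rather than by position, an added or reordered column does not disturb a reader that only asks for the columns it knows; this assumes every producer actually sends a header row and keeps the existing column names stable.

```python
import csv
import io

# Hypothetical exports: the second adds a column and reorders the others.
old = "name,email\nAda,ada@example.com\n"
new = "email,phone,name\nada@example.com,555-0100,Ada\n"

def names(text):
    # DictReader uses the first row as field names, so lookup is by header.
    return [rec["name"] for rec in csv.DictReader(io.StringIO(text))]

result = [names(old), names(new)]
print(result)
```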
 
 
"Use tab or the ascii unit separator to avoid embedded commas."

What is the ASCII unit separator?
Mr. Analogy {ISV owner}
Friday, January 14, 2005
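
For reference, the ASCII unit separator is control character 31 (0x1F); its sibling, the record separator, is character 30 (0x1E). A minimal sketch of the quoted suggestion, with made-up records: because these characters almost never occur in ordinary text, fields may contain commas, tabs and newlines without any quoting. The sketch assumes the data itself never contains 0x1E or 0x1F, since there is no escaping here.

```python
US = "\x1f"  # ASCII 31, unit separator: placed between fields
RS = "\x1e"  # ASCII 30, record separator: placed between records

# Hypothetical records whose fields contain commas, tabs and newlines.
records = [["Smith, J.", "10\t20"], ["two\nlines", "ok"]]

# Encode: join fields with US, then records with RS.
encoded = RS.join(US.join(fields) for fields in records)

# Decode: plain splits are enough, no quoting rules involved.
decoded = [rec.split(US) for rec in encoded.split(RS)]
print(decoded == records)
```

The trade-off is that the result is no longer comfortable to view or edit in a plain text editor.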
 
 
just for the record ... I wish more people would use XML. There is no standard CSV format as others mentioned.

I think a problem is that everyone uses Excel and you can import/export the CSV so easily.

But then I wonder if it's a question of thinking in more than one dimension. I think some people find that difficult to grasp. They all want to live in a flat world.
me
Friday, January 14, 2005
 
 
It's really a matter of problem domains. Sometimes you can get away with CSV (or rather, special character delimited data in general). It's a given--it's not universal and it's not friendly. Hell, pretend all you want, there are actually things swimming in most systems right now that can't even be serialized to xml in a way that's conducive to processing in the data center--doesn't mean xml sucks.
Li-fan Chen
Friday, January 14, 2005
 
 
Li-fan,

Actually, I've been surprised at the high take up of XML, with respect to CSV. There are some on the net who complain that people use XML too much. It's happened in a relatively short time in computing terms, especially when you consider something roughly analogous, like the amount of data still stored on hierarchical databases.

Still, I agree it's a pain when things don't use it (has been for me).
Keith
Saturday, January 15, 2005
 
 
Apologies... That was a response to the previous poster :)
Keith
Saturday, January 15, 2005
 
 
Why not normalize the repeating groups to another flat file? Why does it matter how many files the data lives in if the computer is doing the processing?

Saturday, January 15, 2005
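
The normalization suggested above can be sketched with the invoice example from earlier in the thread: one flat file for invoices, a second for their line items, linked by a shared key. The file contents and column names are hypothetical, and the files are held in strings here so the sketch is self-contained. It uses Python's standard csv module.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of two flat files, linked by invoice_id.
invoices_csv = "invoice_id,customer\n1,Acme\n2,Globex\n"
items_csv = "invoice_id,item,qty\n1,widget,3\n1,gadget,1\n2,widget,5\n"

# Group the child rows (line items) by their parent key.
items = defaultdict(list)
for row in csv.DictReader(io.StringIO(items_csv)):
    items[row["invoice_id"]].append((row["item"], int(row["qty"])))

# Re-attach each invoice's items to rebuild the hierarchy in memory.
invoices = {
    row["invoice_id"]: {
        "customer": row["customer"],
        "items": items[row["invoice_id"]],
    }
    for row in csv.DictReader(io.StringIO(invoices_csv))
}
print(invoices["1"])
```

The cost is that the relationship between the files lives in the reading code rather than in the data, and each extra level of nesting means another file and another join.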
 
 
XML really does stink -- you can read about the final proof here:

http://www.theinquirer.net/?article=20868
Marko Topolnik
Tuesday, January 25, 2005
 
 

This topic is archived. No further replies will be accepted.
