The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Optionality of XML Attributes

An application I'm working on can export data to an XML file. Certain objects within the application can have an optional string description associated with them.

Do you think I should always include the description attribute in my XML, even though it may often be set to an empty string, or should I only include the attribute when it actually has a value?

I can see that always including the attribute makes things a bit easier for the consumer of the XML (whatever that may be), but at the expense of file size.
John Topley
Wednesday, January 17, 2007
My opinion: Output it only when it has a value.  The schema describing the document should have minOccurs="0" to show it is optional.  The default is maxOccurs="1".
Wednesday, January 17, 2007
Oops, typo: I meant to say the default is minOccurs="1" (I wrote maxOccurs by mistake in my last post... sorry)
Wednesday, January 17, 2007
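A minimal sketch of the "output it only when it has a value" approach, using Python's standard `xml.etree.ElementTree`. The element names `items`/`item` and the `name` attribute are hypothetical. One schema detail worth noting: `minOccurs`/`maxOccurs` apply to elements. An XSD attribute is made optional with `use="optional"` on `xs:attribute`, which is already the default.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def item_element(name: str, description: Optional[str]) -> ET.Element:
    """Build a hypothetical <item>; 'description' is written only when non-empty."""
    elem = ET.Element("item", name=name)
    if description:  # both None and "" are skipped
        elem.set("description", description)
    return elem

root = ET.Element("items")
root.append(item_element("alpha", "First item"))
root.append(item_element("beta", ""))
root.append(item_element("gamma", None))
print(ET.tostring(root, encoding="unicode"))
```

This treats "no description" and "empty description" the same way, which matches the original question's assumption that an empty string carries no information.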
I completely agree with OneMist8k. Why bloat the XML?
Wednesday, January 17, 2007
+1 for output only when it has a value.
Wednesday, January 17, 2007
Thanks, that looks like the way to go.

I haven't defined an associated XSD yet; do you think it's important? The XML export is just a way for people to get their data back out of the app. It's not some important architecture-astronaut piece of plumbing between existing systems or anything like that.
John Topley
Wednesday, January 17, 2007
If your XML is simple enough that you can verify it's correct by taking a quick look at it, there may not be much point. For anything beyond that, I like to have an XSD. There's a bonus for your users in that the XSD is a very handy bit of documentation if you're trying to work with someone else's XML.

I think they fill the same niche as unit tests - if it's the kind of thing that you wouldn't bother writing a unit test for, you can probably get away with not having an XSD.

(If XSDs were less of a PITA to write, I'd just say go ahead and write one. Is it just me, or does anyone else find them more annoying than necessary?)
Wednesday, January 17, 2007
XSDs can be a PITA, for sure. When I have to create them, I use a utility to generate them from an existing XML file, then go in with an editor to tighten them up.

Why XSDs? (1) They are documentation, and (2) they can be used to validate XML documents. In many cases they keep distractions at bay so I can continue working.

The worst kind of XSDs are those terribly annoying Russian Doll scenarios that have schemas within schemas within schemas. Annoying as hell, but still useful.
Wednesday, January 17, 2007
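A toy illustration of the "generate from existing XML, then tighten" step, limited to attributes. It is not how any particular generator works: the function names are hypothetical, it only guesses whether each attribute is required or optional from one sample, and it fixes every type to `xs:string`, which is exactly the kind of thing you would then tighten by hand.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def infer_attribute_usage(xml_text: str) -> dict:
    """For each element tag, map attribute name -> 'required' or 'optional'.

    An attribute counts as 'required' only if every occurrence of the tag in
    the sample carries it. This is a guess from one sample, not a real schema.
    """
    root = ET.fromstring(xml_text)
    tag_counts = defaultdict(int)                         # tag -> occurrences
    attr_counts = defaultdict(lambda: defaultdict(int))   # tag -> attr -> occurrences
    for elem in root.iter():
        tag_counts[elem.tag] += 1
        for attr in elem.attrib:
            attr_counts[elem.tag][attr] += 1
    return {
        tag: {attr: "required" if n == tag_counts[tag] else "optional"
              for attr, n in attrs.items()}
        for tag, attrs in attr_counts.items()
    }

def attribute_decls(usage: dict) -> list:
    """Render xs:attribute declarations, sorted by attribute name."""
    return [f'<xs:attribute name="{attr}" type="xs:string" use="{use}"/>'
            for attr, use in sorted(usage.items())]

sample = """<items>
  <item name="alpha" description="First item"/>
  <item name="beta"/>
</items>"""

usage = infer_attribute_usage(sample)
for line in attribute_decls(usage["item"]):
    print(line)
```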
I think I'll use a tool to generate an XSD and then hand tweak it. Thanks for all your responses.
John Topley
Wednesday, January 17, 2007
Friends don't let friends use XML...
Steve Hirsch
Wednesday, January 17, 2007
>>Friends don't let friends use XML...<<

You would rather suggest some kind of ad-hoc proprietary format that the consumer has to go write a special parser for?

Ok, sometimes that _is_ a better fit, especially when the data is some large binary stuff, or specialised in some way that makes XML inappropriate.

I reckon that for, say, 60% of general stuff, however, XML is going to be the most convenient solution. Where the data is not in a table format (so CSV is not suitable), it's probably the best choice for 80% of cases.

The joy of XML is that it's a lot easier for third-party software to extract data, for example using standards like XPath, and it's a lot easier to import it into programs you write yourself, as you can just use an off-the-shelf parser (like Xerces) to read the file and concentrate on using the data rather than parsing it.
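As a small example of querying with an off-the-shelf parser: Python's `xml.etree.ElementTree` supports a limited XPath subset, including attribute predicates such as `[@description]` and `[@name='alpha']`. The document below is a hypothetical export.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<items>
  <item name="alpha" description="First item"/>
  <item name="beta"/>
  <item name="gamma" description="Third item"/>
</items>""")

# Items that carry a description attribute at all.
described = [e.get("name") for e in doc.findall("./item[@description]")]
# The first item whose name attribute equals 'alpha'.
alpha = doc.find("./item[@name='alpha']")

print(described)                  # ['alpha', 'gamma']
print(alpha.get("description"))   # First item
```

Full XPath 1.0 (and XSLT) needs a richer library; the standard-library subset is enough for simple extraction like this.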

You can also make use of software that converts the XML to other formats using XSLT (or proprietary mapping techniques). You see a lot of that in B2B integration scenarios, where data goes across the wire as XML and gets converted by the middleware (for example BizTalk) at each end to the proprietary formats used by the respective backend systems.

As for other commonly recognised formats, CSV is really nice and simple to both read and generate and has great support from third-party apps, but it's not a good fit for non-tabular data.
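Python's standard library has no XSLT processor, so this sketch does an XML-to-CSV conversion directly in code instead; it is a stand-in for the mapping step, not XSLT. The `items`/`item` layout is hypothetical. It also shows two limits of CSV: a missing attribute and an empty one both come out as an empty field, and any nested child elements would need a different row layout.

```python
import csv
import io
import xml.etree.ElementTree as ET

def items_to_csv(xml_text: str) -> str:
    """Flatten <item> elements into CSV rows; a missing description becomes ''."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "description"])
    for item in root.findall("item"):
        writer.writerow([item.get("name"), item.get("description", "")])
    return out.getvalue()

print(items_to_csv('<items><item name="alpha" description="First, item"/><item name="beta"/></items>'))
```

The csv module quotes the value containing a comma automatically, which is one of the details that makes hand-rolled CSV writers error-prone.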

Property files can do hierarchy using some kind of dot notation in the keys, but they are a much better fit for a flat set of key=value pairs. INI files are very similar but also have sections. However, I'd say XML is still a better fit for anything that has a lot of hierarchy.
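To make the dot-notation idea concrete, here is a minimal sketch that flattens an XML tree into property-style keys. The function name and the `config` document are hypothetical. Attributes become `key.attr`, leaf text becomes the key's value, and repeated sibling tags collide on the same key, which is where flat key=value formats start to struggle with hierarchy.

```python
import xml.etree.ElementTree as ET

def to_properties(elem: ET.Element, prefix: str = "") -> dict:
    """Flatten an element tree into dotted keys such as config.db.host.

    Attributes become key.attr; text of leaf elements becomes the key's value.
    Repeated sibling tags overwrite each other, since they map to the same key.
    """
    key = f"{prefix}.{elem.tag}" if prefix else elem.tag
    props = {}
    for attr, value in elem.attrib.items():
        props[f"{key}.{attr}"] = value
    children = list(elem)
    if not children and elem.text and elem.text.strip():
        props[key] = elem.text.strip()
    for child in children:
        props.update(to_properties(child, key))
    return props

xml_text = "<config><db host='localhost' port='5432'/><log><level>debug</level></log></config>"
props = to_properties(ET.fromstring(xml_text))
for k, v in sorted(props.items()):
    print(f"{k}={v}")
```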

Getting back to the OP's question, I'd recommend omitting the attribute if its value should be viewed as null, and using an empty attribute if your software views it as having a value of "" and it matters that this is treated differently from null.
Thursday, January 18, 2007
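The null-versus-empty distinction survives parsing as long as the consumer checks for it. With Python's `xml.etree.ElementTree`, `Element.get` returns `None` for an absent attribute and `""` for an empty one; the `item` elements here are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<items><item name="a"/><item name="b" description=""/></items>')
missing, empty = root.findall("item")

# Absent attribute -> None ("no value"); empty attribute -> "" ("empty value").
print(missing.get("description"))       # None
print(repr(empty.get("description")))   # ''
```

If the consumer does not care about the difference, `elem.get("description", "")` collapses both cases into an empty string.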
Oh, and as for file size: well, XML is pretty greedy when it comes to file size, so if that is a real concern there are probably two ways to approach it.

The first way is to go with your own proprietary binary format, though that has all the obvious implications wrt inconvenience to the consumer, who has to implement the necessary parsing. I'd not recommend this unless your XML gets incredibly large.

The other approach is just to compress it when not in use. XML zips pretty well in general as there is a lot of redundant stuff that the compressor can factor out.

In many cases the size concern is actually a function of needing to send it over a network.

What you will often see in a B2B scenario is the middleware transmitting the XML in a compressed format and uncompressing it at the other end before passing it off to the consuming software.
Thursday, January 18, 2007
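A sketch of the compress-before-sending, decompress-before-parsing approach using Python's standard `gzip` module. The repetitive `item` records are made up; the actual ratio depends entirely on your data, so the code prints the sizes rather than assuming one.

```python
import gzip

# Repetitive markup, typical of record-oriented XML exports (hypothetical data).
rows = "".join(f'<item name="item{i}" description="Item number {i}"/>' for i in range(1000))
xml_bytes = f"<items>{rows}</items>".encode("utf-8")

compressed = gzip.compress(xml_bytes)    # sender side, e.g. before transmission
restored = gzip.decompress(compressed)   # receiver side, before handing off to the parser

print(len(xml_bytes), len(compressed))
assert restored == xml_bytes             # the round trip is lossless
```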
"I haven't currently defined an associated XSD, do you think it's important?"

No. If you are just going to generate it from your XML, what use is it? Let the other guy who wants it generate it himself. Don't listen to the architecture astronauts.

BTW, IMHO having attrib="" does not make it much easier to process, not worth the extra guck.
another guy
Thursday, January 18, 2007
For use as a data store, XML languages are usually terrible (and let's be clear, to say that you "use XML" is like eating a cookie cutter). It all depends on the schema or DTD you create, of course, but there are some commonalities for the languages.

One, they're hierarchical data structures; if you start nesting at all deeply, you start to get in trouble. Two, they're incredibly bloated. I ran a test on some real life, relationally structured data, and there was a 43x jump in file size when I went to an XML language.

Three, DTDs are difficult to read and schemas are worse.

XML languages are not bad options when you are using them as a document store; that's really what they are intended for (i.e., markup languages).
Steve Hirsch
Thursday, January 18, 2007
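To measure the size overhead on your own data, serialize the same records both ways and compare. This sketch uses made-up records and an element-per-field XML layout; it will not reproduce the 43x figure above, which came from the poster's own data, and the ratio varies a lot with field lengths and with whether fields are stored as elements or attributes.

```python
import csv
import io
import xml.etree.ElementTree as ET

records = [{"id": str(i), "name": f"customer{i}", "city": "Springfield"} for i in range(100)]

# CSV: header once, then one line per record.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "city"], lineterminator="\n")
writer.writeheader()
writer.writerows(records)
csv_size = len(buf.getvalue().encode("utf-8"))

# XML: one element per field, so every value carries an opening and a closing tag.
root = ET.Element("customers")
for rec in records:
    row = ET.SubElement(root, "customer")
    for field, value in rec.items():
        ET.SubElement(row, field).text = value
xml_size = len(ET.tostring(root, encoding="utf-8"))

print(csv_size, xml_size, round(xml_size / csv_size, 1))
```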
This is actually easy to answer.

Do NOT put stuff in the XML file that isn't necessary. If the empty string is the default value for the attribute, then declare it as the default value in the schema. A validating parser should handle these cases and give the user the default value if the attribute is absent from the file.

And OF COURSE you should have a schema. Not writing a schema for an XML format that someone else has to handle is just lazy.

Just my 2 öre (I'm Swedish ;) )
Erik Rydgren
Tuesday, January 23, 2007
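Not every consumer uses a schema-aware parser: Python's `xml.etree.ElementTree`, for instance, does not read XSDs, so it will not fill in declared defaults. A minimal sketch of applying the same defaults in consumer code, where `ITEM_DEFAULTS` is a hypothetical mirror of a declaration like `<xs:attribute name="description" type="xs:string" default=""/>`:

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults, mirroring the schema's attribute default values.
ITEM_DEFAULTS = {"description": ""}

def with_defaults(elem: ET.Element, defaults: dict) -> dict:
    """Return the element's attributes, filling in absent ones from defaults."""
    return {**defaults, **elem.attrib}   # values present in the file win

item = ET.fromstring('<item name="beta"/>')
print(with_defaults(item, ITEM_DEFAULTS))  # {'description': '', 'name': 'beta'}
```

Once defaults are applied this way, an absent attribute and an explicitly empty one look identical, so this only fits formats where that difference does not matter.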
Having an XSD is valuable to your consumers even if they aren't validating. Lots of tools out there can read an XSD and spit out parsing code or object models. If you don't provide one, they're stuck dealing with the DOM.
Chris Tavares
Wednesday, January 24, 2007
+1 on making an XSD.
There are many binding frameworks that allow manipulation of XML data using object-oriented models. These frameworks need to "know" what the XML will look like in order to have things work "smoothly". For the MS fans, the .NET Framework has a bunch of stuff built in specifically for this.

Given the choice between DOM methods (Node.setNodeText(1234)) and the typical binding methods (order.number = 1234), I'd never use DOM.
Friday, January 26, 2007
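To show what the binding style looks like in practice, here is a hand-written Python dataclass standing in for the kind of class a binding tool might generate from an XSD. The `Order` class, its fields and the `order` element are hypothetical; real generators (such as the .NET tooling mentioned above) produce this code for you.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

@dataclass
class Order:
    """Hand-written stand-in for a class generated from a schema."""
    number: int
    description: Optional[str] = None

    @classmethod
    def from_element(cls, elem: ET.Element) -> "Order":
        # Convert the attribute text to typed fields once, at the boundary.
        return cls(number=int(elem.get("number")), description=elem.get("description"))

    def to_element(self) -> ET.Element:
        elem = ET.Element("order", number=str(self.number))
        if self.description is not None:   # optional attribute omitted when unset
            elem.set("description", self.description)
        return elem

order = Order.from_element(ET.fromstring('<order number="1"/>'))
order.number = 1234   # binding style: plain attribute access instead of DOM calls
print(ET.tostring(order.to_element(), encoding="unicode"))
```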

This topic is archived. No further replies will be accepted.
