The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Embed machine-readable data in XHTML

I'd like to embed machine-readable data into an XHTML web page. The purpose of this data would be to act as input data to:

a) Either, JavaScript functions running in the client-side browser.

b) Or, a C# client application, which can browse XHTML and which will include pre-compiled subroutines for acting on the embedded data.

If the page is supposed to validate as XHTML, what format can I use for the data?

1) One idea is to:
  * Format the data as XML
  * Create a new XML tag to contain the data
  * Create a new DTD module that extends XHTML so that the new tag will validate

2) A second idea is to embed the data in XHTML tags: in a definition list or in a table, for example.

3) A third idea is to use a non-XML format (like which one: JSON?) for the data.

The normal way to embed data is possibly to include <script> statements which initialize JavaScript variables; I don't really want this, because the C# client application wouldn't know how to interpret these statements.

Do you have any comments or suggestions?
Christopher Wells
Saturday, February 10, 2007
Why not just have a second file containing the machine-readable data rather than embedding it in the XHTML?

a) Javascript can use AJAX to access the data in this second file/url.  Or you could format it as JSON and just use a <script> tag to include it.

b) A C# client application can access the second file from the URL directly.  You could even use a <link> tag to indicate where to get the file from.
Almost H. Anonymous
Saturday, February 10, 2007
Thanks, H, that was a nice, outside-the-box suggestion.

Having two files with two separate GETs would make it harder to serve: the data must match the content, so the server needs to generate them both from the same database snapshot.

I think I might put the data inside a <script> tag after all, as an assignment to a JavaScript variable; just parsing one assignment statement shouldn't be too hard for C# (and it's semantically correct, and means that the server side can be client-neutral).
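
As a rough sketch of the "parse one assignment statement" idea: if the right-hand side of the assignment is restricted to strict JSON, a non-browser client only has to strip the `var name =` prefix and the trailing semicolon, then hand the rest to a JSON parser. The variable name `pageData`, the helper `extractAssignedJson`, and the sample data are all made up for illustration; the same steps would carry over to a C# client.

```javascript
// Assumed convention (not from the thread): the <script> element contains
// exactly one statement of the form
//   var pageData = { ...strict JSON... };
function extractAssignedJson(scriptText, variableName) {
  // Match the "var <name> =" prefix at the start of the script text.
  const prefix = new RegExp("^\\s*var\\s+" + variableName + "\\s*=\\s*");
  const match = prefix.exec(scriptText);
  if (!match) {
    throw new Error("no assignment to " + variableName + " found");
  }
  // Everything after the prefix, minus an optional trailing semicolon,
  // should be the JSON literal.
  let body = scriptText.slice(match[0].length).trim();
  if (body.endsWith(";")) {
    body = body.slice(0, -1);
  }
  return JSON.parse(body);
}

// Hypothetical script content as the server might emit it.
const scriptText = 'var pageData = {"readOnly": true, "menu": ["Cut", "Copy"]};';
const data = extractAssignedJson(scriptText, "pageData");
console.log(data.menu.length);
```

This only works while the server keeps to the one-statement, strict-JSON convention; anything more general would need a real JavaScript parser on the C# side.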
Christopher Wells
Saturday, February 10, 2007

XHTML is just XML, and XML is meant to be extensible. You are free to embed any elements in XHTML; just declare them in a different namespace from XHTML. Check out this article by Elliotte Harold: (comments below that article are also insightful).

Sunday, February 11, 2007
Christopher, you've got two different goals here.  I'm assuming the C# application doesn't need to see the XHTML; therefore it could simply consume an XML resource on the server and avoid the XHTML document altogether.  The JavaScript, on the other hand, is part of the XHTML page -- so use a <script> block with JSON.  There's probably little advantage (and some disadvantage) to having both the JavaScript and the C# application access the same data in the same format.  Unless, of course, the C# app needs the XHTML part as well.
Almost H. Anonymous
Monday, February 12, 2007
> Peter

The article says "*XML doesn't have to be valid!*", and argues against 'embed the data in XHTML tags'. But not validating is contentious: instead, why not extend the DTD as suggested by the XHTML specification?


Yes, the C# app needs the XHTML part as well: the app is an XHTML browser and editor; the extra data (or metadata) is to define things like the list of context menu items for each type or class of element in the document, whether each element is read-only or editable, and so on ... the kind of data that in-browser JavaScript would consume.
Christopher Wells
Monday, February 12, 2007
"the app is an XHTML browser and editor"

Ahh, interesting.  I wasn't sure what you were doing with the C# part.  I was thinking perhaps it was a desktop rich client to mirror a corresponding web app.
Almost H. Anonymous
Monday, February 12, 2007
Easy. Just put your code in a CDATA comment directly in your XHTML web page:


The XML parser and browser will hide the whole thing visually, but it still exists in the page. How you parse that I guess is up to your C#. Make sure your doctype is a full, valid XHTML doctype; otherwise, if the browser reverts to quirks mode or decides that you are not intending XHTML, it will display that block as any other text.

The advantage of CDATA is you can put whatever you want in there as the parsers are supposed to ignore whatever is there, yet your C# can still access the data as is.

- ranger
Tuesday, February 13, 2007
> The advantage of CDATA is you can put whatever you want in there as the parsers are supposed to ignore whatever is there, yet your C# can still access the data as is.

This is wrong! Parsers DON'T IGNORE CDATA sections, they just don't INTERPRET them! The following two XML fragments are equivalent:

<a>&lt;he &amp; llo&gt;</a>

<a><![CDATA[<he & llo>]]></a>

CDATA is only syntactic sugar, so that you don't need to do escaping.
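
To make the equivalence concrete, here is a small sketch that writes the same string both ways and reads it back. The four helper functions are hand-rolled for illustration (they are not a real XML parser and handle only `&`, `<` and `>`); the point is only that both forms carry identical character data.

```javascript
// Write a string as escaped XML text.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Read escaped text back; "&amp;" is decoded last so that "&amp;lt;"
// becomes "&lt;" rather than "<".
function unescapeText(s) {
  return s.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");
}

// Write a string as a single CDATA section. The terminator "]]>" cannot
// appear inside one section, so this simple version rejects it.
function wrapInCdata(s) {
  if (s.includes("]]>")) {
    throw new Error('text contains "]]>" and cannot go in a single CDATA section');
  }
  return "<![CDATA[" + s + "]]>";
}

// Read a single CDATA section back by dropping the delimiters.
function unwrapCdata(s) {
  return s.slice("<![CDATA[".length, -"]]>".length);
}

const text = "<he & llo>";
const escaped = escapeText(text);   // the first fragment's content
const cdata = wrapInCdata(text);    // the second fragment's content
console.log(unescapeText(escaped) === unwrapCdata(cdata)); // prints true
```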

Tuesday, February 13, 2007
By "ignore it" I think ranger meant its content will not be displayed - which is what the OP is looking for. Maybe he should have said "the rendering engine is supposed to ignore it and nothing inside of it will be processed" but the point is it's a good way to send metadata type stuff down.

You are very wrong in your definition of cdata, but it's also not a very good way to send data down unless you can avoid IE.

Since the OP is probably using the IE engine, IE's quirks in the way it handles cdata would be a major drawback.

I think processing instructions (<? and ?>) would be more appropriate, you'd just have to escape > characters and then presto, the data is not rendered but is accessible from javascript and C#. It's a hack, but it's a hack to get around IE's odd behavior, not a hack that uses non-standard methods of accomplishing your goal. With cdata you wouldn't have access to it from javascript from IE (mozilla works fine).

Either way, you can stick stuff in either cdata or processing instructions (so long as it doesn't have ]]> or ?> unescaped, respectively), and it'll be valid no matter what it is.
Pete
Tuesday, February 13, 2007
I disagree, Pete. The purpose of using CDATA is that the guy needs to store machine language code or characters. Everything in a CDATA block is ignored by the parser, period. I would be wary about storing code like that in a web page, unless it's wrapped in CDATA.

Also, that's incorrect about IE. Internet Explorer 7 will ignore the CDATA just like Mozilla, my friend. Try it!

- ranger
Tuesday, February 13, 2007
My bad... CDATA isn't "ignored": in the XHTML and XML specs it is interpreted as character data, but in the browser it is interpreted visually by ignoring the content.

Chris, you can actually send BLOBs down via CDATA and the XHTML will validate at the same time.

- ranger
Tuesday, February 13, 2007
Essentially what I was saying was cdata is intended for it and would be the best way, were it not for IE's problems. In IE cdata content is not accessible from javascript (as it is in mozilla).

Try this out in firefox & IE:

<html xmlns="" xml:lang="en">
<title> cdata test </title>
<div id="asdf">


And then if you add a > inside the cdata section, IE completely goes bonkers, rendering everything after the > on the page (including the ]]>) even though it shouldn't make a difference, nothing is to be evaluated until the ]]> is reached.

So I was agreeing with you for the most part, and disagreeing with ... blank ... but I couldn't address my comments to ... blank ...

With a few caveats though, cdata is broken in IE, so he'd have to escape certain characters that would otherwise be perfectly fine (assuming he's using the IE engine).

Also, he can't just stick a blob in there, unless he's sure the blob doesn't contain any instances of ]]>
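
One common XML-level workaround for the ]]> restriction is to split the text at each occurrence so that the terminator is spread across two adjacent CDATA sections. A minimal sketch follows; the function names are made up, and the reader is a simplified stand-in for what a real XML parser does when it concatenates adjacent sections. It addresses XML well-formedness only and says nothing about how IE's HTML parser treats the result.

```javascript
// Wrap arbitrary text in CDATA, splitting every "]]>" so that "]]" ends one
// section and ">" starts the next.
function toCdataSections(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

// Simplified reader: concatenate the contents of consecutive CDATA sections.
function fromCdataSections(markup) {
  const section = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  let result = "";
  let match;
  while ((match = section.exec(markup)) !== null) {
    result += match[1];
  }
  return result;
}

const blob = "if (a[b[0]]>c) { done(); }";
const wrapped = toCdataSections(blob);
console.log(wrapped);
console.log(fromCdataSections(wrapped) === blob); // prints true
```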


Using a processing instruction is a worse hack than I thought though, so don't do that either...

Using cdata and escaping >, then unescaping it in your C# would get you around IE's problems with minimal hackage, and your xhtml would be valid. It will be invisible to javascript, though.

>the app is an XHTML browser and editor

So long as you control the client, why not just use a custom HTTP header? Then it's in the same request but not part of the xhtml document. Your embedded data goes straight to C#, and your browser never has to worry about it.
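
A rough sketch of the header idea, assuming the metadata is small enough to fit in a header (servers and proxies usually cap header sizes): serialise it as JSON and base64-encode it so that the header value stays plain ASCII. The header name `X-Page-Metadata`, both function names, and the sample data are hypothetical. The sketch uses Node's `Buffer` for the encoding; the C# client would perform the equivalent decode.

```javascript
// Hypothetical header name; nothing in the thread specifies one.
const HEADER_NAME = "X-Page-Metadata";

// Server side: JSON -> UTF-8 bytes -> base64, so the value is safe ASCII.
function encodeMetadataHeader(metadata) {
  return Buffer.from(JSON.stringify(metadata), "utf8").toString("base64");
}

// Client side: reverse the steps to recover the original object.
function decodeMetadataHeader(headerValue) {
  return JSON.parse(Buffer.from(headerValue, "base64").toString("utf8"));
}

// Made-up example: per-element-type editing rules for the editor.
const metadata = { p: { editable: true, menu: ["Bold", "Italic"] } };
const headers = { [HEADER_NAME]: encodeMetadataHeader(metadata) };

const roundTripped = decodeMetadataHeader(headers[HEADER_NAME]);
console.log(roundTripped.p.menu.join(", "));
```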

Seems like all these "normal" methods are meant for cases where you have no control over the client (the user's browser).
Pete
Wednesday, February 14, 2007
It's not *quite* the case that you can't get to CDATA from within Javascript in IE.  You can't get to it through the HTML DOM.  You can get to it through the XML DOM just fine, though:

<body onload="alert(test.documentElement.text);">
<xml id='test'>
<![CDATA[ruth is > richard]]>

Of course that code won't work at all in Firefox (though it's not hard to tweak it until it will, since CDATA doesn't break Mozilla's HTML DOM). 

More to the original poster's point, the xml element isn't valid XHTML, and if you (say) put it in its own namespace, it won't show up in IE as a data island anymore.
Robert Rossney
Wednesday, February 14, 2007

This topic is archived. No further replies will be accepted.
