The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

How to render HTML within a web page?

Hey all,
Does anyone know how web sites (such as web-based email clients like Hotmail and Gmail) render HTML emails within the web page? I suspect they do some sanitizing or something, but it seems very non-standard and I couldn't find much about it.

Certainly, they wouldn't just throw another page embedded in theirs, would they?

Any links or advice are appreciated.

Thanks,
Ian
Ian Sefferman
Wednesday, May 11, 2005
 
 
"I suspect they do some sanitizing or something, but it seems very non-standard I couldn't find much about it."

I suspect they strip out anything that involves scripting.  So all <script> tags, all event handlers, and all links that begin with "javascript:".  HTML emails also rarely contain CSS, so it's probably also good to strip out all CSS and all style and class attributes.
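
To make the "event handlers and javascript: links" part concrete, here is a minimal sketch in Python. The function name and the list of URL-carrying attributes are assumptions made for illustration, and the value is assumed to be already entity-decoded. It also treats "vbscript:" and "data:" URLs as unsafe, which goes a little beyond what is listed above.

```python
import re

# Browsers tolerate tabs, newlines and leading whitespace in URLs, so
# "java\tscript:" can still run. Removing every control/space character
# before the check is deliberately stricter than a browser would be.
_IGNORED = re.compile(r"[\x00-\x20]+")

# Hypothetical list of attributes that carry a URL; not exhaustive.
URL_ATTRS = ("href", "src", "action", "background")

def is_script_attribute(name, value):
    """Return True if the attribute could be used to execute script."""
    name = name.lower()
    if name.startswith("on"):  # onclick, onload, onerror, ...
        return True
    if name in URL_ATTRS:
        compact = _IGNORED.sub("", value or "").lower()
        return compact.startswith(("javascript:", "vbscript:", "data:"))
    return False
```

A check like this only says what to remove. On its own it is a blacklist, so anything it does not know about gets through.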

"Certainly, they wouldn't just throw another page embedded in theirs, would they?"

Nope, that would allow for cross-site scripting attacks, because any script code would run in the context of the host site.
Almost Anonymous
Wednesday, May 11, 2005
 
 
Oh yeah..  and you'd have to strip out <iframe> and other tags as well.  It's probably best to work the other way: strip everything except for the specific tags and attributes you want to keep.
Almost Anonymous
Wednesday, May 11, 2005
 
 
So, basically, what you're saying is that it's a very imperfect science which is based almost solely on trial-and-error?

i.e.: strip tags, see if the page still looks okay, strip more tags, see if pages still look okay, repeat..

Hmm, kind of lame, but then again, I don't see any other way of really doing it.

Thanks for the advice,
Ian
Ian Sefferman
Wednesday, May 11, 2005
 
 
"So, basically, what you're saying is that it's a very imperfect science which is based almost solely on trial-and-error?"

Not really.  Strip the dangerous tags/attributes and leave everything else.  It has nothing to do with whether or not the page looks ok.  If you strip dangerous tags/attributes and the page doesn't render correctly, well, that's just tough.  You can't just leave the dangerous tags in there.
Almost Anonymous
Wednesday, May 11, 2005
 
 
"Hmm, kind of lame, but then again, I don't see any other way of really doing it."

I'd create a quick and dirty parser of HTML tags -- nothing fancy, just something that can break down each individual tag and all the attributes.  This is a good task for regular expressions.  There is no need to maintain the structure of the document -- just process it linearly.

Then you have a whitelist of tags and attributes and any tag not in the whitelist is removed and any attribute not in the whitelist is removed from the individual tags.  That should be good enough.

Go through the w3schools site and look at all the tags and attributes to create your whitelists.
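
The whitelist idea could be sketched roughly like this in Python. Note that this swaps the regular expressions suggested above for the standard library's html.parser, which already does the linear tag-by-tag breakdown. The class name and the whitelists are hypothetical and deliberately tiny; a real list would be built from a tag reference as described.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists for illustration only.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "p", "br",
                "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")
VOID_TAGS = {"br"}                 # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # drop the text inside these as well

class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            value = value or ""
            # Relative and unknown-scheme links are dropped too.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

def sanitize(html_text):
    """Return html_text with everything outside the whitelists removed."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Text inside removed tags is kept (except for script and style), so a message still reads sensibly even when most of its markup is gone. Comments are dropped because the parser's default comment handler does nothing.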
Almost Anonymous
Wednesday, May 11, 2005
 
 
I agree. I conceptually understand what you're getting at. And I totally concur with the fact that security is paramount.

But, at the end of the day, users are going to *use* my application. And while security should most definitely come first, I want my users to actually enjoy using my program. So, I'm going to make sure it looks good.

I think it's quite missing the point of building something for people to use if you totally disregard how they're going to use it. :-/

Ian
Ian Sefferman
Wednesday, May 11, 2005
 
 
As well as security, remember privacy.  Links to offsite files such as images, CSS, JS, etc. can be used by spammers to confirm that an address is valid and likely being read by a real human.
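
One way to handle the privacy point is to drop off-site resource references unless the reader opts in. A small Python sketch follows; the function names and the allow_remote flag are made up for illustration. It assumes attributes arrive as (name, value) pairs from whatever tag parser is in use, and it leaves "cid:" references to inline attachments alone.

```python
from urllib.parse import urlsplit

def is_remote(url):
    """Return True if the URL would make the browser contact another host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return True  # unparseable: treat as remote to be safe
    # A netloc covers scheme-relative URLs such as //host/pixel.gif.
    return parts.scheme in ("http", "https", "ftp") or bool(parts.netloc)

def block_remote_src(attrs, allow_remote=False):
    """Drop src/background attributes that point off-site."""
    if allow_remote:
        return list(attrs)
    return [(name, value) for name, value in attrs
            if not (name.lower() in ("src", "background")
                    and is_remote(value or ""))]
```

Calling it with allow_remote=True would correspond to a "show images" button that the reader clicks for senders they trust.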

On another note, it may well be that HTML email is actually *easier* to read if you strip much of the formatting, as with Thunderbird's "Simple HTML" display mode.  Mail will appear in the same font, size, and colours, making your app look more consistent and likely prettier.

Wednesday, May 11, 2005
 
 

This topic is archived. No further replies will be accepted.
