A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.
I've never programmed using HTTP, and have some questions.
First, what do you do about duplicated POSTs? If a POST has a side-effect (e.g. "debit my account"), how should you ensure that the same POST isn't sent and processed twice (e.g. by the user's pressing POST/Cancel/POST)? I've seen web sites which display a message that says "Your request is being submitted, this may take a while, please don't resubmit". What about having a unique ID to identify each transaction, and if so, might this ID be better generated by the server (which sends the form to the browser), or by the browser (which POSTs the form back to the server)?
Second, will a browser ever retransmit automatically?
A web browser shouldn't ever re-submit a POST without first asking the user for permission. However, the user will probably say "yes". A proxy server also shouldn't re-submit a POST, but you never know.
The most common solution is a transaction id of some sort as you suspected. Record the id you sent to the client. If an unexpected id comes back or if an id comes back a second time, reject it with a server error.
It is important to reject unexpected ids. If a repost arrives after a session has expired, rejecting the unknown id stops the double transaction; it also catches misbehaving proxy servers. An alternative is to save all transaction ids in a database for a long time instead of putting them in the session, but that will probably be difficult to scale.
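The scheme described above can be sketched in a few lines. This is a hedged illustration, not a production implementation: `session` is assumed to be a per-user dict that persists between requests, and the function names are made up.

```python
import secrets

def render_form(session):
    # Generate a fresh id when the form is served, and remember it so we
    # can recognize it when (and only when) it comes back.
    txn_id = secrets.token_hex(16)
    session.setdefault("pending_txns", set()).add(txn_id)
    return txn_id  # embed this in a hidden form field

def handle_post(session, txn_id):
    pending = session.get("pending_txns", set())
    if txn_id not in pending:
        # Unknown or already-consumed id: a duplicate POST, an expired
        # session, or a forged request. Reject with a server error.
        raise ValueError("duplicate or stale transaction id")
    pending.discard(txn_id)  # consume the id so a re-POST is rejected
    # ... perform the side effect exactly once here ...
    return "ok"
```

The key property is that the id is consumed on first use, so pressing POST/Cancel/POST produces exactly one debit and one error.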
Since the days of Philip Greenspun web libraries like ACS have solved this problem. It's a real problem. Philip explained it pretty well back then so I won't rehash it. But for everyone's benefit I'll detail it in pseudo-code.
Basically, you remember enough detail about the db call that will result from the click in a hash tree, and watch for recurring calls within a period of time, say 10 to 15 seconds. And you use this checking mechanism for any call that really shouldn't be repeated like that.
The function call is db_memoire or something like that.
What db_memoire does is:
1. Record all unique recent db query strings to which you want the no-clumsy-paws rule applied. The function stores whatever relevant information there is, including the db query string, in a self-managing FIFO hash tree. The key is the db query string; the value is the timestamp of the most recent time the call showed up.
The upkeep is self-managing because db_memoire itself handles the FIFO expiry.
Good places to hold this hash tree include:
the web server's in-memory cache
the shared (by all web servers involved) centralized db
some application server dedicated to storing application-wide state like this
In most web stacks this is really straightforward and already there: Java Servlets, AOLserver, ASP 3.0, ASP.NET, mod_perl, mod_anything_really should all have it. Logically, this hash tree must be available application-wide, so it can't just sit in one web server if multiple web servers host your web app (the app in need of clumsy-paw management).
Such a hash tree is usually fast to look up and persistent for all eternity. If it's accessed over TCP it's slightly slower, but it won't hurt the response time of your application the way an RDBMS round trip would, so no worries.
That's one tried and true way to catch a dup.
Here's the pseudo code:
1. So you have a use case where a clumsy paw might click twice on the Send button (insert into tbl_outbox(ixSenderID, ixRecipientListID, blobMessageBody, etc) values (1, 1, 'spam spam spam', etc)) on the web mail page (module name webmail_module). The entry is unique to the current logged-in user (say: lifanchen) and need only be unique within this web app, since this is a shared web server hosting multiple web apps (its_not_the_size_that_counts_fish_hook_estore_v_1_5).
2. So you make a note of it by asking the hash tree whether this hash key/value pair (as follows) is already there.
3. This is the sample hash key we'll use for the use case (syntax at the end of this post):
its_not_the_size_that_counts_fish_hook_estore_v_1_5|webmail_module|lifanchen|insert into tbl_outbox(ixSenderID, ixRecipientListID, blobMessageBody, etc) values (1, 1, 'spam spam spam', etc)|prevent_clumsy_paw
4. Once you have the key, check whether inserting it into the hash tree would collide. If there's a clash, do the following:
4a. It is a dup. There are two possibilities.
4a, possibility one:
Condition: the current system time is later than the time in the hash value by more than the clumsy-paw window (15 seconds). Then there's no problem; it's not a clumsy paw. Go to the next step.
Action: just update the hash value to the current system time.
4a, possibility two:
Condition: it is within the window, so we have a problem. Act on the action delegate; in our case the action delegate is prevent_clumsy_paw. Your code calling this function will get this string back, so the application logic should NOT execute this db query. All you have to do now is go to the next step.
Action: just update the hash value to the current system time.
4b. If there's no clash, this is the first click of the clumsy (or possibly not-clumsy) paw.
Action: insert the db query and the current system time into the hash tree.
5. This is the FIFO upkeep part: every hour (or as frequently as you like), delete all hash entries whose timestamp is more than 15 seconds older than the current system time. [This could be a scheduled job (very easy to do in AOLserver), or just let db_memoire do it and flag it done each hour.]
Where the <memoire_string> is:
Friday, April 06, 2007
"Second, will a browser ever retransmit automatically?"
Yes! In particular, if the user is at a page that was returned from a POST request when they open a new window, Internet Explorer will repeat the POST in the new window without asking for confirmation.
Friday, April 06, 2007
> Sorry, if there's any confusion about what I've written, contact me.
* Use an ageing cache
* Use a key built from properties of the request (that's assuming no artificial ID, I suppose)
* Use the cache not only for 'clumsy paw' but for other purposes too
* I should read up on Philip Greenspun's work.
Are there more specifically-HTTP (I'm not asking about HTML, which would be too big a topic) best practices that are listed somewhere? I've read the RFC but little else.
> doesn't really stop the double posting issue anyway
It eliminates one of the causes: if a POST returns a result (e.g. a purchase confirmation page), the user is likely to bookmark it and revisit the bookmark.
Well, session management prevents those from being a problem. You may have a bookmark to a form action, but if the application is wrapped in application-level auth/auth you'll be redirected to the login screen anyway.
That takes the solution 50% of the way.
The other 50% is your application logic: once the user has logged in, your web app recognizes the business states and business logic surrounding a profile's various transactions. So the business logic naturally protects against duplicate inserts, deletes, updates, and so on.
db_memoire is for an earlier time, without such business flows, or for when you want to do REST-like work without heavyweight business layers.
Say you have an AJAX form that derives answers from fresh db queries only once a minute, but some idiot saves the form to disk and modifies the refresh rate to every 50 microseconds. db_memoire will save your hide.
And if you use a counter, you can ban the idiot too.
Another use of db_memoire is to store a tuple (timestamp|query_response_body) as the value. That way you won't kill your DB with repeated requests for the SAME QUERY. If you shape the hash key carefully, this could mean millions of people share the same query answers.
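That caching variant looks like this in miniature. A hedged sketch only: `run_query` stands in for the real database call, and the 60-second TTL is an arbitrary choice for illustration.

```python
import time

TTL = 60  # seconds; how long a cached answer stays fresh (assumed value)

_cache = {}  # query string -> (timestamp, response_body)

def cached_query(sql, run_query, now=None):
    """Serve a cached answer for identical queries within the TTL,
    hitting the database only when the cached copy is stale or absent."""
    now = time.time() if now is None else now
    hit = _cache.get(sql)
    if hit is not None and now - hit[0] < TTL:
        return hit[1]  # shared answer; the DB never sees this call
    body = run_query(sql)
    _cache[sql] = (now, body)
    return body
```

If the hash key includes only the query string (and not, say, the user id), every user asking the same question shares one cached answer, which is exactly the millions-of-people case above.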
Before putting together solutions like this by hand, do please check out things like Danga's memcached and ASP.NET 2.0's facilities for caching. No need to Not-Invented-Here something already designed to scale and debugged to production quality for a song.
I've seen ASP.NET's APIs for caching.
I was wondering whether the generally-accepted practice is to have an artificial ID to identify each POST.
I'm also curious about PUT and DELETE (mentioned in the HTTP RFC and in articles about REST): if PUT and DELETE aren't generated by web browsers, what applications (in general and/or in particular) do generate them?
The general practice is to identify sessions (with cookies where possible; hidden fields, the URL path, or the URL query string where not) and remember form state from click to click. So no, labeling each generated page with a unique id isn't really part of most web stacks.
You are, however, more than welcome to attach a unique id to the URL; most web stacks will expose the URL for you to rewrite. Just do it for a good reason: for example, adding unique strings to otherwise identical static forms hurts browser caching.
I think WebDAV makes use of PUT and DELETE. REST puts some trust back into what was originally in HTTP, because its advocates feel it's silly to ignore it and put SOAP on top. Having SOAP not care about the underlying protocol is a GOOD thing, so SOAP didn't do that wrong; but REST advocates generally think SOAP is way too thick a layer, over-engineered, and they really don't care about SOAP's decision not to use PUT and DELETE.
Hardened web servers (Apache and IIS included) will lock down PUT and DELETE, restricting the HTTP handler to GET and POST only. They are simply reducing the code surface exposed to potential attacks, probably on the idea that PUT and DELETE aren't often used and so may be more likely to have yet-to-be-exercised-or-recognized security bugs than the code path that handles GET and POST. I really don't know, but you'll see it done when you run hardening wizards. So keep this in mind if you want your REST web service to run on client servers: their server may have all the wrong things locked down, rendering your REST program unusable.
It looks like you are talking about web services implemented by eBay, Amazon and such.
I have seen weird cases where every web service invocation has the username/password attached. Technically speaking that's okay, since it's freaking HTTPS anyway, and frankly you can hash the username/password so the lookup of the session or credential won't be a problem. But properly speaking, where multiple login sessions might result from one username/password pair, you would think it pays to pass around the id of the particular session (among N possible sessions). Web service calls are just raw HTTP requests to the server, and the client is often a simple SOAP library, not a full-blown web browser. So you as the programmer will probably have more exposure to the session key, whereas a web browser user will never realize there are cookies or sessions going on behind every click.
> So you as the programmer will probably have more exposure to the session key, whereas a web browser user will never realize there are cookies or sessions going on behind every click.
Sessions are typically implemented using a cookie (or, as a fallback, by appending a session ID to the URL if the browser doesn't support cookies, which isn't as good because the user might share the URL with other users).
As a client application I'll probably just want to support cookies, so that the server can use them to manage session state.
The correct way to handle a POST is to perform a 303 redirect to the results page. It will then not be possible to resubmit: refreshing will refresh the results page, and hitting "Back" will skip the POST stage and go straight to the original input form.
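The POST/Redirect/GET pattern above can be shown with a bare WSGI app. This is a minimal sketch: the routes, the order id, and the page bodies are all invented for illustration, and the actual side effect is elided.

```python
def app(environ, start_response):
    """Toy WSGI app demonstrating POST -> 303 See Other -> GET."""
    method = environ["REQUEST_METHOD"]
    path = environ["PATH_INFO"]

    if method == "POST" and path == "/purchase":
        # ... perform the side effect here (debit the account, etc.) ...
        order_id = "12345"  # hypothetical id of the record just created
        # 303 tells the browser to fetch the result page with GET, so
        # refresh re-GETs the receipt instead of replaying the POST.
        start_response("303 See Other", [("Location", f"/receipt/{order_id}")])
        return [b""]

    if method == "GET" and path.startswith("/receipt/"):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Thanks for your order."]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Note that 303 (rather than 302) is the status defined to mean "the response to your POST lives over there, GET it", which is why it is the right choice here.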
Thursday, April 12, 2007
Thanks for referencing "WebDAV" by the way; it's interesting to see how many applications use it: http://ftp.ics.uci.edu/pub/ietf/webdav/implementation.html
This topic is archived. No further replies will be accepted.