The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Anonymous Usage Statistics: Best Practices?

A quick Google search suggests that a lot of applications collect anonymous usage statistics.  What I don't find is much discussion of this from the programmer's perspective.  Is each application rolling its own logging and reporting system?  For reporting usage data, how often is too often?  How much is too much?  Does anyone have any lessons they wish they had known before they implemented such a system?

What about anonymizing?  It seems like it would be straightforward, but the AOL search fiasco suggests that it's harder than you might think to totally anonymize data.  Are there any guidelines for this?  Does anyone go beyond the obvious in their application?
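
To make "the obvious" concrete, here is a minimal sketch of client-side scrubbing, assuming events are plain dictionaries. The field names, the allowlist, and both function names are hypothetical, not taken from any real product. It keeps only allowlisted fields, uses a random per-install ID, and coarsens timestamps to the day. A stable ID still lets the server link one install's events together, which is roughly what went wrong with the AOL data, so this is a starting point and not a guarantee of anonymity.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical allowlist: only these event fields ever leave the machine.
ALLOWED_FIELDS = {"event", "app_version", "os_family"}


def new_install_id() -> str:
    """Random per-install ID, not derived from user name or hardware."""
    return uuid.uuid4().hex


def anonymize_event(raw: dict, install_id: str, when: datetime) -> dict:
    """Keep allowlisted fields only and coarsen the timestamp to the day."""
    report = {k: v for k, v in raw.items() if k in ALLOWED_FIELDS}
    report["install_id"] = install_id
    # Day-level granularity (UTC) instead of an exact time of use.
    report["day"] = when.astimezone(timezone.utc).date().isoformat()
    return report
```

An allowlist is used instead of a blocklist so that a field added to the application later (a file path, a search string) is dropped by default unless someone deliberately approves it.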

Also, I realize that some people have strong feelings against this practice and I understand where they are coming from, so for my project it will be strictly opt-in.

For this discussion, I'm more interested in desktop applications - where the data is collected client-side and shipped to a reporting server - than I am in web sites or web applications.

Brian McKeever
Tuesday, September 02, 2008
Overall, I find them a waste of time. Any anonymous metrics I collected, I'd have to treat as suspect. Anything I did to validate those metrics would remove the shield of anonymity.

There is the option of collecting data about the user and then later sanitizing it so that it is anonymous, but that potentially exposes you to a lot of additional problems down the road. At the very least, you'd need to invest a lot of time in scrubbing the data and making sure the scrubbed version is all that other people see, which is time taken away from other aspects of your business.

It's worth noting that Google considers anonymous usage statistics a core part of their search engine business.  While Microsoft has collected anonymous statistics in the past, they've been crucified so many times in the press that I doubt they do it covertly anymore; they want the legal and ethical protections of the opt-in mechanism.

Best practice: collect the data if you want but don't bother making it anonymous unless your business model depends on being able to resell it, and you've figured in the costs of doing so.
Wednesday, September 03, 2008
Err...  I suppose I should clarify.

Managers have the tendency to think collecting any and all data is a fantastic idea; they can later come back and data mine it whenever they have a question about their customers. I just believe the practical implementation of such a practice is so difficult that this is one of those situations where you're better off not trying unless you have a very exact, explicit need.

In addition to the ethical and legal aspects, you also have to deal with the following situations:

Firewalls, antivirus software and other cyber-security policies.
Multiple users on the same machine, sharing an account and not.
Interruptions during data transmission.
Non-standard installations.
Additional complexity in your program and subsequent bugs.
Deliberately bogus data (industrial espionage).
Security vulnerabilities in your server.

And that's just off the top of my head.
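
As an illustration of just one item on that list, interruptions during transmission, here is a rough sketch of spooling reports to disk and deleting each one only after a send succeeds. The class name, the file layout, and the `send` callback are all assumptions made up for the example, and it ignores the multi-user and non-standard-install cases entirely.

```python
import json
import os
import tempfile
from typing import Callable


class ReportSpool:
    """Hypothetical on-disk queue: one JSON file per pending report."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def enqueue(self, report: dict) -> str:
        # Write to a .tmp file first, then rename, so a crash mid-write
        # never leaves a half-written .json file in the spool.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(report, f)
        final_path = tmp_path[:-4] + ".json"
        os.replace(tmp_path, final_path)
        return final_path

    def flush(self, send: Callable[[dict], bool]) -> int:
        """Try to send every pending report; stop at the first failure."""
        sent = 0
        for name in sorted(os.listdir(self.directory)):
            if not name.endswith(".json"):
                continue
            path = os.path.join(self.directory, name)
            with open(path, encoding="utf-8") as f:
                report = json.load(f)
            if not send(report):
                break  # leave this and later files for the next attempt
            os.remove(path)
            sent += 1
        return sent
```

Even this small piece raises new questions (how large may the spool grow, and what if the server received a report but the acknowledgement was lost), which is the kind of creeping complexity the list is about.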
Wednesday, September 03, 2008
Thanks for the response.  I hear what you're saying, and I've been beginning to wonder if it's worth the trouble.  I know MS does it for Office (see e.g. ), and they seem to think it helps them build more usable software.  Come to think of it, Office is one of the few success stories I've read.  I know e.g. Winamp collects similar data, but I don't remember hearing anything come of it.  I wonder if they don't find it valuable, or have trouble gathering useful information.

I'm kind of surprised that this thread hasn't generated more discussion - I had thought more people would have experience with it.  Maybe it's just a few high-profile projects that bother even trying to gather this data?
Brian McKeever
Thursday, September 04, 2008
If it's not too late to chime in, I think quite a few products start out this way and then later abandon the attempt. Writing the code to open a socket and pass along a message is fairly trivial and you can get it out of any web programming book. It's really designing the infrastructure that makes people second-guess the effort. Microsoft has the luxury of throwing money and staff at the problem until they get something that works.
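
To show how little code the sending side takes, here is a minimal sketch using Python's standard library. It posts JSON over HTTP instead of opening a raw socket, and the function names and any reporting URL you pass in are assumptions for the example only.

```python
import json
import urllib.error
import urllib.request


def build_report_request(url: str, report: dict) -> urllib.request.Request:
    """Package a usage report as a JSON POST request."""
    body = json.dumps(report).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_report(url: str, report: dict, timeout: float = 5.0) -> bool:
    """Return True on a 2xx response, False on HTTP or network errors."""
    request = build_report_request(url, report)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Offline, blocked by a firewall, timed out, server error, etc.
        return False
```

Everything this leaves out (retries, a server to receive the data, storage, reporting) is the infrastructure part.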

And I agree, it's worth doing for Microsoft Office since that's one of their few moneymakers and also one of their more contentious products. They'd definitely want some sort of feedback mechanism instead of trying to guess things about their users.

It would be interesting to see a white paper discussing the real costs of phoning home.
Saturday, September 06, 2008
Incidentally, I'm currently working on a product that provides the whole infrastructure for collecting anonymous usage stats from desktop software: an embeddable library for linking with the host application, a server to listen for messages from the clients, and a whole slew of charts and graphs for drilling into the data.

Here's the GUI design mockup:

And here's an article where I describe the concept:

I think the market is huuuuuuuuge, so I've been working really hard over the last few months to roll out a slick, compelling solution.
BenjiSmith
Sunday, September 07, 2008

This topic is archived. No further replies will be accepted.
