The Joel on Software Discussion Group (CLOSED)

A place to discuss Joel on Software. Now closed.

Your hosts:
Albert D. Kallal
Li-Fan Chen
Stephen Jones

Preserving Knowledge

I spend a lot of time surfing the web, and more often than not I find some very useful information. There is a lot of really good stuff here in the JOS forum especially, and lately I've begun to regret not saving some of the cool threads and sites I've come across. It's really annoying to know that there was a site covering [insert topic] and now I simply can't find it through Google.

To combat this I'll begin storing the things I think I'll find useful in the future, but how do I do this in a smart way?

Google is not a good idea, as it also searches all the sites I've never visited, and that's not what I want (besides, I usually have a snowball's chance in hell of finding the right website, as I don't know exactly what it was about).

The usual browser bookmarks will not do either, as the number of sites I'll store will be too big for the browser to handle. 3 to 4 sites a day makes more than 1000 sites in a year, and finding the right page among those will be tedious, to say the least.

Does anyone have an idea for solving this problem? Is there a program that can help me out with this?
Peter Monsson
Sunday, October 03, 2004
 
 
Store them all and search the bookmarks...  Is this possible?
Skewer Target
Sunday, October 03, 2004
 
 
That sounds awesome. If no one suggests anything, somebody should build it.  I'd give $30 for something that works exactly as the OP described.
lumberjack
Sunday, October 03, 2004
 
 
Wouldn't it be possible to store/save the interesting stuff with wget?
René Nyffenegger
Sunday, October 03, 2004
 
 
Gurker
Sunday, October 03, 2004
 
 
Try del.icio.us (http://del.icio.us/) or Furl (http://www.furl.net/).  Both are online link aggregators.  You can tag and search through links, both your own and the public links of others.  Very useful.

Furl gives you more control over sorting your links and also caches the page so you can still access it even if the link breaks.  In theory it's great, but I think the interface is too busy.  I almost exclusively use del.icio.us.  YMMV.
David Gingrich
Sunday, October 03, 2004
 
 
I store items in a wiki. I just cut and paste. If I weren't so lazy I would make this more automatic. But a wiki is great for preserving info from elsewhere and your own thoughts.
son of parnas
Sunday, October 03, 2004
 
 
I run all my web browsing through a local squid (http://www.squid-cache.org/) instance that doesn't cache but logs all page requests.  At night, I wget (http://www.gnu.org/software/wget/wget.html) all the pages and feed them through htDig (http://www.htdig.org/).  I use an older htDig and haven't bothered to upgrade because the newest versions are significantly slower.

It requires a fast CPU and a _lot_ of disk (even filtering out jpgs, swf, etc.), but I can search any page I've viewed in the past.

I'm sure there are better options for each stage in this chain (or eliminating stages altogether), but I've been using this Rube Goldberg method for a few years now and it works ok.
Art
Sunday, October 03, 2004
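
The first stage of the squid / wget / htDig chain described above (turning the proxy log into a list of pages to fetch) could be sketched roughly as follows. This is a minimal Python sketch and not the poster's actual setup: it assumes squid's native access log layout (time, elapsed, client, code/status, bytes, method, URL, ...), and the function name `urls_to_fetch` and the list of skipped file extensions are made up for illustration.

```python
from typing import Iterable, List

# Hypothetical skip list, in the spirit of "filtering out jpgs, swf, etc."
SKIP_SUFFIXES = (".jpg", ".jpeg", ".gif", ".png", ".swf", ".css", ".js")


def urls_to_fetch(log_lines: Iterable[str]) -> List[str]:
    """Return unique GET URLs from squid-style log lines, in first-seen order."""
    seen = set()
    urls = []
    for line in log_lines:
        fields = line.split()
        # Assumed layout: time elapsed client code/status bytes method URL ...
        if len(fields) < 7 or fields[5] != "GET":
            continue
        url = fields[6]
        # Compare the part before any query string, ignoring case.
        path = url.split("?", 1)[0].lower()
        if path.endswith(SKIP_SUFFIXES) or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

The resulting list could be written to a text file, one URL per line, and handed to wget with its `-i` (input file) option for the nightly fetch.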
 
 
You could get a Gmail account and email yourself all the interesting topics, links and ideas you accumulate during the year.

Then you have the fast searching of Google combined with only searching through your info.
Kent
Sunday, October 03, 2004
 
 
a9.com

and soon google and MS and yahoo

a lot of money is being paid to some of the best minds in the business to give you exactly what you want.
insider
Sunday, October 03, 2004
 
 
>Then you have the fast searching of Google combined
> with only searching through your info.

And google will know everything about you.
son of parnas
Sunday, October 03, 2004
 
 
In IE, you can just select File | Save As... and save the page you're interested in. I use this technique and select *.mht files, so each page ends up in one file.

You can use bookmarks, but you never know if the site will still be there later.
Nemesis
Sunday, October 03, 2004
 
 
I used to save everything as .mht files until I found that about 20% of them later had problems and would not open. I've now gone back to saving them all as web-page complete.

The basic problem however is searching. Once you get a load of files on the HD it becomes as difficult, if not more so, to search for them locally than it would be to google for them. I know that there are web pages I can find quicker with Google than I can on my local HD. Nevertheless you could try using Lookout to index your My Documents folder.
Stephen Jones
Sunday, October 03, 2004
 
 
There is a tool called PersonalBrain, see http://www.thebrain.com . I use it for exactly what the original poster was discussing. It is capable of storing web page shortcuts, shortcuts to files and Outlook messages. The only bad thing about it is that the company which makes it seems to have stopped supporting it.

You may also be interested in Omea, from the guys who made ReSharper - see: http://www.jetbrains.com/omea_reader/
Lev Kurts
Sunday, October 03, 2004
 
 
In IE you can search your cache. I don't think you can do this in FireFox, at least not the full text of the files.

I had a tool that did this, but it was so slow that I gave up on it. As much as I think this space is ripe for a good tool to come along, you can already search your IE history.

Actually, if you don't mind giving up your privacy a bit, someone could make a browser plugin that tracks the pages you've visited. Attach this to an existing search engine like Google, A9, Yahoo, etc. and you can "search within the sites I've visited."

A9 is probably closest to this as it keeps a history of things you searched for and links you clicked on, but it's not the same as the history of sites you've visited.

Delicious is a great tool for this. I discovered it by looking at my log files. It's a community / online bookmark thing. Not only can you keep all your bookmarks in one place, you can see what other people are bookmarking as well. The most popular bookmarks page is actually one of my destinations every day.

http://del.icio.us/popular/

As a previous poster does, I keep my bookmarks in a Wiki. If I wanted to, I could even store the whole text of some articles I was interested in, and it would allow me to search through them later.
www.MarkTAW.com
Sunday, October 03, 2004
 
 
Thank you all for all your answers. I'm trying to find the one that suits me the best.

HTMLDOC is not at all what I'm looking for. I don't want to convert HTML documents into pdf or ps. I want to find them.

del.icio.us and Furl offer some of what I'm looking for, but I'm not really keen on their "share with your friends" idea. And if the company goes down? That's more fatal than a HD failure. I agree that Furl has a very busy interface.

A Wiki is a nice idea, but I'll have to customize the wiki then and I'd rather buy a product more specifically designed to combat my problem.

The htDig, wget, squid approach is probably a bit of overkill for my needs. I don't want to store everything I surf to. I just want the gems from the internet.

Gmail has the same problem as a Wiki. It's not designed for it and there must be something better.

a9.com does not have a search feature for their bookmarks. That doesn't make it more worthwhile than the bookmark feature in any browser.

Lookout may be a solution but again I'd rather have a tool that is developed to solve my problem.

PersonalBrain doesn't do what I want. Omea may do the job... I haven't found out yet exactly what it does. Works only with IE it seems :(

Thanks everyone for your contributions. There are a lot of solutions to choose from.
Peter Monsson
Sunday, October 03, 2004
 
 
Here's one more: store a list of websites with interesting information.  When you want to get that information back, use the Google API to search for the keywords you want where site:<any of the sites in your list>.

Can someone who knows the Google API better than I do tell me if this is possible?  Can you specify a set of sites?  Would you have to write your own interface to iterate through the sites you want to search?
Eponymous
Sunday, October 03, 2004
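
The query side of this idea can be sketched without touching any particular search API. The snippet below is a minimal Python sketch that only builds the query string; the function name `build_site_query` is hypothetical, and whether a given search engine or API accepts several `site:` terms joined with OR (and how long a query it allows) is an assumption that would need checking. It does not answer the question about the Google API itself.

```python
from typing import Iterable


def build_site_query(keywords: str, sites: Iterable[str]) -> str:
    """Combine keywords with a site: restriction for each saved site."""
    site_terms = " OR ".join(f"site:{s}" for s in sites)
    if not site_terms:
        # No saved sites yet: fall back to a plain keyword search.
        return keywords
    return f"{keywords} ({site_terms})"
```

If the engine limits query length, a long site list would have to be split into batches and the results merged, which is the "iterate through the sites" interface the post asks about.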
 
 
What about a Blog?

Just copy the text you're interested in, and add it to your blog along with the URL. You don't have to share it, just upload it to your personal space, and you have a searchable record of every site that caught your eye. Movable Type will happily live on your server, and the price is right (free for personal use).
www.MarkTAW.com
Sunday, October 03, 2004
 
 
Dave Thomas, in one of his blog entries a little while ago (http://www.pragprog.com/pragdave/Random/NearTimeFlow.rdoc) wrote about a tool called Flow that sounds like what you're after.  Haven't used it myself, but you've reminded me to take another look.
Michael Jessopp
Sunday, October 03, 2004
 
 
"..google will know everything about you". Yes. I'm not saying this is good. That's also the biggest complaint against a9.com and what data libre is all about. But I do want it on a server. My server. I want only a very small amount of private data local.

And if you want your data local, Chandler looks promising but is still a long way out.
insider
Sunday, October 03, 2004
 
 
This would make a cool firefox extension.

Imagine right-clicking on a link and selecting 'add to diary'.  A sort of long term history only for links that you want to remember.
i like i
Monday, October 04, 2004
 
 
Geeze, Chandler. I almost forgot that, uh, didn't exist.

I still think a weblog could work here. Yeah, it's not automatic, but if it's worth keeping, it's worth spending a few moments on during entry. Just copy/paste what you like into the body, make sure you include a link, add any meta information you want, and you're good to go. Movable Type even lets you add categories, so you aren't stuck with just a chronological list.

I did try this once a while back, but never bothered to use it. It's just not THAT important that I can find something later.
www.MarkTAW.com
Monday, October 04, 2004
 
 
I think what the poster would like (and *I* would like) is to be able to say:

"Hmmm... I KNOW I saw something like that on the web."

Then search PAGES YOU'VE VISITED for "abc and def".


If you have to CONSCIOUSLY paste every little thing into some wiki, or whatever, you'll never do it. Too much work. Also, you might only need 0.1% of what you see. So who wants to spend 10 sec/thing? If I look at 1000 things a month, then I'm spending 10,000 seconds just to get to ONE THING.

So you're spending nearly 3 hours (10,000 seconds) for EACH thing you want to recall. That's inefficient.

What you want is something that searches the IE cache.

I think there's a good PRODUCT IDEA here.

I'd pay $30 for it.
Mr. Analogy {Shrinkwrap ISV owner}
Monday, October 04, 2004
 
 
BTW, if someone wants to create the above product, but doesn't know how to market it, I'm happy to lend my assistance.

I can either give you the benefit of my 10 years of marketing experience, or I can sell it for a %.
Mr. Analogy {Shrinkwrap ISV owner}
Monday, October 04, 2004
 
 
Ctrl - H & Search? (Firefox)
KayJay
Monday, October 04, 2004
 
 
If I've understood ... you are on a page with interesting material and you want to store it.

Personally I find that bookmarks are not enough.

You could use a favelet. The favelet links to a simple app on your website (certainly everyone here has their own webserver running on the net ... I use noip.com to use my cable access to host my personal web server) which will retrieve the page you are interested in, parse it and store it in a database. You might want it to keep a window open so you can add a note or format what it parsed. You then have another simple app that will search and sort all the sites you have saved this way. In fact this is so simple I'll probably do it tonight as I'm back in the writing examples mode.
Peter
Monday, October 04, 2004
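
The parse / store / search part of the favelet idea above might look something like this. It is a minimal Python sketch, not what the poster built: the table layout, the function names (`open_store`, `save_page`, `search`) and the choice of SQLite with a plain LIKE match are all assumptions, and fetching the page over the network is left out so that the example stays self-contained.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def open_store(path=":memory:"):
    """Open (or create) the page store; the schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, note TEXT, body TEXT)"
    )
    return db


def save_page(db, url, html, note=""):
    """Store the page's extracted text and an optional note under its URL."""
    db.execute(
        "INSERT OR REPLACE INTO pages (url, note, body) VALUES (?, ?, ?)",
        (url, note, html_to_text(html)),
    )
    db.commit()


def search(db, term):
    """Return URLs whose text or note contains term (LIKE wildcards not escaped)."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT url FROM pages WHERE body LIKE ? OR note LIKE ? ORDER BY url",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

A substring match is enough for a few thousand saved pages; a real full-text index would be the next step if searches became slow.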
 
 
Wouldn't this just be as simple as having something that keeps track of the sites that you visit (just the sites, not necessarily the individual pages) and then using Google to search those sites when you want to find something?  I think using Google to search would be pretty easy; the only slightly difficult thing would be to preserve the sites that you have visited.  There must be a way to add something to IE or Firefox or whatever to save the history somewhere when you close your browser.  I am not much of an app guy but this doesn't seem too difficult to me.
0xCC
Monday, October 04, 2004
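
Keeping "just the sites" from a browsing history is mostly a matter of reducing each URL to its host name. A minimal Python sketch, assuming the history has already been exported as a plain list of URL strings (how to get that list out of IE or Firefox is not covered here, and the function name `visited_sites` is made up):

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def visited_sites(history_urls: Iterable[str]) -> List[str]:
    """Return the distinct host names found in a list of visited URLs, sorted."""
    hosts = set()
    for url in history_urls:
        host = urlsplit(url).hostname  # None when the string has no host part
        if host:
            hosts.add(host)
    return sorted(hosts)
```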
 
 
I had a program that searched through all the sites I visited. I could set the database to purge every x days, I think I had mine set at 2 weeks. It was extremely slow and compacting the database took forever.

CTRL + H only searches the titles, unfortunately. In IE you could do a full text search.

Blogger has a bookmarklet that lets you highlight some text, click it, and it will create the entry for you, prefilling the information about the URL, page title, and pasting in anything you highlighted.

At this point, storing information is almost as easy as bookmarking. Highlight, click the bookmarklet, click submit. Keep browsing. I don't know if Movable Type has the same kind of thing.

Like I said, if you couldn't do full text searches on your Cache in IE, I'd say this space is ripe for someone to build a good, clean software package for.

I think I mentioned earlier that a search engine could somehow build a browser plugin that let you search through your history using the power of the search engine. The downside to this (privacy issues aside), would be that you could only search through pages it's already indexed, and not pages that are brand new. Then again, pages that are brand new, you're more likely to remember where you saw, because you had to have visited them more recently.
www.MarkTAW.com
Monday, October 04, 2004
 
 
I've actually been wondering whether I should make a web proxy with enough hard drive space that it need not be limited, then add versioning (so nothing ever gets removed), full text search (so you can search what you've already seen) and the ability to use a "time machine", that is, get the latest version not newer than some date from the proxy.

Sure, it would take some hard drive space, but not too much I think. Only saving the text might be a good idea though.
Mystran
Wednesday, October 06, 2004
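
The "time machine" lookup described above (latest version not newer than some date) is a sorted search per URL. Here is a minimal in-memory Python sketch of just that piece; the class name `VersionedCache` and its methods are hypothetical, and a real proxy would keep the versions on disk rather than in memory.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import Optional


class VersionedCache:
    """Keep every fetched version of a page, never overwriting older ones."""

    def __init__(self):
        # url -> parallel lists: fetch times (kept sorted) and page text
        self._times = defaultdict(list)
        self._texts = defaultdict(list)

    def add(self, url: str, fetched_at: datetime, text: str) -> None:
        """Record a new version, keeping the per-URL lists in time order."""
        i = bisect.bisect_right(self._times[url], fetched_at)
        self._times[url].insert(i, fetched_at)
        self._texts[url].insert(i, text)

    def as_of(self, url: str, when: datetime) -> Optional[str]:
        """Return the latest version not newer than `when`, or None."""
        times = self._times.get(url, [])
        i = bisect.bisect_right(times, when)
        return self._texts[url][i - 1] if i else None
```

Storing only the extracted text, as the post suggests, would keep each version small; identical consecutive versions could also be skipped in `add` to save space.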
 
 
Peter

Have a look at this product review from PC Magazine
http://www.pcmag.com/article2/0,1759,1650397,00.asp?p=2
Stephen Jones
Thursday, October 07, 2004
 
 

This topic is archived. No further replies will be accepted.