The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Looking for extremely quick directory indexing

Part of the new app I'm designing needs to run what is effectively a slideshow of images. Due to the nature of my client's business (no, it's not what it sounds like.... :) they have thousands (30K+) of images. All images are under one top-level directory, but beneath that they can be buried any number of directories deep. Basically, files everywhere!

I want to index all the image files (say, *.jpg) very quickly when the app starts so that the slideshow can begin immediately.

Looking around the marketplace, I've found apps such as StickyView and the slideshow plugin for IrfanView, which take ages on startup because of this indexing (3+ minutes, which is an age when you're sitting there waiting for something, and certainly not acceptable to the client).

File system is NTFS on Windows XP.

Is there a way (perhaps by examining some low-level part of the file system...) to get an index of files really quickly?

My current thought is to have a Windows service as part of my app that performs indexing on system startup (using the traditional scan-all-the-folders approach) and then quietly checks for new files in the background. My slideshow component then reads the data this service maintains. I can see this working, but I'd like something more elegant.
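A minimal Python sketch of the shared-index idea, assuming the indexer persists its results to a JSON file that the slideshow reads at startup. The file name `image_index.json` and all helper names are hypothetical, the Windows-service wrapper is not shown, and only *.jpg/*.jpeg files are matched.

```python
import json
import os
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg"}  # assumption: only JPEGs are shown


def scan_images(root):
    """Walk the whole tree under root and return a sorted list of image paths."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    found.sort()
    return found


def save_index(paths, index_file):
    """Write to a temp file, then swap it in, so a reader is unlikely
    to ever see a half-written index."""
    tmp = Path(str(index_file) + ".tmp")
    tmp.write_text(json.dumps(paths), encoding="utf-8")
    os.replace(tmp, index_file)


def load_index(index_file):
    """Return the cached list, or an empty list if no usable index exists yet."""
    try:
        return json.loads(Path(index_file).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return []
```

The indexer would call `save_index(scan_images(root), "image_index.json")` at boot and on a timer; the slideshow calls `load_index("image_index.json")` and can start as soon as the file is read. Swapping the file in with `os.replace` means the reader should see either the old index or the new one (the Python docs promise atomicity on POSIX; on Windows it is a single replace call but not formally guaranteed atomic).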

My goal is a startup time of about 30 seconds max for the slideshow component.

Thanks,

Kevin.
Kevin Walshen
Monday, November 17, 2008
 
 
Scan for files, and after you have found the first 10 images, start the slideshow and continue the scan in the background.
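bumperbox's suggestion, sketched in Python under stated assumptions: a worker thread walks the tree with `os.walk`, and a `threading.Event` fires once the first `start_after` images (10 here, as suggested) have been found. The class and attribute names are hypothetical.

```python
import os
import threading


class BackgroundScanner:
    """Scan a directory tree on a worker thread and signal once the first
    `start_after` images are found, so the slideshow can begin early."""

    def __init__(self, root, start_after=10, extensions=(".jpg", ".jpeg")):
        self.root = root
        self.start_after = start_after
        self.extensions = tuple(e.lower() for e in extensions)
        self.paths = []                   # grows while the scan runs
        self._lock = threading.Lock()
        self.ready = threading.Event()    # set after start_after hits (or at the end)
        self.done = threading.Event()     # set when the whole tree has been walked
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        try:
            for dirpath, _dirs, files in os.walk(self.root):
                for name in files:
                    if name.lower().endswith(self.extensions):
                        with self._lock:
                            self.paths.append(os.path.join(dirpath, name))
                            if len(self.paths) >= self.start_after:
                                self.ready.set()
        finally:
            # Fewer images than start_after: release the waiting slideshow anyway.
            self.ready.set()
            self.done.set()

    def snapshot(self):
        """Copy of the paths found so far, safe to use from the GUI thread."""
        with self._lock:
            return list(self.paths)
```

Usage: `scanner = BackgroundScanner(r"D:\Images")` (hypothetical path), then `scanner.start()` and `scanner.ready.wait()`, and show images from `scanner.snapshot()`, taking a fresh snapshot as the show advances so newly found files join the rotation.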
bumperbox
Monday, November 17, 2008
 
 
I guess I don't understand what problem you are trying to solve. Indexing always implies that you are collecting the names for lookup by some other criterion. Just getting a list of the files that are present does not require indexing; finding all files under a directory tree whose names start with the word "Vacation" does. So you aren't telling us what kind of indexing you are really trying to do.

But anyway, you are obviously going to need a first-time indexing run, and that is going to take some time. After that, you will probably want to weigh iterating through all of the files manually against setting up a file system watcher on each directory. Google for "file system watcher" for more details. Which one you pick will depend on many factors specific to your environment.
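The manual-iteration option can be as simple as rescanning on a timer and diffing against the previous result. A minimal Python sketch follows; the function names and the five-minute default interval are assumptions. A watcher (the Win32 `ReadDirectoryChangesW` call, wrapped in .NET as `FileSystemWatcher`) avoids the repeated full walk but is not shown here.

```python
import os
import time


def snapshot(root, extensions=(".jpg", ".jpeg")):
    """Return the set of image paths currently under root."""
    result = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(extensions):
                result.add(os.path.join(dirpath, name))
    return result


def diff_snapshots(old, new):
    """Return (added, removed) paths between two snapshots, each sorted."""
    return sorted(new - old), sorted(old - new)


def poll(root, interval_seconds=300):
    """Yield (added, removed) whenever a rescan finds changes.
    Runs forever, so it belongs on a background thread."""
    previous = snapshot(root)
    while True:
        time.sleep(interval_seconds)
        current = snapshot(root)
        added, removed = diff_snapshots(previous, current)
        if added or removed:
            yield added, removed
        previous = current
```

Each rescan still costs a full walk, which is exactly the trade-off against a watcher: polling is simple and works on any share, while a watcher reports changes as they happen.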

Finally, don't rule out using the existing Windows Indexing Service API if the user has this service running. I've never used it but supposedly you can call into it for various pieces of information. Unfortunately, many users of XP turn it off.
uggh
Monday, November 17, 2008
 
 
Bumperbox++.

The real solution to this problem (as you have described it) is to create a thread that searches for matching files, separate from the GUI thread that displays the slide show.

The thread that builds the list of files (and I agree with the poster who said that this is not indexing, it's just a search) would work in the background while the images are displayed. Think of the "found" files as a queue of data. The background search thread fills the queue, the slide show consumes the queue.
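The queue described above maps directly onto Python's thread-safe `queue.Queue`. In this sketch (function names hypothetical), the search thread puts each matching path on the queue, followed by a sentinel object when it finishes; the consumer takes paths in the order they were found.

```python
import os
import queue
import threading

_DONE = object()  # sentinel: the producer has finished walking the tree


def produce(root, q, extensions=(".jpg", ".jpeg")):
    """Background search thread: push each matching path onto the queue."""
    try:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if name.lower().endswith(extensions):
                    q.put(os.path.join(dirpath, name))
    finally:
        q.put(_DONE)  # always tell the consumer the search is over


def start_search(root):
    """Start the background search thread and return the queue it fills."""
    q = queue.Queue()
    threading.Thread(target=produce, args=(root, q), daemon=True).start()
    return q


def consume(q, show):
    """Slide show side: take paths off the queue as they arrive.
    Blocks only when the show gets ahead of the search. Returns the count shown."""
    shown = 0
    while True:
        item = q.get()
        if item is _DONE:
            return shown
        show(item)
        shown += 1
```

`consume` blocks in `q.get()`, which is fine for illustration; inside a GUI event loop you would more likely call `q.get_nowait()` from a timer callback and catch `queue.Empty`, so the display never freezes while waiting on the search.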

Otherwise, if you do everything in the main thread of the program, you have to find all of the files almost instantly, which is a no-go. But you *can* find just one file to start the show, and rely on the fact that viewing will be orders of magnitude slower than the process of searching for files.

If the user can skip to the middle or the end of the file list, then special tricks/sleight of hand will be required.
Bored Bystander
Tuesday, November 18, 2008
 
 
Have you tried a naive implementation?  I wrote something similar for work the other day that could handle 60K+ files in under a second.
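For reference, a naive scan in Python might look like the following. It is a sketch under stated assumptions, not Brian's code: it walks the tree iteratively with `os.scandir`, which on Windows returns file type information with the directory listing so no extra per-file system call is needed, skips folders it cannot read, and does not follow symbolic links. No timing figure is claimed here; run `timed_scan` (a hypothetical helper) against the real image folder to see where a plain scan lands.

```python
import os
import time


def find_images(root, extensions=(".jpg", ".jpeg")):
    """Naive iterative scan of the tree under root using os.scandir."""
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Don't follow symlinks, to avoid loops and duplicate visits.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.name.lower().endswith(extensions):
                        found.append(entry.path)
        except OSError:
            continue  # unreadable or vanished folder: skip it
    return found


def timed_scan(root):
    """Run find_images and return (paths, elapsed_seconds)."""
    start = time.perf_counter()
    images = find_images(root)
    return images, time.perf_counter() - start
```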
Brian
Tuesday, November 18, 2008
 
 
A network share could really slow down any scan. I think putting the file scan in its own thread is the best bet.
Bored Bystander
Tuesday, November 18, 2008
 
 

This topic is archived. No further replies will be accepted.
