The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Multi-Threaded Disk Access

I'm writing a silly little Java utility for my own use that trawls the files on my partitions and sums the file sizes to tell me exactly how much data is on my hard disk. I was considering spawning a new thread for each partition, but then I decided against it, because I'm guessing that it will increase the disk head movement if multiple threads in the program are accessing data in different partitions. Thus it will actually end up slower. Does this sound like a reasonable assumption?
John Topley
Friday, November 05, 2004
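
For readers following along, here is a minimal single-threaded sketch of the kind of utility described above. It is not the poster's code: the class name DiskTotal and the method sizeOf are hypothetical, and it uses java.nio.file.Files.walkFileTree, which was added to the JDK well after this thread was written. Unreadable files and directories are simply skipped, and symbolic links are not followed.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration; names are not from the original post.
final class DiskTotal {

    /** Sums the sizes of all regular files under root, skipping unreadable entries. */
    static long sizeOf(Path root) throws IOException {
        final long[] total = {0}; // single-element array so the visitor can update it
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total[0] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                // Assumption: an entry we cannot read is ignored, not treated as fatal.
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }

    public static void main(String[] args) throws IOException {
        // Scan the path given on the command line, or the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(root + ": " + sizeOf(root) + " bytes");
    }
}
```

Run it once per partition root (for example `java DiskTotal.java C:\`) to get a per-partition total.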
It may not matter if the method your program uses to measure file sizes already causes the head to flip all over the drive by itself. Multithreading may not make that worse, but I don't see how it can make it better. It's the same task with a shared physical bottleneck: the speed of the hard drive. If you had access to a really low-level API you could skim the entire hard drive exactly once and collate the data in RAM, but I strongly doubt you can get access to such an API from Java very easily.
Aaron F Stanton
Friday, November 05, 2004
My guess: it will be slower due to thread overhead unless (A) you are scanning multiple drives, or (B) whatever you do on top of the disk I/O takes a significant amount of time. I don't think head movement would matter much, though.
NetFreak
Friday, November 05, 2004
Ah, I hadn't thought of multiple drives.  One thread per drive might be a decent speed increaser.
Aaron F Stanton
Friday, November 05, 2004
A single thread per hard disk is faster.
Friday, November 05, 2004
Personally, I would rather do this in a native method... how many operating systems do you really need to support?
Carfield Yim
Friday, November 05, 2004
Every file system since DOS 1.0 sorts pending disk requests to minimize head movement.

For example if your disk has 100 sectors and you request to read 2, 94, 36, and 52 simultaneously you'll see the head sweep across the platter once, picking up the sectors you need as it moves across.

I'm pretty sure they also sort based on the disk platter rotation so they'll pick things up in the order that they spin by under the head.
Joel Spolsky
Saturday, November 06, 2004
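
The single sweep described above can be illustrated with a toy model. This is only a sketch of the general idea (often called elevator or SCAN ordering), not the algorithm of any particular file system, driver, or disk firmware. The class name ElevatorOrder, the method serviceOrder, and the assumption that the head starts out moving toward higher sector numbers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only; real I/O schedulers are considerably more involved.
final class ElevatorOrder {

    /**
     * Returns the order in which pending sector requests would be serviced
     * if the head sweeps upward from its current position and then comes
     * back down for whatever is left behind it.
     */
    static List<Integer> serviceOrder(List<Integer> pending, int head) {
        List<Integer> ahead = new ArrayList<>();
        List<Integer> behind = new ArrayList<>();
        for (int sector : pending) {
            if (sector >= head) {
                ahead.add(sector);
            } else {
                behind.add(sector);
            }
        }
        Collections.sort(ahead);                 // picked up on the way up
        behind.sort(Collections.reverseOrder()); // picked up on the way back
        List<Integer> order = new ArrayList<>(ahead);
        order.addAll(behind);
        return order;
    }

    public static void main(String[] args) {
        List<Integer> requests = Arrays.asList(2, 94, 36, 52);
        System.out.println(serviceOrder(requests, 0));  // prints [2, 36, 52, 94]
        System.out.println(serviceOrder(requests, 40)); // prints [52, 94, 36, 2]
    }
}
```

With the head at sector 0, the four requests from the example are serviced in one pass regardless of the order in which they were issued.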
Thanks everyone. I've written it as single-threaded and it works fine.

Joel: Are you sure about that? I ask because I believe that what you're describing is called elevator seeking and I remember it being one of the touted "grown-up filesystem" features of NTFS when it first came out. I'd be surprised if DOS 1.0 had it.
John Topley
Saturday, November 06, 2004
I thought they never got round to adding it to NT and are now touting it for Longhorn.
Joe Cuervo
Sunday, November 07, 2004
No, elevator seeking has definitely been in NTFS for a number of years. I remember reading about it in "Inside the Windows NT File System" by Helen Custer.

Perhaps you're thinking of the Cairo object file system, that has morphed into WinFS.
John Topley
Monday, November 08, 2004
Every last one of these opinions about what will be faster is worth exactly what you paid for it: nothing.

You won't really know which is faster until you've coded it both ways and measured it.  The stated problem is small enough that you ought to be able to do this (both ways) in an afternoon.

So do it, and report back, with numbers.
Jim Lyon
Tuesday, November 09, 2004
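
One way to run the experiment suggested above is sketched below. It is an assumption-laden harness, not anyone's actual measurement code: the class name ScanTiming and its methods are hypothetical, it uses one thread per root passed on the command line, and it skips symbolic links to avoid cycles. No timings are claimed here; the numbers depend entirely on the machine.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical timing harness: sequential scan versus one thread per root.
final class ScanTiming {

    /** Recursively sums file sizes under f; symbolic links are skipped. */
    static long sizeOf(File f) {
        if (Files.isSymbolicLink(f.toPath())) return 0;
        if (f.isFile()) return f.length();
        long total = 0;
        File[] children = f.listFiles(); // null if f is not a readable directory
        if (children != null) {
            for (File child : children) total += sizeOf(child);
        }
        return total;
    }

    static long scanSequentially(List<File> roots) {
        long total = 0;
        for (File root : roots) total += sizeOf(root);
        return total;
    }

    static long scanInParallel(List<File> roots) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, roots.size()));
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (File root : roots) {
                Callable<Long> task = () -> sizeOf(root);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) total += result.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<File> roots = new ArrayList<>();
        for (String arg : args) roots.add(new File(arg));
        if (roots.isEmpty()) roots.add(new File("."));

        long t0 = System.nanoTime();
        long sequential = scanSequentially(roots);
        long t1 = System.nanoTime();
        long parallel = scanInParallel(roots);
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d bytes in %d ms%n", sequential, (t1 - t0) / 1_000_000);
        System.out.printf("parallel:   %d bytes in %d ms%n", parallel, (t2 - t1) / 1_000_000);
    }
}
```

Whichever scan runs second benefits from directory data the operating system has already cached, so a fair comparison needs each mode timed in a separate run (ideally after a reboot) or the order swapped and the runs repeated.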
Directories don't really have anything to do with the colocation of files on the disk, so the order in which you scan them doesn't matter. The driver will do the best it can with whatever requests it gets.

Multiple threads will be faster because you can do a lot of the prep work in the background while you are blocked on the disk.

But you are still blocked on the disk, so you are screwed.
son of parnas
Wednesday, November 10, 2004
You shouldn't need to recursively scan the entire logical volume (partition) and sum all the file sizes to determine how much space the files use, especially if you want the process to be fast, as in O(1) time. All Windows file systems (I'm not sure about Linux) know at any moment how much total, used, and free space there is on every mountable block device (disk, RAM, ROM, etc.).

You can get this information in O(1) time in Win32 by calling the GetDiskFreeSpaceEx() API and subtracting its result from the length of the specified disk, volume, or partition, which DeviceIoControl() retrieves with the IOCTL_DISK_GET_LENGTH_INFO control code.


(derived from MSFT WMI page)

You can also use the Windows Management Instrumentation (WMI) objects/API from a language such as C, C++, or Perl, or from a scripting language such as VBScript or JScript. All versions of Windows after Windows 95/NT4 come with the Windows scripting engine installed.

Here is a WMI VBScript example that does what you need:
REM -- start of file 'du.vbs' --
REM execute this in a CMD.EXE shell as follows
REM C:\>cscript du.vbs
strComputer = "."
Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate}!\\" _
    & strComputer & "\root\cimv2")
Set colDisks = objWMIService.ExecQuery _
    ("Select * from Win32_LogicalDisk")
For Each objDisk in colDisks
    Wscript.Echo "DeviceID: " & objDisk.DeviceID
    Wscript.Echo "Used Disk Space: " & objDisk.Size - objDisk.FreeSpace
    Wscript.Echo "Free Disk Space: " & objDisk.FreeSpace
Next
REM -- end of file --

I tried searching the JDK API/Class library for the same thing in the JVM/Runtime, but I could not find anything.

I'm really surprised that you can't do this with the JDK. Or can you?

Is there a standard Java Class or API that provides this type of O(1) time operation on disk volumes from within the JVM?

But I'm sure that doing it the hard (brute-force) way in Java is a good exercise for learning Java threads and disk/file I/O.

Heston T. Holtmann, B.Sc.Eng.
Friday, November 12, 2004
Thanks Heston. I'm aware that I can use the Windows API to get this information. I'm using Java because that's the language I program in professionally now and it's all good experience. There's nothing in the package that gives you this information for a volume.
John Topley
Saturday, November 13, 2004
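
A note for later readers: JDK releases that postdate this thread (Java 6 onward) added getTotalSpace(), getFreeSpace(), and getUsableSpace() to java.io.File, which answer the question asked above without walking the tree. The sketch below assumes such a JDK. The class name VolumeUsage and the helper usedBytes are hypothetical, and "used" is computed the same way as in the VBScript example, as total size minus free space.

```java
import java.io.File;

// Hypothetical illustration using File methods available since Java 6.
final class VolumeUsage {

    /** Used bytes on the volume that contains f, computed as total minus free. */
    static long usedBytes(File f) {
        return f.getTotalSpace() - f.getFreeSpace();
    }

    public static void main(String[] args) {
        // One line per file system root (drive letters on Windows, "/" on Unix).
        for (File root : File.listRoots()) {
            System.out.println(root
                    + " used=" + usedBytes(root)
                    + " free=" + root.getFreeSpace()
                    + " total=" + root.getTotalSpace());
        }
    }
}
```

The figure reported this way is the space allocated on the volume, so it will usually differ somewhat from a sum of individual file lengths.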

This topic is archived. No further replies will be accepted.
