The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Execution time of program.

I'm running a somewhat complex utility that processes a text file line by line. Exactly what it does isn't important, but it basically reads 20 or 30 lines, performs some mildly complex processing, then writes the result to a log file.

I would expect processing time to increase linearly with the size of a file; a file twice as big should take twice as long to process. However, the time increase is geometric or maybe exponential; I haven't measured it that closely. But I know the 6 and 12 meg files are reasonably fast, while the 40 meg file takes *forever*. The program's memory use is fairly constant, so it's not leaking memory and burning time disk swapping or something.

Any theories?

    Flava Flav!
Flava Flav
Friday, November 05, 2004
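
For reference, a minimal sketch of the kind of loop being described. The actual code never appears in the thread, so the file names, the block size of 30, and the process_block() stand-in are all assumptions:

    #include <fstream>
    #include <string>
    #include <vector>

    // Stand-in for the "mildly complex" processing of one block of lines.
    std::string process_block(const std::vector<std::string>& block)
    {
        return std::to_string(block.size()) + " lines processed";
    }

    int main()
    {
        std::ifstream in("input.txt");    // hypothetical input file name
        std::ofstream log("output.log");  // hypothetical log file name
        std::vector<std::string> block;
        std::string line;

        while (std::getline(in, line)) {
            block.push_back(line);
            if (block.size() == 30) {                 // roughly 20-30 lines per block
                log << process_block(block) << '\n';
                block.clear();                        // blocks are independent, so discard
            }
        }
        if (!block.empty())
            log << process_block(block) << '\n';
        return 0;
    }

Nothing in that shape is inherently worse than linear, which is what makes the reported timings surprising.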
 
 
> I would expect processing time to increase linearly

Seriously, why?
Perhaps your expectation is flawed?

Check your expectation first (a profiler would help), or else you will be hunting for micro-optimizations instead of attacking the actual problem.
Whatever
Friday, November 05, 2004
 
 
Read up on complexity theory.  Processing time will only increase linearly with the size of input if you're using an O(n) algorithm.  It sounds like you're not.
Iago
Friday, November 05, 2004
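
To make that concrete, here is one hedged illustration (not taken from the poster's program) of how a loop that reads the file once can still do quadratic work: if each block's output is appended by copying everything accumulated so far, total work grows with the square of the input size.

    #include <cstddef>
    #include <string>
    #include <vector>

    // Illustration only, not the original code: appending by full copy.
    std::string accumulate_results(const std::vector<std::string>& block_outputs)
    {
        std::string result;
        for (std::size_t i = 0; i < block_outputs.size(); ++i) {
            result = result + block_outputs[i];  // copies the whole result each time: O(n^2) overall
            // result += block_outputs[i];       // an in-place append would be amortized O(n)
        }
        return result;
    }

A profiler, as suggested above, is usually the quickest way to see whether something like this is hiding in the loop.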
 
 
Unix or Windows?

I theorize that 40 Meg crosses some O.S. threshold, so the entire file can't be read into a buffer by the O.S. and then given to your program on demand. 

If you are doing a few lines at a time, it doesn't sound like your *program* is allocating the buffer, so you are using the O.S.'s buffer.

Could also be a memory leak, which triggers when you read that many more lines.
AllanL5
Friday, November 05, 2004
 
 
I don't really think there is a common cause of this... I guess you need to track it down with a profiler.
Carfield Yim
Friday, November 05, 2004
 
 
I expected the times to be linear because I'm reading a text file sequentially, processing blocks of text, and writing the results sequentially. The blocks are not related to one another and are thoroughly disposed of after being processed.

A six meg file takes about twice as long as a 3 meg file. The 12 meg file takes a little more than twice as long as the 6 meg file. The 40 meg file takes at least 10 times as long as the 12 meg file.

  Flava Flav!
Flava Flav
Friday, November 05, 2004
 
 
And what are the memory requirements? It may use O(k*N) memory and, as AllanL5 pointed out, the actual memory required for the 40MB input may be getting larger than the physically available memory (thus requiring swapping, thus making it a lot slower).
Whatever
Friday, November 05, 2004
 
 
Does anybody here actually read the original posts?

He's reading in 20/30 lines at a time then processing them. All this talk about O(N^2) processing times and memory paging is therefore junk.

Without knowing the language/platform he's using there's really no way of telling what the problem is. It could be a garbage collector kicking in, it could be the "writing to the log file" hitting the OS write cache limit or any number of other things. There simply isn't enough information to answer the question.
Joe Cuervo
Friday, November 05, 2004
 
 
Heh. The language is C++, the OS is Windows (xp) ((pro)).

If there were enough information to solve the problem, I would have solved it already.

At the moment it's really a moot point, as there was only one 40 meg file. It was pretty odd behaviour though, which is why I thought I'd see what all the smart people who get things done thought about it.

    Flava Flav!
Flava Flav
Saturday, November 06, 2004
 
 
If there is only one 40 MB file, try it with another. Maybe the file is heavily fragmented or somehow damaged, so actually reading the file takes that long, independent of your application?
Gerd Riesselmann
Saturday, November 06, 2004
 
 
Since the blocks aren't related, what happens if you split that 40MB file up into several smaller files? Is the execution time the same?
sloop
Saturday, November 06, 2004
 
 
I cut the file in half and the 20 meg version was about the same speed as the 40.

Hmm, while I was playing around with this some more Windows popped up a little dialog that said I was low on virtual memory and that it was increasing the size of my page file.

    Flava Flav!
Flava Flav
Sunday, November 07, 2004
 
 
How do you get to these 20/30 lines? There is no "seek to line 33455" call that I know of... and what if you just skip the first 33454 lines? :)
WildTiger
Tuesday, November 09, 2004
 
 
As noted above, the file is read and processed sequentially.
Flava Flav!
Tuesday, November 09, 2004
 
 
At the risk of asking the obvious, there's no chance you have a memory leak is there?
David Clayworth
Wednesday, November 10, 2004
 
 
This may sound obvious, but have you tried timing the processing using a clock? What values did you find?

Subjective evaluation of time can be very unreliable.

-Andrew
Andrew Punch
Wednesday, December 01, 2004
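
A minimal sketch of that kind of measurement, assuming the whole run can be wrapped in a single call; run_the_utility() here is a hypothetical placeholder for the real processing:

    #include <cstdio>
    #include <ctime>

    // Hypothetical placeholder for the real processing run.
    void run_the_utility()
    {
        for (volatile long i = 0; i < 100000000L; ++i) { }
    }

    int main()
    {
        std::time_t start = std::time(0);
        run_the_utility();
        std::time_t end = std::time(0);
        // Coarse (whole seconds) but unambiguous wall-clock timing.
        std::printf("elapsed: %.0f seconds\n", std::difftime(end, start));
        return 0;
    }

Measured numbers for the 3, 6, 12 and 40 meg files would pin down whether the growth is really superlinear or just feels that way.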
 
 

This topic is archived. No further replies will be accepted.
