The Joel on Software Discussion Group (CLOSED)
A place to discuss Joel on Software. Now closed.
I'm trying to write a program in Perl to analyze a flat file database. This flat file database could be quite large (over a couple of gigabytes), and our systems don't have enough memory to load it completely.
Luckily, I can traverse this file linearly, one line at a time. So, really, I wouldn't need to load anything into memory except for one line. Buuuuttt, I can't seem to figure out how to do this. Does anyone with any Perl experience have a link or information on a module or something?
I've tried searching, but most Perl file stuff is either reading it all into an array (not possible), or using DB_File, which I'm trying right now on a 100 MB test database, and it has yet to start processing 25 minutes into running the program. :-/
Any ideas/suggestions? Note: I know very little about Perl's file capabilities. Even a link with a little explanation would be great. Thanks.
Anon Monday, July 11, 2005
So I guess I didn't catch this until just now. Am I right about the following:
If you just iterate over the file handle, it only loads one line at a time? That is, if you assign a read from the file handle to a scalar, it returns only one line, like so:
    $line = <FILE>;
Whereas, if you assign it to an array, it loads the entire file, like so:
    @lines = <FILE>;
Is that correct? Thanks.
Anon Monday, July 11, 2005
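For anyone reading along, here is a minimal sketch of the difference being asked about. The sample data and variable names are made up for illustration, and an in-memory file handle stands in for the real flat file so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the flat file.
my $data = "row1\nrow2\nrow3\n";

# Scalar context: the read returns only the next line.
open(my $fh_scalar, '<', \$data) or die "Cannot open in-memory file: $!";
my $first_line = <$fh_scalar>;
close($fh_scalar);

# List context: a single read returns every remaining line at once.
open(my $fh_list, '<', \$data) or die "Cannot open in-memory file: $!";
my @lines = <$fh_list>;
close($fh_list);
my $line_count = scalar @lines;

print "scalar read: $first_line";
print "list read: $line_count lines\n";
```

With a multi-gigabyte file, the list-context read is the one that tries to hold everything in memory.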
Well, considering I just got it working, reallllly fast, I'd say, yeah, that was about right.
Aah, I love having the "aha" moments. :) Thanks for your help.
Anon Monday, July 11, 2005
Better Yet:
    while (<>) {
        # Do something with $_
        print $_;
    }
This will take each file name you pass in on the command line, open it, and read from it a line at a time, populating $_ (the default variable) on each pass. If you don't want the trailing newline, don't forget to chomp()!
Hey, on a related note, check out my 1 hour introduction to Perl for programmers: http://www.xndev.com/Speaking/PerlIntro01.ppt
Regards,
Ugh. That's not "better yet". That's "worse yet".
As much as I like banging out quick scripts in perl, I *hate* perl's implicit variables.
I have to agree with "worse yet". After using Perl for 8 years, I still think that the implied variables are a bad idea.
nobody Tuesday, July 12, 2005
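For those who dislike the implicit $_, the same line-at-a-time loop can be written with a named variable. This is only a sketch: count_matching_lines is a hypothetical helper name, and the in-memory sample data is made up so the snippet runs without a real file.

```perl
use strict;
use warnings;

# count_matching_lines is a hypothetical helper. It reads the input one
# line at a time, so memory use stays at roughly one line.
sub count_matching_lines {
    my ($path, $pattern) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;                      # drop the trailing newline
        $count++ if $line =~ $pattern;
    }
    close($fh);
    return $count;
}

# Demo on made-up in-memory data; pass a real file name instead of a
# scalar reference to read from disk.
my $sample = "alpha,1\nbeta,2\nalpha,3\n";
print count_matching_lines(\$sample, qr/^alpha,/), "\n";
```

The three-argument open with a lexical file handle is a stylistic choice here, not something the posts above require.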
Here's one that's slightly different than (and, in my opinion, superior to) the example above:
    foreach my $line (<FH>) {
        # Do something with $line
    }
Same basic idea, but I just like this idiom better. Of course the "while" example above is slightly more efficient (not that you'll notice).
Aaargh! No!
The "foreach" example is bad. Don't do that unless you know what you are doing. (Remember that half the people posting here don't know what they are doing--are you one of them, dear reader?) In the OP, the file was described as about 2 GB in size. The foreach construct will try to read the whole file into memory before processing it. Do you have a spare 2 GB of RAM available for that? Use the while construct. It's the Right Way To Do It.
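One way to see this for yourself is to check $. (the current input line number) on the first pass through each loop. This is a small sketch with made-up data and hypothetical helper names; it uses an in-memory file handle so it runs without a real file.

```perl
use strict;
use warnings;

# Five made-up records standing in for the large file.
my $data = join '', map { "record $_\n" } 1 .. 5;

# Hypothetical helper: how many lines have been read by the time the
# body of a foreach loop runs for the first time?
sub lines_read_at_first_foreach_pass {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my $seen;
    foreach my $line (<$fh>) {    # list context: reads every line first
        $seen = $.;
        last;
    }
    close($fh);
    return $seen;
}

# Hypothetical helper: the same measurement for a while loop.
sub lines_read_at_first_while_pass {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my $seen;
    while (my $line = <$fh>) {    # scalar context: reads one line per pass
        $seen = $.;
        last;
    }
    close($fh);
    return $seen;
}

print "foreach: ", lines_read_at_first_foreach_pass($data), " lines read\n";
print "while:   ", lines_read_at_first_while_pass($data), " line read\n";
```

The foreach version has already consumed the whole input before its body runs once, which is exactly the problem with a 2 GB file.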
Actually the more correct while loop is
    while ( defined( $input = <FH> ) ) {
        # do something
    }
Without the "defined" function, a line containing a single zero or a blank line will evaluate to 'false' and cause the other while loop to terminate. See page 18 of "Effective Perl Programming" by Joseph N. Hall and Randal L. Schwartz for the topic 'Item 5: Remember that 0 and "" are false.' I recommend that book.
empty Wednesday, July 13, 2005
I'm sorry, but you are incorrect about the while loop, though you are correct that 0 and "" are false.
The loop:
    while ( defined( $input = <FH> ) ) {
        # do something
    }
is identical to the loop:
    while ( $input = <FH> ) {
        # do something
    }
in every respect. You can refer to the perlop manpage for confirmation, and then try it for yourself if you don't believe it. (It's not magic; there is a simple and obvious reason why.) It seems unlikely the book would be in error with Randal Schwartz being one of the authors; perhaps you have misunderstood what it said?
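A quick way to "try it for yourself" is sketched below. The data and helper names are made up: the input has a blank line in the middle and ends with a line that is just "0" with no trailing newline, which is false as a plain string and so is the case where the two loops would differ if Perl did not add the defined test on its own.

```perl
use strict;
use warnings;

# Made-up input: a blank line, then a final "0" with no newline.
my $data = "first\n\n0";

# Hypothetical helper: count lines with an explicit defined() test.
sub count_with_defined {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my ($n, $input) = (0);
    while ( defined( $input = <$fh> ) ) { $n++ }
    close($fh);
    return $n;
}

# Hypothetical helper: count lines with the bare assignment form.
sub count_without_defined {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my ($n, $input) = (0);
    while ( $input = <$fh> ) { $n++ }
    close($fh);
    return $n;
}

print "with defined:    ", count_with_defined($data), "\n";
print "without defined: ", count_without_defined($data), "\n";
```

Both loops see all three lines, including the blank one and the final "0".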


