The Joel on Software Discussion Group (CLOSED)

A place to discuss Joel on Software. Now closed.

version control for really really large files?

hi everybody

i know about cvs, subversion and all the other systems though i never really used one for more than a few minutes. now i want to set up my own.

the thing is that i am doing mostly graphics design and videos and that my files are usually several megabytes to several hundreds of megabytes. some are even several gigabytes. can these version control systems work with such large files? are they even of use for mostly binary files?

and do you know a convenient way to do backups? right now i copy the files to a second hard disc because of the size of them (my work directory right now is about 40 gb).

oh, and i work on mac os x but unix tools run here as well.
Saturday, January 08, 2005
You need to check the docs of your prospective version control system and see how it handles binary files. Some, like CVS, do not handle them efficiently.

One of the benefits of a version control system is to be able to see the differences between versions and to do selective merging on occasion. With large image or media files, where visual differencing and merging are not simple operations, it may be more practical to just make dated backup copies as needed.
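
For the "dated backup copies" approach, a minimal Python sketch might look like the following. The function name `dated_backup` and the folder naming scheme are assumptions made for illustration. It copies the whole work directory into a new folder stamped with the date, with no versioning beyond that.

```python
import shutil
from datetime import date
from pathlib import Path


def dated_backup(work_dir, backup_root, today=None):
    """Copy work_dir to backup_root/<dirname>-YYYY-MM-DD and return the new path.

    Hypothetical helper: the naming scheme is an assumption, not a standard.
    """
    today = today or date.today()
    source = Path(work_dir)
    target = Path(backup_root) / f"{source.name}-{today.isoformat()}"
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup from the same day is never overwritten.
    shutil.copytree(source, target)
    return target
```

Every run makes a full copy, so a 40 GB work directory costs roughly 40 GB per backup. That is the price of skipping a version control system.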
Ian Boys
Saturday, January 08, 2005
You definitely need Perforce.  Not only is it a fantastic version control system, its performance is unmatched.

I work for a game company and we store huge binaries in P4, i.e. thousands of raw .WAV, built binaries, you name it.

Our depot is about a terabyte now.  The largest file we store is probably about 500 MB.  Enough?  : )

Of course, you will probably need to investigate hardware as well if you want this kind of performance.
Roose
Saturday, January 08, 2005
Subversion and some other source control systems don't store an entire file as the new version when you make a change.  Instead, they do a "binary diff" and store only the part(s) that changed.  Later, if necessary an entire file of any version can be reconstructed from the original file by applying the relevant binary diffs.  This sort of versioning system is more efficient at versioning large files, because the size of the file isn't that important.  The size of the change is the major factor in determining efficiency.

Having said that, I haven't had enough experience with Subversion to tell you that it works great with large binary files.  I do know that that's one of the things Subversion was designed to do, so I assume it's pretty good at it.
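
The "store only the part(s) that changed" idea can be sketched roughly as below. This is a toy that compares fixed-size blocks; it is not how Subversion actually encodes its deltas, and the names `make_delta`, `apply_delta`, `reconstruct` and the 4096-byte block size are all assumptions chosen for illustration.

```python
BLOCK = 4096  # assumed block size, chosen arbitrarily for this sketch


def make_delta(old: bytes, new: bytes, block: int = BLOCK):
    """Return (new_length, {block_index: data}) for blocks of new that differ from old."""
    changed = {}
    for offset in range(0, len(new), block):
        chunk = new[offset:offset + block]
        if old[offset:offset + block] != chunk:
            changed[offset // block] = chunk
    return len(new), changed


def apply_delta(old: bytes, delta, block: int = BLOCK) -> bytes:
    """Rebuild the newer version from the older one plus a delta."""
    new_length, changed = delta
    # Start from the old bytes, truncated or zero-padded to the new length.
    out = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for index, chunk in changed.items():
        start = index * block
        out[start:start + len(chunk)] = chunk
    return bytes(out)


def reconstruct(base: bytes, deltas) -> bytes:
    """Rebuild a later version by applying each stored delta in order."""
    data = base
    for delta in deltas:
        data = apply_delta(data, delta)
    return data
```

With this scheme, changing one byte in a 1 MB file stores a single 4 KB block rather than another 1 MB copy. A fixed-block comparison like this one breaks down when bytes are inserted in the middle of a file, because every later block shifts; real delta encoders are designed to cope with that.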
Herbert Sitz
Sunday, January 09, 2005
If you use adobe software, I think there's a versioning system included in the latest version of Photoshop CS (not sure about the others), though I've never used it. Might be worth a shot?
Sunday, January 09, 2005
Happened to stumble on this article:

Keeping Your Life in Subversion

Scroll down for the comments, the first one says:

"Currently, Subversion cannot handle files with size of the order of 100 MB in a realistic time interval (files of this size are seen in image processing projects). Checking out a few large files locally takes about 10 minutes but over HTTP from the Subversion server to a client machine takes hours."
Alexandru Pojoga
Sunday, January 09, 2005
Sorry, should have tinyurl'd it.
Alexandru Pojoga
Sunday, January 09, 2005
>  You definitely need Perforce.

Agreed. We would have very large hardware design files, in excess of 300MB, and they would work fine in perforce. It's never going to be fast mind you, that's a lot of data, but it would work.
son of parnas
Sunday, January 09, 2005
AFAIK, Perforce doesn't deltify or compress binary files at all.  So let's say you have a 500MB file.  Check it out, make some changes and check it back in.  Now your repository has increased in size by 500MB. 

This may or may not be important to you.  After all, I've heard lots of people (including two here) say that Perforce works well with large binary files.  Disk space is cheap.  I make no secret of the fact that I respect and admire Perforce as a worthy competitor.

Vault takes a different approach.  Everything in the repository is stored in a deltified format.  New versions consume space roughly on the order of the number of bytes changed.  This costs us a bit of performance, but it's far more efficient in the use of disk space.  It's a tradeoff.  Subversion uses a similar design I believe.
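
As a back-of-the-envelope illustration of that tradeoff, the growth of the repository under the two storage schemes could be estimated like this. The function name `repo_growth_mb` and the 2% changed fraction are assumptions for the sake of the example, not measurements of Perforce, Vault or Subversion.

```python
def repo_growth_mb(file_mb, revisions, changed_fraction):
    """Estimate server-side growth in MB for full-copy and deltified storage.

    changed_fraction is an assumed share of the file that differs per revision.
    """
    full_copy = file_mb * revisions
    deltified = file_mb * changed_fraction * revisions
    return full_copy, deltified


# A 500 MB file checked in 10 times, assuming 2% of its bytes change each time.
full, delta = repo_growth_mb(file_mb=500, revisions=10, changed_fraction=0.02)
print(f"full copies: {full} MB, deltas: {delta:.0f} MB")
```

The changed fraction depends heavily on the file format. A small edit to a compressed image or video can rewrite most of the bytes in the file, in which case the two numbers end up close together.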
Eric Sink
Sunday, January 09, 2005
From what I hear, compression tends to consume a lot of CPU on the server.  So if you have only a few people using it, that would be fine.  But if a lot of people are syncing and checking in 500MB files, it could be an issue.

Yeah it is a tradeoff, it would be nice to have the option.  Though in practice it hasn't been a problem.  We have a team of 70 or 80, checking in many versions of large binary files, and we consume about a terabyte a year.  Which is definitely within reason.
Roose
Sunday, January 09, 2005
I looked at that subversion comment about it taking several hours to sync 100MB files.

Incidentally, it has been my experience that Perforce is FASTER than just copying the files from the same server.  They must use a very well tuned network protocol.

Or maybe it's because most syncs have some text files too which are deltified... but still it is super fast.
Roose
Sunday, January 09, 2005
Perforce make a big deal about their speed. But other than a minor irritation I've never really been concerned as to whether it takes 30 seconds or 3 minutes to get a baseline of files. I would imagine very few people actually NEED the performance. A nice to have maybe.

Incidentally standard Windows file movement seems to perform awfully. Transfer 1000 files of 1MB each and it takes far longer than 1000MB/network speed. And what's REALLY annoying is when you reorganise files on a remote machine. Windows insists on copying them to your machine in the middle. Or at least it always used to anyway. Dumb!
Gwyn
Sunday, January 09, 2005
Another reason Subversion is not the solution for large binaries is the fact that working folders contain a hidden "pristine" copy of the files (i.e. in the version you took from the server before you started modifying them). So if you check out a folder of 2 GB from the repository, it takes up 4 GB on your disk.
Ivan-Assen Ivanov
Sunday, January 09, 2005
"Another reason Subversion is not the solution for large binaries is the fact that working folders contain a hidden "pristine" copy of the files (i.e. in the version you took from the server before you started modifying them)."

I don't know whether that's a reason for or against using Subversion on large files. 

Yes, maintaining the "pristine copy" on the dev machine results in the files taking up a bit more space on the development machines.  But the main worry with disk space is probably at the repository server, and Subversion's use of diff (or "delta") versioning means that the disk space used on the server can be a small fraction of what it would be with a system that stores entire files.

Plus, having the pristine version that you're working with on the development machine means that to check in a file using Subversion your machine can create the diff on its own and send only that potentially tiny file to the server.  As long as this works reliably (and apparently it does), seems like this reduced network traffic could be a big benefit.
Herbert Sitz
Sunday, January 09, 2005
Yes, the pristine local file serves two purposes.

First, as you mentioned, is that you get deltas going both ways.

Second, you can diff against (and revert to) the original without needing to be connected to the network.

Both of these are pretty sizeable wins for usability and network performance, at the expense of local disk space (which in 99% of the cases for source code is insignificant).
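
The pristine-copy idea can be sketched as follows: keep a hidden second copy of each checked-out file, so that "has this changed?" and "revert" need no network access. The directory name `.pristine` and the helper names are hypothetical, and this is not Subversion's actual working-copy layout.

```python
import hashlib
import shutil
from pathlib import Path

PRISTINE_DIR = ".pristine"  # hypothetical name, not Subversion's real layout


def checkout(server_file, work_dir):
    """Copy a file into work_dir twice: a working copy and a hidden pristine copy."""
    work_dir = Path(work_dir)
    pristine = work_dir / PRISTINE_DIR
    pristine.mkdir(parents=True, exist_ok=True)
    name = Path(server_file).name
    shutil.copy2(server_file, pristine / name)
    shutil.copy2(server_file, work_dir / name)
    return work_dir / name


def _digest(path):
    """Hash a file in 1 MB chunks so large files are not loaded into memory at once."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest()


def _pristine_path(working_file):
    return working_file.parent / PRISTINE_DIR / working_file.name


def is_modified(working_file):
    """Compare the working file with its pristine copy, entirely offline."""
    working_file = Path(working_file)
    return _digest(working_file) != _digest(_pristine_path(working_file))


def revert(working_file):
    """Restore the working file from the pristine copy, entirely offline."""
    working_file = Path(working_file)
    shutil.copy2(_pristine_path(working_file), working_file)
```

The cost is visible in `checkout`: every file is written twice, which is why a 2 GB checkout occupies about 4 GB on the local disk.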
Brad Wilson
Sunday, January 09, 2005
thanks everybody!

i narrow it down to three candidates right now: subversion, perforce and vault. vault looks very interesting but from what i see it runs only on windows?

we can use either unix, linux or mac os x for the server but our clients are all mac os x. do you have any experience with these systems in a non-windows environment?
Sunday, January 09, 2005
The Vault GUI client runs only on Windows.  We have a command-line client which runs on Linux using Mono as a substitute for the .NET runtime.  This solution doesn't work on MacOS yet due to a few endian issues.

Perforce is very cross-platform.  They have a GUI client built using Qt.

Subversion can be cross-platform as well, although I don't know the current state of any GUI clients for it.

It sounds like Perforce will be a good fit for you if you have the budget for it.
Eric Sink
Monday, January 10, 2005
You could try alien brain:

It's designed for large binary files, and sure enough seems to work well for that (I use it at work), so it may be worth a look if nothing else seems to fit the bill. I get the impression it's rather Windows-oriented, but their website implies you can make it work in a more Unix environment.
Monday, January 10, 2005
My personal experience is that AlienBrain was a pain in the a**.  We switched from AlienBrain to Perforce specifically for performance reasons.

But then again that could have been the way the server was set up, I didn't deal with that.

Also in the version I used, it would never delete files, which was a pain in the ass.  So if someone submits a delete, it remains on everyone's computer, which IMO is terrible.  The only difference is that new people don't get it.
Roose
Monday, January 10, 2005
eric: if vault would work for us i would buy it right now just because you recommended a competitor's product. :-) this kind of honesty is rare, i feel.
Monday, January 10, 2005

This topic is archived. No further replies will be accepted.
