The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Standard regexp for Linux and Windows filenames.

Is there a "standard" regular expression used to match correctly formed Linux and/or Windows filenames?
Thursday, June 21, 2007
You can do anything in linux, yes even spaces, yes even forward slashes, so anything goes.  Windows only has a few characters you can't use [\/:*?"<>|].  They're both unicode enabled.
Grant
Thursday, June 21, 2007
"Windows only has a few characters you can't use [\/:*?"<>|].  They're both unicode enabled."

Having been recently bitten on this, I know that the list is longer.  For one, "^" is an escape character.


Gene Wirchenko
Thursday, June 21, 2007
Nope. ;-)
Or the more accurate answer is 'it depends'...

You can change the name of a file to "foo^" in Explorer just fine.  It'll flake out if you try to use that file from the command line, unless you enclose it in quotes.  But that's the same as a space in the filename.
anon for this one
Thursday, June 21, 2007
I've never seen a unix system, and certainly not Linux, that allows slashes in filenames. Other than the slash, however, anything goes, so a regex for a valid Linux filename would be simply [^/]+

Windows has more restrictions on what can go in a filename, but it's been a long time since I was familiar with such things.

If you want to ensure that a filename would be valid in BOTH Windows and Linux, then you can simply devise a reasonable subset of both naming requirements:

  [A-Za-z0-9._ -]+

Any filename that matches that regex should be valid on Windows and Linux (but not the converse: not every filename that is valid on Windows or Linux will satisfy that regex).
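
For illustration, here is a minimal Python sketch of the patterns mentioned in this thread so far. The names LINUX_NAME, WINDOWS_FORBIDDEN, PORTABLE_NAME and is_portable are hypothetical, and the patterns look at a single path component only, not a whole path. In the portable pattern the hyphen sits last in the character class so that it is read as a literal character rather than as a range.

```python
import re

# The thread's pattern for one Linux path component: anything except "/".
# (The kernel also rejects NUL bytes; that is not checked here.)
LINUX_NAME = re.compile(r"[^/]+")

# Characters Windows rejects in a file name, as listed earlier in the thread.
WINDOWS_FORBIDDEN = re.compile(r'[\\/:*?"<>|]')

# Conservative character set intended to be acceptable on both systems.
PORTABLE_NAME = re.compile(r"[A-Za-z0-9._ -]+")


def is_portable(name: str) -> bool:
    """Return True if the whole name uses only the conservative character set."""
    return PORTABLE_NAME.fullmatch(name) is not None
```

Note that this only checks characters. Names such as "." and "..", or the device names discussed further down the thread, would still need separate handling.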
Jeffrey Dutky
Thursday, June 21, 2007
I stand corrected, Jeffrey's right.  I just thought you would be able to escape a forward slash just like you can with a bunch of other stuff, like the *'s and ?'s that you wouldn't use in a sane file name.
Grant
Thursday, June 21, 2007
Remember that Windows Explorer really doesn't like some names (e.g. CON).  The API supports creating such a file, though (I made one from the command line, but don't remember how).
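
Reserved names like this can be checked separately from the character set. A minimal Python sketch, assuming the commonly documented list of DOS device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) and assuming that a name is also treated as reserved when an extension follows it; the function name is_reserved_windows_name is hypothetical.

```python
# Device names reserved by Windows; the comparison is case-insensitive.
RESERVED = {"CON", "PRN", "AUX", "NUL"}
RESERVED |= {f"COM{i}" for i in range(1, 10)}
RESERVED |= {f"LPT{i}" for i in range(1, 10)}


def is_reserved_windows_name(name: str) -> bool:
    """Return True if the name, ignoring any extension, is a reserved device name."""
    stem = name.split(".", 1)[0]  # text before the first dot
    return stem.upper() in RESERVED
```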
Brian McKeever
Thursday, June 21, 2007
Don't know if it's germane to your application, but for Windows there's also the whole alternate data stream issue that may come up, with names like myfile.dat:stream1:$DATA...
Rob
Friday, June 22, 2007
Be careful: it's possible (and in some circles common) to mount DOS or NTFS filesystems under Linux (especially when cheap NAS boxes are involved). Therefore, it's not possible to say that a filename is OK under 'Linux'; it depends on the filesystem in question. On a system that you know is on ext3, eliminating the directory separator will be sufficient, but for a general-use program that will not be enough.

What are you trying to do with this regex? If you're just trying to validate filenames before trying to write them, may I suggest that you not bother? Just write your program to come back and ask for another filename if you can't write to the one the user gave you. That way, the filesystem driver will do the checking for you.

This will work better because you don't have enough information to properly check the filename (since you likely don't know what the filesystem is). By letting the filesystem tell you if there is a problem, you eliminate the possibility of being over-restrictive, and you don't lose anything because you were going to have to deal with the error case anyway.
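
A rough Python sketch of that approach, under the assumption that the program has some way to prompt the user again. The function name open_for_writing and the ask callback are hypothetical. Note that mode "w" truncates an existing file, so a real program might want to confirm before overwriting.

```python
from typing import Callable, TextIO


def open_for_writing(ask: Callable[[], str]) -> TextIO:
    """Keep asking for a file name until the file system accepts one."""
    while True:
        name = ask()
        try:
            # Let the file system driver decide whether the name is valid.
            return open(name, "w", encoding="utf-8")
        except (OSError, ValueError) as err:
            # OSError covers names the file system rejects (and missing
            # directories, permissions, etc.); ValueError covers names
            # containing a NUL byte, which Python refuses up front.
            print(f"Cannot write to {name!r}: {err}")
```

It could be called as open_for_writing(lambda: input("File name: ")) in a console program.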

Good luck!
Michael Kohne
Thursday, June 28, 2007

This topic is archived. No further replies will be accepted.
