The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

CHAR vs. VARCHAR

Is there any general rule about using CHAR fields over VARCHAR and vice versa? I assume that at some string length, it will be better to use VARCHAR, resulting in smaller DB. But I also think that VARCHAR is using more CPU time when reading/saving and there is also probably some space/time overhead needed to support the variable length? Is there any rule, so you could say for instance Use CHAR up to 5 characters long strings and otherwise use VARCHARS?

I am currently using local Access database if it's important.
Petr Veit Send private email
Friday, August 03, 2007
 
 
It really depends upon the data that you intend to store.
Entries of Confusion Send private email
Friday, August 03, 2007
 
 
1. I find the semantics of CHAR to be something of a pain; if it's a Y/N field, sure, no problem, but otherwise I use VARCHAR.

2. Micro-optimizing for storage and CPU at that level seems to me a waste of time.
George Jansen Send private email
Friday, August 03, 2007
 
 
This is more of a question of what % of the field will be used, if you have a field that is always 200 chars long and all 200 are always used, that a Char! Length is NOT a good indicator its just one variable.
To lazy to log on
Friday, August 03, 2007
 
 
CHAR is used to store fixed length characters. If the defined length is 10 it will always store ten characters even if you only store "A". the rest will just be filled with blanks.

VARCHAR only stores the number of characters you enter, but also has a small overhead as it also stores the number of characters.

So in general, use CHAR when the data entries in a column are expected to be the same size.
Use VARCHAR when the data entries in a column are expected to vary considerably in size. Considerable in size is also related to the overall size. Between 1 and 10, use CHAR, between 1500 and 1800, use VARCHAR, even that the difference in % is less in the latter case.

Note that CHAR is faster than VARCHAR. Sometimes up to 50%.

The specific difference in storage and performance characteristics, depends on the database you use. F.ex. Oracle has a VARCHAR2. The difference is that even holding an empty value VARCHAR uses space (to record that the length is 0), VARCHAR2 holding an empty value does not use any space.
ThMoJe
Friday, August 03, 2007
 
 
>> Is there any rule, so you could say for instance Use CHAR up to 5 characters long strings and otherwise use VARCHARS? <<

I've used that convention in the past and it worked well for us.

If you're really concerned over disk space usage, you can compare how many bytes needed to store each column type (including storage overhead), plus the expected content length in your data, against your DB's page size.  But unless your tables have millions of rows, the effort would be wasted, so don't bother.

A more important question might be NCHAR vs CHAR.  Do you ever expect to store Unicode data?  Ever?
xampl Send private email
Friday, August 03, 2007
 
 
I have never used CHAR in my life except for single character columns that have to have a value. VARCHAR should be used instead.
anon
Friday, August 03, 2007
 
 
>>I have never used CHAR in my life except for single character columns that have to have a value. VARCHAR should be used instead.

When you are unable to tell sarcastic Post from legitimate ones you know this board is going down hill.
To lazy to log on
Friday, August 03, 2007
 
 
My general rule has always been...

If it is fixed length, use CHAR.

If the length is almost static (like say between 6 and 8 chars ALWAYS), then use CHAR.

Otherwise, use VARCHAR.

It ends up being 99% VARCHAR.
Eric D. Burdo Send private email
Friday, August 03, 2007
 
 
Thank you for all your replies guys, very helpful!

My main text fields will be for notes about each record, so in 99% the field will be blank, but for some cases I want to let the user add some notes, say 20-30 characters. The VARCHAR type seems to be an ideal case.

And yes, I want to store Unicode, since there will be Russian and Japanese versions. I could make some conversions from Unicode to local ANSI set in Windows, but native Unicode support was the main reason why I am just leaving C++ Builder and switching to .NET / C#. I want to enjoy Unicode even in the databases (so far local Access with thousands of records).

Thank you again,
Petr
Petr Veit Send private email
Friday, August 03, 2007
 
 
Unicode?  Then be sure to use NVarChar (assuming your using SQL Server).
Eric D. Burdo Send private email
Friday, August 03, 2007
 
 
It's interesting that both CHAR and VARCHAR fields support Unicode in a local Access database. Should I change it to NCHAR / NVARCHAR anyway?
Petr Veit Send private email
Friday, August 03, 2007
 
 
xampl: ">> Is there any rule, so you could say for instance Use CHAR up to 5 characters long strings and otherwise use VARCHARS? <<

I've used that convention in the past and it worked well for us."

Unless there is a meaning for the empty spaces that follow whatever you enter that's less than 5 chars, this never works well, AFAIAC.

Using Char(5) when you know that content isn't always 5 characters long creates a pain in the butt condition right off the bat. Simply using the char column in a SELECT is a pain: SELECT charfield WHERE charfield = 'ABCD' fails, because there is no row = 'ABCD', but there is one equal to 'ABCD '. So you have to either 1) add one or more trailing spaces to your WHERE clause when the value you want to find isn't exactly 5 chars (usually either in code or using a function in the SQL statement),  2) trim the trailing space from the row content before comparing (another function call in the SQL), or 3) add a wildcard to the end of the value you're looking for so that the trailing spaces are ignored. #3, however, won't work at all, because that would make 'ABCD%' match 'ABCDE', which is of course not what you want at all.

Regardless of the logic, using a CHAR column when 100% of the data isn't going to match in size requires additional thought about all of the issues that arise, and additional work to eliminate or at least deal with those issues.

In other words, VARCHAR (VARiable-length CHARacter) is appropriate when the contents of the column VARY in length, CHAR (non-variable length CHARacter) when they don't. This should be considered a rule, and you should have a very good, very well-thought-out reason for breaking it.

Similar reasoning is involved in the decision to not store dates or numeric values in CHAR/VARCHAR format, but use the appropriate DATETIME or NUMBER or MONEY format instead. Less work to use, less issues with validation, less trouble with SELECT/UPDATE/INSERT, easier CONSTRAINTS, etc.
Ken White Send private email
Friday, August 03, 2007
 
 
Ken: thank you for clarification, makes a lot of sense! I am going to use NVARCHAR in most cases then.

Any comments about the VARCHAR / NVARCHAR usage in local Access databases? I've tried both, and both types works the same (supporting Unicode). I assume I will rather use the NVARCHAR type, so any move to other database will be smoother.
Petr Veit Send private email
Friday, August 03, 2007
 
 
Ken -
I was going to mention the padding issue.  Thanks for bringing it up.
xampl Send private email
Friday, August 03, 2007
 
 
I'm sure that some database implementations store varchar(200) as 200 chars + length , and trim to the length after retrieving (it's easier to fetch rows of a fixed size)....

always use varchar unless it's a fixed length string like Y/N or T/F char(1) or an SSN or ID number that is of a fixed size ...etc.

i.e. never store a 9 character string in a char(10), you get an extra space... only store exactly 10 chars in it.
Totally Agreeing
Friday, August 03, 2007
 
 
As as I'm concerned, VARCHAR/NVARCHAR should always be used unless there's a really good reason for using CHAR/NCHAR (there rarely is).  People think they're clever using CHAR for "performance" or whatever but it usually just brings along a bunch of complications without any real gains.  I get a good laugh whenever I see things like "zipcode char(5)".
SomeBody Send private email
Friday, August 03, 2007
 
 
SomeBody : but using varchar(1) for a Y/N column seems dumb ... I just don't like the way it looks. :)
Totally Agreeing
Friday, August 03, 2007
 
 
shouldn't a y/n field be a bit?
Ruatara P Send private email
Sunday, August 05, 2007
 
 
Yes it should.
But I think they were meaning a single-character 'code' column, where P means 'paid' O means 'open' 9 means '90 days overdue', etc.  Something which probably could be foreign-keyed out, but wasn't for some reason (reduce the number of joins, reporting purposes, intentionally denormalized, etc)

The point was the contents were always going to be one character -- no more, no less, and therefore using a NVARCHAR column would be overkill.
xampl Send private email
Sunday, August 05, 2007
 
 
"shouldn't a y/n field be a bit? "

Not all databases support the bit column type.
anon
Monday, August 06, 2007
 
 
xampl: "But I think they were meaning a single-character 'code' column, where P means 'paid' O means 'open' 9 means '90 days overdue', etc.  Something which probably could be foreign-keyed out, but wasn't for some reason (reduce the number of joins, reporting purposes, intentionally denormalized, etc)

The point was the contents were always going to be one character -- no more, no less, and therefore using a NVARCHAR column would be overkill."

Exactly. A CHAR(1) is perfect for things like [O]pen, [P]osted, etc. Using an NVARCHAR() would be idiotic. The only possible reason would be to allow column to be empty; allowing NULL or using a blank space would solve that issue.
Ken White Send private email
Monday, August 06, 2007
 
 
Petr: "Any comments about the VARCHAR / NVARCHAR usage in local Access databases?"

No real experience with Access; I went from the old Clipper/FoxPro/dBASE file format directly to Interbase/SQL Server/Oracle, and have only used Access databases as an export medium when my client insisted, and therefore used ADO and just did SELECTs.

It makes sense to follow the same rules and recommendations, though. As you mentioned, it would make future upscaling much easier IMO.
Ken White Send private email
Monday, August 06, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz