The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Displaying non-English character-based text without Unicode?

If I want to display Greek or Russian, or other non-English character-based text that has a limited alphabet (i.e., not Chinese, etc.), is it sufficient to just set the font script to the appropriate language for that control?

I've tried this with Greek and it seems to work OK.  But my tests are hardly conclusive.
Mr. Analogy {Shrinkwrap µISV}
Wednesday, October 19, 2005
It probably depends on what your purpose is.  Are you doing this for full internationalization/localization of a program, or what?
Kyralessa
Wednesday, October 19, 2005
Have you had a Greek-speaker read your font-changed text?  You may not be getting what you think.

Disclaimer: I have no recent experience with non-Windows systems, so I can't speak to *nix intelligently.

In Windows, the code page determines how your bytes are interpreted.  The default on US and Western European systems is CP 1252, which is the Latin-1 page & is OK for English & some European languages.  Greek is CP 1253, so without switching code pages you can't get a clean representation.  If the text uses only the characters the two pages share (essentially the ASCII range), you'll be OK, but can you guarantee that will be the case?
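A minimal sketch of the point above, written in Python only because its standard codecs include both code pages; the sample string is an assumption chosen for illustration. It shows the same three bytes read under CP 1253 and under CP 1252.

```python
# Greek text as a program running under CP 1253 would store it.
greek = "αβγ"
raw = greek.encode("cp1253")        # three single bytes: E1 E2 E3

# The identical bytes interpreted under two different code pages.
as_cp1253 = raw.decode("cp1253")    # the Greek letters come back
as_cp1252 = raw.decode("cp1252")    # accented Latin letters instead

print(raw.hex(" "), as_cp1253, as_cp1252)
```

Nothing about the bytes changed between the two decodes; only the table used to interpret them did.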

My company's product is running into two things at the moment: 1) we're selling in Europe and CP 1252 doesn't cover Greek or Cyrillic(!) and 2) some of our clients are feeding data from Unicode-using systems, either databases or through XML components, and don't want to have to convert going in & out of our database.  Looks like Unicode or bust....
a former big-fiver
Wednesday, October 19, 2005
No, it generally does not work to change the font. You must have a special font that interprets the character codes in Western European Windows-1252 as if they were Greek Windows-1253. I have seen this trick used (just this week I was looking into a similar trick of using a special font to treat Windows-1252 as IBM850 to get the old DOS OEM block line characters). But it is the exception, not the rule. If the computer is not booted in the Windows-1253 code page (the system locale, aka "Language for non-Unicode programs" in Regional Settings), you shouldn't expect to see Greek in non-Unicode programs. But who knows, you may have a full suite of fonts for your single-byte character set needs and can get away with this hack. It would be ideal, though, if you are able to use a Unicode control to view this text, and then as long as you convert everything to Unicode properly you'll be golden. If you need tips on how to do that, let me know.
Ben Bryant
Wednesday, October 19, 2005
Linux uses locales, which seem quite similar to Windows code pages from what I gather.

The default locale is usually "C", whatever that means.

One key difference between Linux and Windows is that wchar_t is 32 bits on Linux, while on Windows it is 16 bits. (I have yet to check this.)

It is my understanding that the correct approach is to use Unicode internally, converting from the encoding the system is using to Unicode on input, and back again on output, as appropriate.
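One way to check the wchar_t size on a given machine, sketched in Python using the standard ctypes module, whose c_wchar type mirrors the platform's C wchar_t. This reports only the platform it runs on.

```python
import ctypes

# ctypes.c_wchar corresponds to the C wchar_t of the platform
# Python was built for, so its size reflects the local ABI.
wchar_bytes = ctypes.sizeof(ctypes.c_wchar)

print(f"wchar_t is {wchar_bytes * 8} bits on this platform")
```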
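A rough sketch of that "Unicode inside, convert at the edges" pattern, in Python. The helper names decode_input and encode_output and the sample bytes are hypothetical, made up for illustration.

```python
def decode_input(raw: bytes, source_encoding: str) -> str:
    """Convert bytes from a known legacy encoding to a Unicode string."""
    return raw.decode(source_encoding)


def encode_output(text: str, target_encoding: str) -> bytes:
    """Convert a Unicode string to the encoding the outside system expects."""
    return text.encode(target_encoding)


# Input arrives in two different single-byte code pages.
greek = decode_input(b"\xe1\xe2\xe3", "cp1253")
russian = decode_input(b"\xe0\xe1\xe2", "cp1251")

# Internally both are ordinary Unicode strings and can be mixed freely.
combined = greek + " " + russian

# Output in a Unicode encoding keeps everything.
out = encode_output(combined, "utf-8")

# Output in a single code page fails loudly when it cannot hold the text.
try:
    encode_output(combined, "cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

print(combined, out, cp1252_failed)
```

The design choice is that the encoding is named explicitly at each boundary, so a mismatch surfaces as an error rather than as silently wrong characters.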
Arafangion
Thursday, October 20, 2005
Can you let us know more about your application? What operating system is it on, which control are you setting the font script for, and so on? Assuming it's not a web-based application, will it also do string manipulation that might be affected by an unexpected character set, beyond the text on controls merely appearing incorrectly?
Senthilnathan N.S.
Thursday, October 20, 2005
The font will change the glyph that users see but it won't change the byte value(s). 

The thing that tripped us up on this was a client that was loading text containing decimal 247 in one byte in a text file.  For them, it was a lower-case 'u' with umlaut; our system, using CP 1252, loaded it as the mathematical 'divide-by' symbol because that's what d247 is in CP 1252.

When the same byte value maps to a different character in your code page/locale than in the user's, you'll have problems.  And a font does nothing to change spelling, sentence structure etc.

Some folks will be tempted to go with UTF-8, since it works for the Web.  The tricky point is that UTF-8 is byte-compatible with CP 1252/ISO 8859-1 (both Latin alphabet representations) only in the 7-bit ASCII range.  Every character above 127 is encoded as two or more bytes, which non-Unicode programs on current versions of Windows will by default attempt to handle as multiple single-byte characters.  (don't know what *nix will do.  i'm playing with it but not at this level)
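The multi-byte point can be seen in a few lines of Python; the sample string is an assumption picked for illustration. It encodes two non-ASCII characters as UTF-8 and then reads the bytes back as if they were CP 1252.

```python
text = "ü α"

# In UTF-8 each of the two non-ASCII characters takes two bytes.
utf8_bytes = text.encode("utf-8")

# A single-byte reader turns every byte into its own character.
misread = utf8_bytes.decode("cp1252")

# Pure ASCII text is byte-for-byte identical in both encodings.
ascii_same = "plain ASCII".encode("utf-8") == "plain ASCII".encode("cp1252")

print(len(text), len(utf8_bytes), misread, ascii_same)
```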

If this is for updating UI presentation for clients in different countries, you may need to look at either Java-type resource files or a runtime text modifier.
a former big-fiver
Thursday, October 20, 2005
I'm talking about a Win32 Delphi app on Microsoft Windows 2K or XP.
Mr. Analogy {Shrinkwrap µISV}
Thursday, October 20, 2005
WizTom supports Delphi apps.  Just verified we're looking at it for our PowerBuilder app.  Sorry, no experience to pass on as of yet.
a former big-fiver
Thursday, October 20, 2005
A choice of font will solve an artistic issue but the engineering issues will not get solved. There might be issues like the one 'a former big-fiver' relates to above.

One solution might be to find the code page of the system the application is installed on and then use an appropriate font depending on what the code page is. I am not sure how easy this is or if it is possible in Delphi. There's a discussion in the following link on detecting the code page in Delphi. Please check if it will be of any help.
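As a rough illustration of that idea (in Python rather than Delphi, and not taken from the linked discussion), the sketch below reads the system ANSI code page through the Win32 GetACP call and maps it to a Win32 font charset constant. The helper names and the lookup table are hypothetical; a Delphi program would presumably call GetACP the same way and assign the result to the control's Font.Charset.

```python
import sys

# Win32 font charset constants; Delphi exposes the same names.
ANSI_CHARSET = 0
DEFAULT_CHARSET = 1
GREEK_CHARSET = 161
TURKISH_CHARSET = 162
RUSSIAN_CHARSET = 204
EASTEUROPE_CHARSET = 238

# Hypothetical lookup table covering a few single-byte Windows code pages.
CHARSET_BY_CODE_PAGE = {
    1250: EASTEUROPE_CHARSET,
    1251: RUSSIAN_CHARSET,
    1252: ANSI_CHARSET,
    1253: GREEK_CHARSET,
    1254: TURKISH_CHARSET,
}


def charset_for_code_page(code_page: int) -> int:
    """Pick a font charset for a code page, falling back to DEFAULT_CHARSET."""
    return CHARSET_BY_CODE_PAGE.get(code_page, DEFAULT_CHARSET)


def system_ansi_code_page():
    """Return the ANSI code page number on Windows, or None elsewhere."""
    if sys.platform == "win32":
        import ctypes
        return ctypes.windll.kernel32.GetACP()
    return None


code_page = system_ansi_code_page()
if code_page is not None:
    print(code_page, charset_for_code_page(code_page))
```

This only selects a charset that matches the machine's code page; it does not make Greek display on a machine whose system locale is something else.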

Another way would be to specify the font as 'MS Shell Dlg 2'. It isn't an actual font face like the usual fonts we come across. Rather, it is a logical font name that Windows maps to a real font (Tahoma on Windows 2000 and XP), which has glyphs for characters outside code page 1252. It may have artistic issues, like the text not looking as good, but it has a better chance of displaying the content correctly. The following link on internationalization in Delphi talks about this.
Senthilnathan N.S.
Saturday, October 22, 2005
Don't know what controls you're using.

If DevExpress controls are a possibility, you might check with them by sending them a question to find out how they support foreign languages.  I know they have users from Asia and Eastern European countries who face this problem (e.g., Greece).  My understanding is that DevEx controls do support multi-byte character sets, so you can use fonts from a foreign language, but you can't combine different languages in the same program, and this setup also depends on getting things in sync with some locale code page in Windows.

Otherwise you can code with the limited set of controls available for Delphi that support Unicode: the TNTControls, VirtualDBGrid, etc.
Herbert Sitz
Monday, October 24, 2005

This topic is archived. No further replies will be accepted.
