Automatic Language Selection up to XP

edited May 5 in Programming
Good Evening,

I hope this topic hasn’t already been covered in another thread; if it was, I'm sorry.

While testing a Win32 program that runs on old Windows versions, I noticed that, despite the Windows UI language being the same, the program used different language strings (loaded with the LoadString() function) on different Windows versions. As it was my own program, I decided to investigate, because the behaviour I observed is not consistent with what the resources I could find suggest.

How languages work in (old) Windows

Just to get us on the same page, let me quickly explain how Windows conceptualises languages.

In Win32, a language identifier consists of two fields: the primary language ID and the sublanguage ID. The primary language ID denotes the language in general, e.g. English, German, French. The sublanguage ID denotes a regional variation, e.g. English (U.K.), German (Switzerland), French (Canada). The two are packed into a 16-bit value with the primary ID in the low 10 bits and the sublanguage ID in the high 6 bits. Therefore, LANG_ENGLISH (0x09) and SUBLANG_ENGLISH_US (0x01) make the 16-bit language ID 0x0409.
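To make the bit layout concrete, here is a minimal C sketch of how a language ID is composed and taken apart. The MY_-prefixed macros and constants are stand-ins I wrote so the snippet builds without <windows.h>; they are meant to mirror MAKELANGID, PRIMARYLANGID and SUBLANGID from winnt.h, which shift the sublanguage left by 10 bits.

```c
/* Stand-ins mirroring the winnt.h macros; the MY_ prefix is hypothetical
   and only there to avoid clashing with the real definitions. */
#define MY_MAKELANGID(primary, sub) \
    ((unsigned short)((((unsigned short)(sub)) << 10) | (unsigned short)(primary)))
#define MY_PRIMARYLANGID(id) ((unsigned short)((id) & 0x3FF)) /* low 10 bits */
#define MY_SUBLANGID(id)     ((unsigned short)((id) >> 10))   /* high 6 bits */

/* Values as defined in winnt.h. */
enum {
    MY_LANG_GERMAN   = 0x07,
    MY_LANG_ENGLISH  = 0x09
};
enum {
    MY_SUBLANG_ENGLISH_US   = 0x01,
    MY_SUBLANG_GERMAN_SWISS = 0x02
};

/* English (U.S.): (0x01 << 10) | 0x09 */
unsigned short english_us(void)
{
    return MY_MAKELANGID(MY_LANG_ENGLISH, MY_SUBLANG_ENGLISH_US);
}

/* German (Switzerland): (0x02 << 10) | 0x07 */
unsigned short german_swiss(void)
{
    return MY_MAKELANGID(MY_LANG_GERMAN, MY_SUBLANG_GERMAN_SWISS);
}
```

In a real Win32 build you would include <windows.h> and use the unprefixed macros directly.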

There are three neutral languages before Vista. Visual Studio 2003 calls them
  • “Neutral” (LANG_NEUTRAL, SUBLANG_NEUTRAL),
  • “Neutral (Default)” aka “Process Default Language” (LANG_NEUTRAL, SUBLANG_DEFAULT), and
  • “Neutral (Sys. Default)” (LANG_NEUTRAL, SUBLANG_SYS_DEFAULT).
We also need to distinguish between three Windows language settings: the UI language (the language the OS itself is displayed in), the system language (the language for the entire system) and the user language (the current user’s language). System and user language are also referred to as locales. I understand that a locale is not exactly the same thing as a language, but for our purposes, I think we can treat them as language selections. I also understand that Windows Server 2003 and later introduced Multilingual User Interfaces, and that Vista introduced new API functions to query language information, but my investigation was only concerned with Windows up to 32-bit XP.
On Windows 9x and Windows NT up to and including NT 3.51, the system language is immutable and always the same as the UI language. NT 4.0 introduced the ability to change the system language, disconnecting it from the UI language. XP renamed the system language to the “Language for non-Unicode programs”, which is what the setting is called to this day.

How language selection should work

In theory, at least according to this old MS Knowledge Base article, Windows 95 tries to match the “system language”, while NT first tries to match “the language of the calling thread”. Both first try to find an exact match (i.e., same primary language and sublanguage) and, failing that, try to match just the primary language. Importantly, the article explicitly states that if the desired language is, say, German (Germany) and that is unavailable, the system will match another German variant such as German (Switzerland) instead.
If no resources can be found in the appropriate language, Windows selects a “language neutral” resource and, if none can be found, falls back to English. If this still fails, it will just pick any language. The matching order should therefore be:
  1. Calling thread (NT) / System (95) exact
  2. Calling thread (NT) / System (95) primary
  3. Language Neutral
  4. English
  5. Any
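The documented order above can be sketched as a small selection function. This is only my reading of the five steps, not Windows code: pick_documented, NO_LANGUAGE and the PRIMARY macro are names I made up, "Language Neutral" is interpreted as any ID whose primary language is LANG_NEUTRAL, and for the "Any" step the sketch simply takes the first available entry, since the article does not say which one is chosen.

```c
#include <stddef.h>

#define PRIMARY(id)     ((unsigned short)((id) & 0x3FF)) /* low 10 bits */
#define PRIMARY_NEUTRAL 0x00   /* LANG_NEUTRAL */
#define PRIMARY_ENGLISH 0x09   /* LANG_ENGLISH */
#define NO_LANGUAGE     0xFFFF /* hypothetical marker: nothing available */

/* avail: language IDs present in the resources; wanted: thread (NT) or
   system (95) language. Returns the ID the documented order would pick. */
unsigned short pick_documented(const unsigned short *avail, size_t n,
                               unsigned short wanted)
{
    size_t i;

    for (i = 0; i < n; i++)            /* 1. exact match */
        if (avail[i] == wanted)
            return avail[i];
    for (i = 0; i < n; i++)            /* 2. same primary language */
        if (PRIMARY(avail[i]) == PRIMARY(wanted))
            return avail[i];
    for (i = 0; i < n; i++)            /* 3. language neutral */
        if (PRIMARY(avail[i]) == PRIMARY_NEUTRAL)
            return avail[i];
    for (i = 0; i < n; i++)            /* 4. English */
        if (PRIMARY(avail[i]) == PRIMARY_ENGLISH)
            return avail[i];

    /* 5. any: assumption, take the first entry */
    return n > 0 ? avail[0] : NO_LANGUAGE;
}
```

With German (Germany) (0x0407) wanted and only German (Switzerland) (0x0807) and English (U.S.) (0x0409) available, this returns 0x0807, which is exactly the substitution the article promises.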
However, I was not able to observe this order of evaluation in any of my experiments, though I should point out that the article only mentions the FindResourceEx() function, while I used the LoadString() function.

Methodology

I had to redo the experiment several times because I initially made some ambiguous language choices. This account should present all the relevant information while sparing you the chaos, though.

I used German (UI language) versions of Windows 98, NT 3.51, NT 4.0, 2000, XP and 7 with their user language set to Portuguese (Portugal) and their system language (if mutable) set to French (France).
I assume 98 to be representative of all 9x, NT 3.51 to be representative of all versions below it and 7 (which behaved like 2000) to be representative of all versions above it (although I wouldn’t be surprised if Windows 10 changed something).

First, I made a Win32 program that sets the window title using LoadString(). It contained string tables in the following languages:
  • Neutral
  • Process Default Language
  • Neutral (Sys. Default)
  • English (U.S.)
  • English (U.K.) [as a potential English-language fallback]
  • German (Germany)
  • German (Switzerland)
  • French (France)
  • French (Canada)
  • Portuguese (Portugal)
  • Portuguese (Brazil)
  • Danish [as a completely unrelated language]
  • Catalan [as a completely unrelated language]
Danish was my initial choice for an unrelated fallback, but ended up unused. Catalan superseded it because its primary language ID of 3 is lower than that of any other language in the list, so combining it with SUBLANG_NEUTRAL (instead of the usual SUBLANG_DEFAULT for languages without sublangs) yields a very low 16-bit language ID of 0x0003. This matters because (as a similar investigation on StackOverflow found and I was able to confirm), once Windows is out of ideas and starts picking any language, it prefers the one with the lowest 16-bit language ID.

During testing, I noticed that, unlike the old Knowledge Base article specified, Windows never seemed to substitute another specific language if it couldn’t find an exact match, i.e., if Windows wanted German (Germany) but couldn’t get it, it would never select German (Switzerland) or vice-versa. According to the MSKB, one should combine a primary language with SUBLANG_NEUTRAL in the resource files (e.g. LANG_GERMAN, SUBLANG_NEUTRAL) if it should be used for all sublanguages, so I adopted this in the test program, setting the sublanguage of all the bold languages in the list to SUBLANG_NEUTRAL.
Indeed, Windows would now fall back to the appropriate Primary-Neutral combination.

I compiled several versions of the program, always removing the first language that Windows selected and then testing again, until the program selected Catalan (indicating that, at this point, Windows was just picking any language).

How language selection actually works

This is the language selection order I observed in Windows 98:
  1. Neutral
  2. Process Default Language
  3. Neutral (Sys. Default)
  4. German (Germany) [UI = system exact]
  5. Portuguese (Portugal) [User exact]
  6. Portuguese (Neutral) [User fallback]
  7. English (U.S.)
  8. Catalan [any]
Note that, when trying to match the system language (which, again, is always the UI language on 9x), Win98 will only ever want an exact match; it did not select German (Neutral) even when available.
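For comparison with the documented order, the Windows 98 list above can be written down as a sketch. Again, this only models what I observed; it is not how Windows implements it, and pick_win98_observed, has_lang and NO_LANGUAGE are hypothetical names. The final step picks the lowest 16-bit ID, as described in the Methodology section.

```c
#include <stddef.h>

#define PRIMARY(id) ((unsigned short)((id) & 0x3FF)) /* low 10 bits */
#define NO_LANGUAGE 0xFFFF /* hypothetical marker: nothing available */

/* Returns 1 if id is among the available resource languages. */
static int has_lang(const unsigned short *avail, size_t n, unsigned short id)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (avail[i] == id)
            return 1;
    return 0;
}

/* Models the order observed on Windows 98 (system language = UI language). */
unsigned short pick_win98_observed(const unsigned short *avail, size_t n,
                                   unsigned short system_id,
                                   unsigned short user_id)
{
    unsigned short order[7];
    unsigned short lowest = NO_LANGUAGE;
    size_t i;

    order[0] = 0x0000;           /* Neutral */
    order[1] = 0x0400;           /* Process Default Language */
    order[2] = 0x0800;           /* Neutral (Sys. Default) */
    order[3] = system_id;        /* UI = system, exact match only */
    order[4] = user_id;          /* user exact */
    order[5] = PRIMARY(user_id); /* user primary + SUBLANG_NEUTRAL */
    order[6] = 0x0409;           /* English (U.S.) */

    for (i = 0; i < 7; i++)
        if (has_lang(avail, n, order[i]))
            return order[i];

    for (i = 0; i < n; i++)      /* any: lowest 16-bit language ID */
        if (avail[i] < lowest)
            lowest = avail[i];
    return lowest;
}
```

With my test setup (system German (Germany) = 0x0407, user Portuguese (Portugal) = 0x0816) and only German (Neutral), Portuguese (Neutral), English (U.S.) and Catalan available, this returns Portuguese (Neutral) (0x0016) and skips German (Neutral), matching the note above.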

This is the language selection order I observed in the NT line, in a table to illustrate its evolution.
No. | NT 3.51 and below                    | NT 4.0                               | 2000 and up
1   | Neutral                              | Neutral                              | Neutral
2   |                                      |                                      | German (Germany) [UI exact]
3   |                                      |                                      | German (Neutral) [UI fallback]
4   | Portuguese (Portugal) [User exact]   | Portuguese (Portugal) [User exact]   | Portuguese (Portugal) [User exact]
5   | Portuguese (Neutral) [User fallback] | Portuguese (Neutral) [User fallback] | Portuguese (Neutral) [User fallback]
6   |                                      | French (France) [System exact]       | French (France) [System exact]
7   |                                      | French (Neutral) [System fallback]   | French (Neutral) [System fallback]
8   | English (U.S.)                       | English (U.S.)                       | English (U.S.)
9   | Catalan [any]                        | Catalan [any]                        | Catalan [any]

Okay, but why are you telling me this?

A few things to note when making a Win32 application that runs on these old Windows versions are:
  • String tables should not use Neutral (on NT) or any of the neutral primary languages (on 9x), as they will otherwise override any other string table.
  • The English string table should be English (U.S.) so that it is always selected if all other matches fail on non-English systems.
  • If you want Windows 9x to prefer its UI = system language for your application, you’ll need to provide it not just with SUBLANG_NEUTRAL, but also with the appropriate sublang for the default regional variation (e.g. not just German (Neutral), but also German (Germany)).
… or just implement your own logic with FindResourceEx() and/or, better still, make the language user-selectable.
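If you go the FindResourceEx() route, note that string tables are not stored one string per resource: strings are grouped into RT_STRING blocks of 16, and each block is a sequence of 16 length-prefixed UTF-16 strings without terminators. Here is a minimal sketch of the lookup under those assumptions. The names string_block_id, string_entry and load_string_lang are my own, and the Win32 part is only compiled on Windows; treat it as a starting point rather than a drop-in LoadString() replacement.

```c
#include <stddef.h>

/* String ID -> resource ID of the RT_STRING block that holds it. */
unsigned string_block_id(unsigned string_id)
{
    return (string_id >> 4) + 1;
}

/* Walks a 16-entry block and returns a pointer to the entry for string_id:
   entry[0] is the length in UTF-16 code units (0 = string not present),
   entry[1..] are the characters, NOT null-terminated. */
const unsigned short *string_entry(const unsigned short *block,
                                   unsigned string_id)
{
    unsigned i, index = string_id & 15;

    for (i = 0; i < index; i++)
        block += 1 + block[0]; /* skip length word plus characters */
    return block;
}

#ifdef _WIN32
#include <windows.h>

/* Returns the entry for string `id` in language `lang`, or NULL if the
   module has no string block in exactly that language. */
const unsigned short *load_string_lang(HINSTANCE inst, UINT id, WORD lang)
{
    HRSRC res = FindResourceEx(inst, RT_STRING,
                               MAKEINTRESOURCE(string_block_id(id)), lang);
    HGLOBAL mem = res ? LoadResource(inst, res) : NULL;
    const unsigned short *block =
        mem ? (const unsigned short *)LockResource(mem) : NULL;

    return block ? string_entry(block, id) : NULL;
}
#endif
```

You would call load_string_lang() once per language in your own preference order (for example UI exact, UI neutral, English (U.S.)) and stop at the first entry that is non-NULL with a non-zero length.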

I hope this information is of use to anyone finding this thread in 20 years during a Google search for tangentially related information. I am reasonably confident now that my findings are correct, though, if you have any corrections, I am of course open to them. Also, feel free to ask any questions about this—I’ve got the stuff all set up right now, so I should be able to do further investigations relatively conveniently.

Good Night!