Different countries and cultures have varying conventions for how to communicate. These conventions range from very simple ones, such as the format for representing dates and times, to very complex ones, such as the language spoken.
Internationalization of software means programming it to be able to adapt to the user's favorite conventions. In ANSI C, internationalization works by means of locales. Each locale specifies a collection of conventions, one convention for each purpose. The user chooses a set of conventions by specifying a locale (via environment variables).
All programs inherit the chosen locale as part of their environment. Provided the programs are written to obey the choice of locale, they will follow the conventions preferred by the user.
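Each locale specifies conventions for several purposes, including the collating sequence for strings, the classification and conversion of characters, the formatting of numbers and monetary amounts, and the formatting of dates and times.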
Some aspects of adapting to the specified locale are handled automatically by the library subroutines. For example, all your program needs to do in order to use the collating sequence of the chosen locale is to use strcoll or strxfrm to compare strings.
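As a rough sketch of that point, the following comparison function lets qsort order an array of strings by the current locale's collating sequence. The names compare_by_locale and sort_words are hypothetical helpers invented for this illustration; only strcoll and qsort come from the library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compare two elements of an array of strings
   using the collating sequence of the current locale.  */
static int
compare_by_locale (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcoll (*sa, *sb);
}

/* Hypothetical helper: sort COUNT strings in place.  */
void
sort_words (const char **words, size_t count)
{
  assert (words != NULL || count == 0);
  qsort (words, count, sizeof *words, compare_by_locale);
}
```

The ordering changes with the locale selected for LC_COLLATE; in the standard `C' locale, strcoll orders strings the same way strcmp does.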
Other aspects of locales are beyond the comprehension of the library. For example, the library can't automatically translate your program's output messages into other languages. The only way you can support output in the user's favorite language is to program this more or less by hand. (Eventually, we hope to provide facilities to make this easier.)
This chapter discusses the mechanism by which you can modify the current locale. The effects of the current locale on specific library functions are discussed in more detail in the descriptions of those functions.
The simplest way for the user to choose a locale is to set the environment variable LANG. This specifies a single locale to use for all purposes. For example, a user could specify a hypothetical locale named `espana-castellano' to use the standard conventions of most of Spain.
The set of locales supported depends on the operating system you are using, and so do their names. We can't make any promises about what locales will exist, except for one standard locale called `C' or `POSIX'.
A user also has the option of specifying different locales for different purposes--in effect, choosing a mixture of two locales.
For example, the user might specify the locale `espana-castellano' for most purposes, but specify the locale `usa-english' for currency formatting. This might make sense if the user is a Spanish-speaking American, working in Spanish, but representing monetary amounts in US dollars.
Note that both locales `espana-castellano' and `usa-english', like all locales, would include conventions for all of the purposes to which locales apply. However, the user can choose to use each locale for a particular subset of those purposes.
The purposes that locales serve are grouped into categories, so that a user or a program can choose the locale for each category independently. Here is a table of categories; each name is both an environment variable that a user can set, and a macro name that you can use as an argument to setlocale.
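LC_COLLATE
This category applies to collation of strings (functions strcoll and strxfrm); see section Collation Functions.
LC_CTYPE
This category applies to classification and conversion of characters.
LC_MONETARY
This category applies to formatting monetary values.
LC_NUMERIC
This category applies to formatting numeric values that are not monetary.
LC_TIME
This category applies to formatting date and time values.
LC_ALL
This is a macro that you can use with setlocale to set a single locale for all purposes.
LANG
This is an environment variable only, not a macro for setlocale. Its value specifies the locale to use for every category that is not set by one of the more specific variables above.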
A C program inherits its locale environment variables when it starts up. This happens automatically. However, these variables do not automatically control the locale used by the library functions, because ANSI C says that all programs start by default in the standard `C' locale. To use the locales specified by the environment, you must call setlocale. Call it as follows:
    setlocale (LC_ALL, "");
to select a locale based on the appropriate environment variables.
You can also use setlocale to specify a particular locale, for general use or for a specific category.
The symbols in this section are defined in the header file `locale.h'.
Function: char * setlocale (int category, const char *locale)
The function setlocale sets the current locale for category category to locale.
If category is LC_ALL, this specifies the locale for all purposes. The other possible values of category specify an individual purpose (see section Categories of Activities that Locales Affect).
You can also use this function to find out the current locale by passing a null pointer as the locale argument. In this case, setlocale returns a string that is the name of the locale currently selected for category category.
The string returned by setlocale can be overwritten by subsequent calls, so you should make a copy of the string (see section Copying and Concatenation) if you want to save it past any further calls to setlocale. (The standard library is guaranteed never to call setlocale itself.)
You should not modify the string returned by setlocale. It might be the same string that was passed as an argument in a previous call to setlocale.
When you read the current locale for category LC_ALL, the value encodes the entire combination of selected locales for all categories. In this case, the value is not just a single locale name. In fact, we don't make any promises about what it looks like. But if you specify the same "locale name" with LC_ALL in a subsequent call to setlocale, it restores the same combination of locale selections.
When the locale argument is not a null pointer, the string returned by setlocale reflects the newly modified locale.
If you specify an empty string for locale, this means to read the appropriate environment variable and use its value to select the locale for category.
If you specify an invalid locale name, setlocale returns a null pointer and leaves the current locale unchanged.
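The null-pointer result gives you a simple way to test whether a locale name was accepted. Here is a minimal sketch; try_locale is a hypothetical wrapper invented for this illustration, not part of the library.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: try to select the locale NAME for CATEGORY.
   Returns 1 if setlocale accepted the name, or 0 if it returned a
   null pointer, in which case the current locale is unchanged.  */
int
try_locale (int category, const char *name)
{
  return setlocale (category, name) != NULL;
}
```

A program might call try_locale (LC_ALL, "") at startup and carry on in the standard `C' locale if it returns 0. Which names are rejected depends on the system, so do not rely on any particular name being invalid.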
Here is an example showing how you might use setlocale to temporarily switch to a new locale.
    #include <stddef.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    void
    with_other_locale (const char *new_locale,
                       void (*subroutine) (int),
                       int argument)
    {
      char *old_locale, *saved_locale;

      /* Get the name of the current locale.  */
      old_locale = setlocale (LC_ALL, NULL);

      /* Copy the name so it won't be clobbered by setlocale.  */
      saved_locale = strdup (old_locale);
      if (saved_locale == NULL)
        {
          /* The copy failed; report the error and give up.  */
          fputs ("Out of memory\n", stderr);
          exit (EXIT_FAILURE);
        }

      /* Now change the locale and do some stuff with it.  */
      setlocale (LC_ALL, new_locale);
      (*subroutine) (argument);

      /* Restore the original locale.  */
      setlocale (LC_ALL, saved_locale);
      free (saved_locale);
    }
Portability Note: Some ANSI C systems may define additional locale categories. For portability, assume that any symbol beginning with `LC_' might be defined in `locale.h'.
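The only locale names you can count on finding on all operating systems are these three standard ones:
"C"
This is the standard C locale, the one in which every program starts by default.
"POSIX"
This is the standard POSIX locale, another name for the standard `C' locale.
""
The empty name tells setlocale to select a locale based on the appropriate environment variables.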
Defining and installing named locales is normally a responsibility of the system administrator at your site (or the person who installed the GNU C library). Some systems may allow users to create locales, but we don't discuss that here.
If your program needs to use something other than the `C' locale, it will be more portable if you use whatever locale the user specifies with the environment, rather than trying to specify some non-standard locale explicitly by name. Remember, different machines might have different sets of locales installed.
When you want to format a number or a currency amount using the conventions of the current locale, you can use the function localeconv to get the data on how to do it. The function localeconv is declared in the header file `locale.h'.
Function: struct lconv * localeconv (void)
The localeconv function returns a pointer to a structure whose components contain information about how numeric and monetary values should be formatted in the current locale.
You shouldn't modify the structure or its contents. The structure might be overwritten by subsequent calls to localeconv, or by calls to setlocale, but no other function in the library overwrites this value.
Data Type: struct lconv
This is the data type of the structure that the value returned by localeconv points to.
If a member of the structure struct lconv has type char, and the value is CHAR_MAX, it means that the current locale has no value for that parameter.
These are the standard members of struct lconv; there may be others.
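char *decimal_point
char *mon_decimal_point
These are the decimal-point separators used in formatting non-monetary and monetary quantities, respectively. In the standard `C' locale, the value of decimal_point is ".", and the value of mon_decimal_point is "".
char *thousands_sep
char *mon_thousands_sep
These are the separators used to delimit groups of digits to the left of the decimal point in formatting non-monetary and monetary quantities, respectively. In the standard `C' locale, both members have a value of "" (the empty string).
char *grouping
char *mon_grouping
These are strings that specify how to group the digits to the left of the decimal point. grouping applies to non-monetary quantities and mon_grouping applies to monetary quantities. Use either thousands_sep or mon_thousands_sep to separate the digit groups.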
Each string is a sequence of small integers, one per element of type char, rather than printable text. Successive elements (from left to right) give the sizes of successive groups (from right to left, starting at the decimal point). When the string ends, the last group size is used over and over for all the remaining groups.
If an element has the value CHAR_MAX, it means that there is no more grouping--or, put another way, any remaining digits form one large group without separators.
For example, if grouping is "\4\3\2", the number 123456787654321 should be grouped into `12', `34', `56', `78', `765', `4321'. This uses a group of 4 digits at the end, preceded by a group of 3 digits, preceded by groups of 2 digits (as many as needed). With a separator of `,', the number would be printed as `12,34,56,78,765,4321'.
A value of "\3" indicates repeated groups of three digits, as normally used in the U.S.
In the standard `C' locale, both grouping and mon_grouping have a value of "". This value specifies no grouping at all.
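The grouping rules can be sketched in code as follows. This is only an illustration: group_digits is a hypothetical helper, and it assumes the ISO C representation of grouping strings, in which each element is a small integer of type char, the end of the string means "repeat the last size", and CHAR_MAX means "no further grouping".

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: copy the digit string DIGITS into OUT, inserting
   SEP between the groups described by GROUPING.  Returns 0 on success,
   or -1 if OUT (of size OUTSIZE) is too small.  */
int
group_digits (const char *digits, const char *grouping,
              const char *sep, char *out, size_t outsize)
{
  size_t len = strlen (digits);
  size_t seplen = strlen (sep);
  size_t breaks[64];    /* Offsets in DIGITS where a new group starts.  */
  size_t nbreaks = 0;
  size_t pos = len;     /* Digits not yet assigned to a group.  */
  size_t size = 0;      /* Size of the current group.  */
  size_t used = 0;
  size_t i;

  assert (out != NULL);

  /* Walk leftward from the decimal point, recording group boundaries.
     Digits beyond 64 boundaries are left as one large group.  */
  while (nbreaks < 64)
    {
      if (*grouping == CHAR_MAX)
        break;                               /* No further grouping.  */
      if (*grouping != '\0')
        size = (unsigned char) *grouping++;  /* Next explicit size.  */
      /* Otherwise the string is exhausted: keep repeating SIZE.  */
      if (size == 0 || size >= pos)
        break;
      pos -= size;
      breaks[nbreaks++] = pos;
    }

  /* Copy left to right, emitting SEP at each recorded boundary.  */
  for (i = 0; i < len; i++)
    {
      if (nbreaks > 0 && breaks[nbreaks - 1] == i)
        {
          if (used + seplen >= outsize)
            return -1;
          memcpy (out + used, sep, seplen);
          used += seplen;
          nbreaks--;
        }
      if (used + 1 >= outsize)
        return -1;
      out[used++] = digits[i];
    }
  if (used >= outsize)
    return -1;
  out[used] = '\0';
  return 0;
}
```

In a real program you would pass the grouping and thousands_sep members (or their monetary counterparts) obtained from localeconv, after formatting the integer part of the number as a string of digits.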
char int_frac_digits
char frac_digits
These are small integers indicating how many fractional digits (to the right of the decimal point) should be displayed in a monetary value in international and local formats, respectively.
In the standard `C' locale, both of these members have the value CHAR_MAX, meaning "unspecified". The ANSI standard doesn't say what to do when you find this value; we recommend printing no fractional digits. (This locale also specifies the empty string for mon_decimal_point, so printing any fractional digits would be confusing!)
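A program can apply that recommendation with a small wrapper around localeconv. The following is a sketch under that assumption; monetary_frac_digits is a hypothetical name, and the fallback value is whatever the caller chooses (0 follows the recommendation above).

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>

/* Hypothetical helper: return the number of fractional digits to print
   for a monetary amount in local format, or FALLBACK when the current
   locale leaves frac_digits unspecified (CHAR_MAX).  */
int
monetary_frac_digits (int fallback)
{
  const struct lconv *lc = localeconv ();

  assert (lc != NULL);
  return lc->frac_digits == CHAR_MAX ? fallback : lc->frac_digits;
}
```

The same test against CHAR_MAX applies to every member of struct lconv that has type char.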
These members of the struct lconv structure specify how to print the symbol to identify a monetary value--the international analog of `$' for US dollars.
Each country has two standard currency symbols. The local currency symbol is used commonly within the country, while the international currency symbol is used internationally to refer to that country's currency when it is necessary to indicate the country unambiguously.
For example, many countries use the dollar as their monetary unit, and when dealing with international currencies it's important to specify that one is dealing with (say) Canadian dollars instead of U.S. dollars or Australian dollars. But when the context is known to be Canada, there is no need to make this explicit--dollar amounts are implicitly assumed to be in Canadian dollars.
char *currency_symbol
The local currency symbol for the selected locale.
In the standard `C' locale, this member has a value of "" (the empty string), meaning "unspecified". The ANSI standard doesn't say what to do when you find this value; we recommend you simply print the empty string as you would print any other string found in the appropriate member.
char *int_curr_symbol
The international currency symbol for the selected locale.
The value of int_curr_symbol should normally consist of a three-letter abbreviation determined by the international standard ISO 4217 Codes for the Representation of Currency and Funds, followed by a one-character separator (often a space).
In the standard `C' locale, this member has a value of "" (the empty string), meaning "unspecified". We recommend you simply print the empty string as you would print any other string found in the appropriate member.
char p_cs_precedes
char n_cs_precedes
These members are 1 if the currency_symbol string should precede the value of a monetary amount, or 0 if the string should follow the value. The p_cs_precedes member applies to positive amounts (or zero), and the n_cs_precedes member applies to negative amounts.
In the standard `C' locale, both of these members have a value of CHAR_MAX, meaning "unspecified". The ANSI standard doesn't say what to do when you find this value, but we recommend printing the currency symbol before the amount. That's right for most countries. In other words, treat all nonzero values alike in these members.
The POSIX standard says that these two members apply to the int_curr_symbol as well as the currency_symbol. The ANSI C standard seems to imply that they should apply only to the currency_symbol, so that the int_curr_symbol should always precede the amount.
We can only guess which of these (if either) matches the usual conventions for printing international currency symbols. Our guess is that they should always precede the amount. If we find out a reliable answer, we will put it here.
char p_sep_by_space
char n_sep_by_space
These members are 1 if a space should appear between the currency_symbol string and the amount, or 0 if no space should appear. The p_sep_by_space member applies to positive amounts (or zero), and the n_sep_by_space member applies to negative amounts.
In the standard `C' locale, both of these members have a value of CHAR_MAX, meaning "unspecified". The ANSI standard doesn't say what you should do when you find this value; we suggest you treat it as one (print a space). In other words, treat all nonzero values alike in these members.
These members apply only to currency_symbol. When you use int_curr_symbol, you never print an additional space, because int_curr_symbol itself contains the appropriate separator.
The POSIX standard says that these two members apply to the int_curr_symbol as well as the currency_symbol. But an example in the ANSI C standard clearly implies that they should apply only to the currency_symbol: the int_curr_symbol contains any appropriate separator, so you should never print an additional space.
Based on what we know now, we recommend you ignore these members when printing international currency symbols, and print no extra space.
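For the local currency symbol, the placement rules above might be coded roughly as follows. This is a sketch, not a definitive implementation: place_currency_symbol is a hypothetical helper, it expects the amount to be formatted already, and it treats every nonzero value (including CHAR_MAX) as 1, as recommended above. Suppressing the space when the symbol is empty is an extra assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write AMOUNT and SYMBOL to OUT in the order
   selected by CS_PRECEDES, separated by a space if SEP_BY_SPACE is
   nonzero.  The two flags are the lconv members that match the sign
   of the amount.  Returns the value returned by snprintf.  */
int
place_currency_symbol (char *out, size_t outsize, const char *amount,
                       const char *symbol, char cs_precedes,
                       char sep_by_space)
{
  const char *space;

  assert (out != NULL && amount != NULL && symbol != NULL);

  /* Assumption of this sketch: an empty symbol needs no separator.  */
  space = (sep_by_space && strlen (symbol) > 0) ? " " : "";

  if (cs_precedes)
    return snprintf (out, outsize, "%s%s%s", symbol, space, amount);
  return snprintf (out, outsize, "%s%s%s", amount, space, symbol);
}
```

For a positive amount you would pass p_cs_precedes and p_sep_by_space from localeconv; for a negative amount, n_cs_precedes and n_sep_by_space.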
These members of the struct lconv structure specify how to print the sign (if any) in a monetary value.
char *positive_sign
char *negative_sign
These are strings used to indicate positive (or zero) and negative monetary quantities, respectively.
In the standard `C' locale, both of these members have a value of "" (the empty string), meaning "unspecified".
The ANSI standard doesn't say what to do when you find this value; we recommend printing positive_sign as you find it, even if it is empty. For a negative value, print negative_sign as you find it unless both it and positive_sign are empty, in which case print `-' instead. (Failing to indicate the sign at all seems rather unreasonable.)
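That recommendation is short enough to write down directly. In this sketch, monetary_sign is a hypothetical helper that takes the structure returned by localeconv and picks the sign string to print.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: return the sign string to print for a monetary
   amount.  A negative amount falls back to "-" only when the locale
   supplies neither a positive nor a negative sign.  */
const char *
monetary_sign (const struct lconv *lc, int is_negative)
{
  assert (lc != NULL);

  if (!is_negative)
    return lc->positive_sign;
  if (strlen (lc->negative_sign) == 0 && strlen (lc->positive_sign) == 0)
    return "-";
  return lc->negative_sign;
}
```

The returned pointer may point into the structure owned by the library, so copy the string if you need it after a later call to localeconv or setlocale.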
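char p_sign_posn
char n_sign_posn
These members are small integers that indicate how to position the sign for nonnegative and negative monetary quantities, respectively. (The string used for the sign is what was specified with positive_sign or negative_sign.) The possible values are as follows:
0
The currency symbol and quantity should be surrounded by parentheses.
1
Print the sign string before the quantity and currency symbol.
2
Print the sign string after the quantity and currency symbol.
3
Print the sign string right before the currency symbol.
4
Print the sign string right after the currency symbol.
CHAR_MAX
"Unspecified". Both members have this value in the standard `C' locale.
The ANSI standard doesn't say what you should do when the value is CHAR_MAX. We recommend you print the sign after the currency symbol.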
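The sign positions can be turned into code along the following lines. This is a sketch with several assumptions: place_sign is a hypothetical helper, it takes the sign, symbol and amount as strings that are already formatted, it ignores the spacing members, and it treats CHAR_MAX (and any other unexpected value) like 4, following the recommendation above.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: combine SIGN, SYMBOL and AMOUNT in OUT as
   directed by SIGN_POSN (p_sign_posn or n_sign_posn) and CS_PRECEDES.
   Returns the value returned by snprintf.  */
int
place_sign (char *out, size_t outsize, const char *sign,
            const char *symbol, const char *amount,
            int cs_precedes, char sign_posn)
{
  /* The symbol and the amount, in the order they are displayed.  */
  const char *first = cs_precedes ? symbol : amount;
  const char *second = cs_precedes ? amount : symbol;
  const char *a, *b, *c;

  assert (out != NULL && sign != NULL && symbol != NULL && amount != NULL);

  switch (sign_posn)
    {
    case 0:   /* Parentheses around the quantity and symbol.  */
      return snprintf (out, outsize, "(%s%s)", first, second);
    case 1:   /* Sign before the quantity and symbol.  */
      a = sign; b = first; c = second;
      break;
    case 2:   /* Sign after the quantity and symbol.  */
      a = first; b = second; c = sign;
      break;
    case 3:   /* Sign right before the symbol.  */
      if (cs_precedes)
        { a = sign; b = symbol; c = amount; }
      else
        { a = amount; b = sign; c = symbol; }
      break;
    case 4:   /* Sign right after the symbol.  */
    case CHAR_MAX:
    default:
      if (cs_precedes)
        { a = symbol; b = sign; c = amount; }
      else
        { a = amount; b = symbol; c = sign; }
      break;
    }
  return snprintf (out, outsize, "%s%s%s", a, b, c);
}
```

A complete formatter would also insert a space where p_sep_by_space or n_sep_by_space calls for one.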
It is not clear whether you should let these members apply to the international currency format or not. POSIX says you should, but intuition plus the examples in the ANSI C standard suggest you should not. We hope that someone who knows well the conventions for formatting monetary quantities will tell us what we should recommend.