The ISO Latin 1 character repertoire - a description with usage notes, section 4
Note: Some characters appear in more than one category in this classification, due to different uses. (For example, hyphen-minus has dual use as punctuation symbol and as mathematical symbol.)
These are the letters which are conventionally called the Latin letters. This letter repertoire was in practice selected for the purpose of writing the English language. (Notice that the letter w is not part of the alphabet of the Latin language.)
Notice that although many of the characters are often presented using glyphs similar to those for Greek and Russian characters, for example, these character repertoires are by definition distinct. For example, the Latin letter A is not the same as the Greek capital letter alpha or the first capital letter of the Cyrillic alphabet, although the same glyph could be used for all of them and although they might, under some circumstances, be pronounced similarly.
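The distinction can be checked against the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the variable names `lookalikes` and `all_distinct` are illustrative assumptions, not part of any standard.

```python
import unicodedata

# Three capital letters that are commonly drawn with the same glyph:
# Latin A, Greek alpha, Cyrillic A.
lookalikes = ["\u0041", "\u0391", "\u0410"]

for ch in lookalikes:
    # Each one has its own code position and its own Unicode name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The characters compare unequal even though they may look identical.
all_distinct = len(set(lookalikes)) == len(lookalikes)
print(all_distinct)
```

A consequence is that a search for the Latin letter A in a text will not find a Greek capital alpha, however similar the two look on screen.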
There is a large number of various derivatives of Latin letters, such as letters with diacritics (some of which belong to ISO Latin 1) and various symbols which historically originated as forms of letters (letterlike symbols) or as ligatures (such as the ampersand, &, which was originally a ligature of e and t).
Several basic Latin letters are also used, as such, as symbols for physical units and for other special purposes. For example, the symbol for the SI unit ampere is regarded as identical with the capital letter A, and similarly the symbol for the SI prefix kilo- is identical with the small letter k.
There are also many letterlike symbols which have been historically formed from letters, such as a double-struck capital R used to denote the set of real numbers in mathematics. Quite a few of them have their own code positions and names in Unicode, either in the Letterlike Symbols block or elsewhere. Depending on the symbol and context, they can be regarded as mere glyph variants of the basic letters, as completely independent symbols, or as something in between. When only the ISO Latin 1 repertoire is available, there isn't much choice: either you use the normal letter (such as "R" as a symbol of the set of real numbers) or you avoid using the symbol at all, expressing things verbally (e.g. "the set of real numbers"). In the first case, you should try to make things clear to readers, perhaps including a separate description of the notations used. You might additionally try to use a specific font to suggest that the letter is used in a special meaning. Notice, however, that the following independent (non-letter) characters belong to ISO Latin 1 and can be used for their proper meanings: ¢ (originally formed from "c"), £ (originally formed from "L"), ¥ (originally formed from "Y"), © (originally formed from "C"), and ® (originally formed from "R").
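The first option (falling back to the normal letter) could be automated roughly as follows. This is only a sketch: the helper `latin1_fallback` is hypothetical, and it assumes that the Unicode compatibility mapping (normalization form KC) is an acceptable source of replacement letters.

```python
import unicodedata

def latin1_fallback(text: str) -> str:
    """Hypothetical helper: replace characters outside ISO Latin 1 by their
    Unicode compatibility equivalents when those fit into ISO Latin 1."""
    result = []
    for ch in text:
        if ord(ch) <= 0xFF:          # already in the ISO Latin 1 repertoire
            result.append(ch)
            continue
        candidate = unicodedata.normalize("NFKC", ch)
        # Keep the original character if no Latin 1 equivalent exists.
        fits = all(ord(c) <= 0xFF for c in candidate)
        result.append(candidate if fits else ch)
    return "".join(result)

# The double-struck R becomes a plain R; the element-of sign is left alone.
print(latin1_fallback("x \u2208 \u211d"))
```

Such a replacement loses the visual distinction, so the advice about explaining the notation to readers still applies.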
Loosely speaking, a diacritic mark is a sign such as an accent (e.g. acute accent ´) attached to a character (such as letter e) to create a new character (such as é). Most diacritics are placed above a letter.
Very often a diacritic mark indicates some change in the pronunciation as compared with the base letter. However, the rules for this are language-dependent.
Various approaches to enabling the use of letters with diacritics have been suggested and tried in different systems and standards:
| dec | oct | hex | ASCII primary name | secondary use |
|---|---|---|---|---|
| 34 | 42 | 22 | quotation mark (") | diaeresis (¨) |
| 39 | 47 | 27 | apostrophe (') | acute accent (´) |
| 44 | 54 | 2C | comma (,) | cedilla (¸) |
| 94 | 136 | 5E | upward arrow head | circumflex accent (^) |
| 126 | 176 | 7E | overline | tilde (~) |
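The decimal, octal and hexadecimal columns of the table are three notations for the same code position. As a small illustration (a sketch only; the dictionary name `secondary_use` is an assumption), they can be recomputed from the characters themselves:

```python
# Characters from the table, keyed to their secondary use as a diacritic.
secondary_use = {
    '"': "diaeresis",
    "'": "acute accent",
    ",": "cedilla",
    "^": "circumflex accent",
    "~": "tilde",
}

for ch, use in secondary_use.items():
    code = ord(ch)  # code position as an integer
    # Print the position in decimal, octal and hexadecimal notation.
    print(f"{code:3d}  {code:3o}  {code:02X}  {ch}  {use}")
```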
In ISO Latin 1, there are several characters which are "precomposed" from a basic Latin letter and a diacritic:
| À | Á | Â | Ã | Ä | à | á | â | ã | ä |
| È | É | Ê |   | Ë | è | é | ê |   | ë |
| Ì | Í | Î |   | Ï | ì | í | î |   | ï |
| Ò | Ó | Ô | Õ | Ö | ò | ó | ô | õ | ö |
| Ù | Ú | Û |   | Ü | ù | ú | û |   | ü |
|   | Ý |   |   |   |   | ý |   |   | ÿ |
Other letters with diacritics in ISO Latin 1 are:
Å å ("a" with ring above)
Ç ç ("c" with cedilla)
Ñ ñ ("n" with tilde)
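The letters with diacritics listed above can also be found mechanically, since Unicode records for each of them a canonical decomposition into a base letter and a combining mark. The following sketch uses Python's standard `unicodedata` module; the helper `canonical_parts` and the dictionary `precomposed` are names chosen here for illustration.

```python
import unicodedata

def canonical_parts(ch: str):
    """Return (base, combining mark) if ch canonically decomposes into
    exactly two code points, else None."""
    decomposition = unicodedata.decomposition(ch)
    if not decomposition or decomposition.startswith("<"):
        return None          # no decomposition, or only a compatibility one
    parts = [chr(int(code, 16)) for code in decomposition.split()]
    return tuple(parts) if len(parts) == 2 else None

# Scan the upper half of ISO Latin 1 (positions A0 to FF, hexadecimal).
precomposed = {}
for code in range(0xA0, 0x100):
    ch = chr(code)
    parts = canonical_parts(ch)
    if parts:
        precomposed[ch] = parts

for ch, (base, mark) in precomposed.items():
    print(ch, "=", base, "+", unicodedata.name(mark))
```

Characters such as æ and ø do not appear in the result, because Unicode does not define them as a base letter plus a diacritic.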
The meanings of an accent or other diacritic are generally different in different languages. For example, an accent on a vowel may indicate that the vowel is stressed, or that it is long, or that it is otherwise phonetically different from the sound denoted by the base letter. Sometimes accents are used just to make a distinction between words which would otherwise be written the same way (as in Italian "è" 'is', as opposed to "e" 'and'). To take a further example, o with diaeresis (ö) is sometimes used in English (e.g. in the word "coöperation") to signal that the letter "o" is pronounced separately instead of being combined with the preceding vowel; in German it denotes the vowel "o umlaut", which is quite distinct from "o" but appears after "o" in alphabetic order; in Swedish it denotes a separate sound too but is positioned as the last letter of the alphabet. There are some additional notes on usage in the descriptions of the spacing diacritics.
The exact rules for using diacritics vary, depending on the language, and even within a language. In particular, in the French language, which uses diacritics extensively, the official orthography was reformed in the 1990s. It should also be noted that although it has been rather common in French to omit diacritics from capital letters, this practice seems to have been caused basically by technical difficulties. Thus, an upper case letter should have a diacritic according to the normal rules of the language.
ISO Latin 1 contains the following diacritics as separate and spacing characters:
| ´ | acute accent |
| ` | grave accent |
| ^ | circumflex accent |
| ~ | tilde |
| ¨ | diaeresis |
| ¸ | cedilla |
It might be argued that the ISO 8859-1 standard is ambiguous regarding whether these characters denote spacing or non-spacing characters. But Unicode and ISO 10646 definitely specify them as spacing.
In Unicode, there are other diacritics, too, such as breve and caron (hacek).
The term spacing as a property of a character means that the character is presented visually using a separate glyph which occupies its own space (smaller or larger), as opposed to being graphically combined with other characters using e.g. overprinting.
In addition to spacing diacritics like those mentioned above, Unicode also contains nonspacing diacritics. They are also (and officially, in Unicode terminology) called combining. A spacing diacritic like the circumflex accent (^), apart from its secondary technical usages for quite different purposes, is useful only for mentioning a circumflex. It can be used e.g. to say that "the letter â is formed from the letter a by attaching the circumflex ^ to it" (although the visual appearance of ^ in a font may significantly differ from the circumflex in â). It cannot be used to form the letter â. For instance, "a^" is simply a sequence of two characters; although some programs may convert it to "â", this is something that takes place outside character set issues. In contrast, the combining circumflex accent (U+0302) in Unicode has, as part of its defined meaning, the property that when following a letter, it is logically combined with it to produce a letter with a diacritic. In Unicode technical terms, a character like "â" is a "decomposable character" which is equivalent to the two-character decomposition consisting of the letter "a" followed by the combining circumflex accent (U+0302). In Unicode, there is a very large number of "precomposed" characters like "â" formed from a base character and an embedded diacritic, but sequences of base characters and combining diacritics allow an even wider repertoire to be presented. However, in practice, even those systems which have relatively good support for Unicode rarely support combining diacritics.
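The difference between the spacing circumflex and the combining circumflex can be illustrated with Unicode normalization. This is a minimal sketch using Python's standard `unicodedata` module; the variable names are chosen here for illustration only.

```python
import unicodedata

spacing = "a^"            # letter a followed by the spacing circumflex
combining = "a\u0302"     # letter a followed by COMBINING CIRCUMFLEX ACCENT
precomposed = "\u00e2"    # the single precomposed character, a with circumflex

# Normalization form C composes a base letter and a combining mark.
composed = unicodedata.normalize("NFC", combining)
# Normalization form D decomposes the precomposed character again.
decomposed = unicodedata.normalize("NFD", precomposed)

print(composed == precomposed)    # is the combining sequence equivalent?
print(decomposed == combining)
# Compare with the spacing circumflex: does "a^" change under NFC?
print(unicodedata.normalize("NFC", spacing) == spacing)

# The canonical combining class is nonzero only for the combining mark.
print(unicodedata.combining("\u0302"), unicodedata.combining("^"))
```

Whether a sequence such as the one in `combining` is also displayed as a single accented letter is a separate matter that depends on the rendering software and the font.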
The feminine ordinal indicator (ª) and the masculine ordinal indicator (º) can be regarded as letters, too, since they correspond to letters "a" and "o" in specific situations.
The following characters are regarded as independent letters, although some of them are historically combinations of two letters or a letter and a diacritic:
Æ æ (letter ae)
Ð ð (eth)
Þ þ (thorn)
Ø ø (o with stroke)
ß (sharp s)
Notice that the following characters are not regarded as letters, despite being historically formed from one or more letters:
¢
£
¥
©
®
µ
The "normal" digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are often called Arabic digits (especially to distinguish them from Roman numerals like XIV). In fact, Western Europeans adopted them from the Arabs, who had adopted them from scripts used in India. In these processes, the shapes of digits changed, however. The digits used in Arabic writing have shapes which differ from those of these "Arabic" digits, and they are classified as separate characters in Unicode: they are "Arabic-Indic digits" in block Arabic. There are also several other sets of digits in Unicode, for use in different scripts.
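That the two sets of digits are separate characters with the same numeric meaning can be seen from the Unicode character database. A minimal sketch in Python follows; the variable names `western` and `arabic_indic` are illustrative assumptions.

```python
import unicodedata

western = "3"
arabic_indic = "\u0663"   # ARABIC-INDIC DIGIT THREE

print(unicodedata.name(western))
print(unicodedata.name(arabic_indic))
# They are different characters ...
print(western == arabic_indic)
# ... with the same decimal digit value.
print(unicodedata.decimal(western), unicodedata.decimal(arabic_indic))
# Python's int() accepts decimal digits from any script.
print(int("\u0661\u0662\u0663"))
```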
In Unicode, there are distinct characters for digits used as superscripts or subscripts. Only the superscripts corresponding to 1, 2 and 3, that is ¹ and ² and ³, belong to ISO Latin 1; the others are in block Superscripts and Subscripts in Unicode. Notice that the ISO Latin 1 repertoire contains two characters which may look like superscript 0: the degree sign (°) and the masculine ordinal indicator (º).
When using the ISO Latin 1 character repertoire only, it is probably best to use superscript ¹ or ² or ³ only if all superscripts used in a document can be expressed that way. Otherwise, i.e. when you need to use some other method for presenting other superscripts (such as the SUP element when authoring in HTML), it is probably best to use that method throughout, for uniformity.
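The uniformity rule suggested above could be applied by a program along the following lines. This is only a sketch: the helpers `superscript_style` and `render` are hypothetical, and HTML SUP markup is assumed as the alternative method.

```python
# The only superscript digits available in ISO Latin 1.
LATIN1_SUPERSCRIPTS = {1: "\u00b9", 2: "\u00b2", 3: "\u00b3"}

def superscript_style(exponents):
    """Hypothetical helper: choose one notation for a whole document.

    Returns "characters" if every exponent has a superscript character in
    ISO Latin 1, otherwise "markup" (here: the SUP element in HTML).
    """
    if all(e in LATIN1_SUPERSCRIPTS for e in exponents):
        return "characters"
    return "markup"

def render(base, exponent, style):
    """Write base with a superscript exponent in the chosen style."""
    if style == "characters":
        return f"{base}{LATIN1_SUPERSCRIPTS[exponent]}"
    return f"{base}<SUP>{exponent}</SUP>"

# One exponent (4) lacks a Latin 1 character, so markup is used throughout.
exponents = [2, 3, 4]
style = superscript_style(exponents)
print([render("x", e, style) for e in exponents])
```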
The so-called vulgar fractions are characters denoting fractional numbers as single characters. In ISO Latin 1, there are such characters for the fractions 1/4, 1/2, 3/4 (namely ¼ ½ ¾). This reflects the character repertoire on many typewriters. Depending on the font, the bar (which corresponds to fraction slash) can be horizontal or slanted.
Analogously with the situation with superscript digits, when using the ISO Latin 1 character repertoire only, it is probably best to use vulgar fractions only if all fractions used in a document can be expressed that way. Otherwise, i.e. when you need to use some other method for presenting other fractions, it is probably best to use that method throughout, for uniformity. You could simply use expressions like 2/3 and 1/4. (In HTML, you might use the SUP markup for the numerator and the SUB markup for the denominator, thereby suggesting a presentation which somewhat resembles vulgar fractions in appearance.)
A practical problem with the vulgar fraction characters is that their appearance is often hard to read, especially on computer screens.
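If a text already contains vulgar fraction characters, they can be converted to plain expressions like 1/4 mechanically. The sketch below relies on the compatibility decompositions in the Unicode character database, accessed through Python's standard `unicodedata` module; the function name `plain_fraction` is an assumption made here.

```python
import unicodedata

def plain_fraction(ch: str) -> str:
    """Rewrite a vulgar fraction character as digits separated by a slash."""
    # Compatibility normalization yields digits around U+2044 FRACTION SLASH.
    parts = unicodedata.normalize("NFKC", ch)
    return parts.replace("\u2044", "/")

for ch in "\u00bc\u00bd\u00be":   # the three vulgar fractions of ISO Latin 1
    # Show the character name, its numeric value and the plain-text form.
    print(unicodedata.name(ch), unicodedata.numeric(ch), plain_fraction(ch))
```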
The following ISO Latin 1 characters can be classified as punctuation characters:
Punctuation rules vary from one language to another. Even within a language, there might be differences in the recommended rules, depending on style and authority. For the English language, the following resources contain well thought-of recommendations:
| $ | dollar sign |
| ¢ | cent sign |
| £ | pound sign |
| ¤ | currency sign |
| ¥ | yen sign |
For informative notes on actual usage of various symbols and abbreviations for currencies of the world, see e.g. the money table in WWWebster.
It depends on language-specific rules how currency symbols are attached to numbers. In English, the dollar and pound signs are usually written before the number (e.g. $1000), whereas in many other languages currency symbols are written after the number and separated from it with a space. And in Portuguese, for example, the dollar sign is used as an escudo symbol so that it appears in place of the decimal point (e.g. 30$00 is 30 escudos).
Currencies can be denoted in several ways: words (in some language), currency symbol characters, or various abbreviations. The optimal choice depends on the context and intentions. When uniqueness, definiteness, and internationality (as neutrality with respect to national languages) are essential, the three-letter codes as defined in ISO 4217 should be used.
Note: ISO Latin 1 does not contain the euro sign, the symbol for the currency unit euro (U+20AC). A new candidate member of the ISO 8859 family of character repertoires, ISO 8859-15 alias ISO Latin 9 (!), contains the euro sign in place of the currency sign (¤).
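The practical effect of this replacement can be demonstrated with the codecs shipped with Python, which include both ISO 8859-1 and ISO 8859-15. The following is a sketch; the variable names are chosen here for illustration.

```python
euro = "\u20ac"            # EURO SIGN
currency_sign = "\u00a4"   # CURRENCY SIGN

# ISO 8859-1 has no code position for the euro sign.
try:
    euro.encode("iso-8859-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False

# In ISO 8859-15 the euro sign occupies position A4 (hexadecimal) ...
euro_byte = euro.encode("iso-8859-15")
# ... which is the position of the currency sign in ISO 8859-1.
currency_byte = currency_sign.encode("iso-8859-1")

print(in_latin1, euro_byte, currency_byte)
# The same byte is therefore read differently under the two encodings.
print(euro_byte.decode("iso-8859-1") == currency_sign)
```

Data containing the euro sign thus needs to be labelled with the correct encoding; read as ISO 8859-1, the same byte appears as the currency sign.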
| % | percent sign |
| + | plus sign |
| - | hyphen-minus |
| < | less-than sign |
| > | greater-than sign |
| = | equals sign |
| ¬ | not sign |
| ¯ | macron |
| × | multiplication sign |
| ÷ | division sign |
| ° | degree sign |
| µ | micro sign |
Notes:
ISO Latin 1 contains only two space characters: normal space and no-break space. In Unicode, there are other space characters too, such as "em space", many of which are defined to have some specific width.
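The two space characters can be picked out of the repertoire by their Unicode general category. This is a minimal sketch using Python's standard `unicodedata` module; the list name `latin1_spaces` is an assumption, and the Unicode category Zs (space separator) is taken here as the definition of a space character.

```python
import unicodedata

# Characters of general category Zs (space separator) within ISO Latin 1.
latin1_spaces = [chr(c) for c in range(0x100)
                 if unicodedata.category(chr(c)) == "Zs"]
for ch in latin1_spaces:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# An example of a space character that exists only outside ISO Latin 1.
em_space = "\u2003"
print(f"U+{ord(em_space):04X}  {unicodedata.name(em_space)}",
      unicodedata.category(em_space))
```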
These characters are hard to classify:
| # | number sign |
| & | ampersand |
| * | asterisk |
| / | solidus (slash) |
| \ | reverse solidus (backslash) |
| @ | commercial at |
| _ | low line (underscore) |
| \| | vertical line |
| ¦ | broken bar |
| § | section sign |
| © | copyright sign |
| ® | registered sign |
| ¯ | macron |