ANSEL to Unicode Conversion Table

Do not use this table before reading the explanation below!

Despite an intensive search on the web, I could not find a complete ANSEL specification. I have therefore reconstructed this table from several sources available on the web; all of them are named in the explanation below. Download them and decide for yourself whether this table is correct. It is not an official publication! Use it at your own risk!

ANSEL	ANSEL Name	O	Unicode	Unicode Name	V	N	S	C	R

A1	slash l - uppercase	+	0141	latin capital letter L with stroke	+		+
A2	slash o - uppercase	+	00D8	latin capital letter O with stroke	+		+
A3	slash d - uppercase	+	0110	latin capital letter D with stroke	+		o		1
A4	thorn - uppercase	+	00DE	latin capital letter thorn	+	+	+
A5	ligature ae - uppercase	+	00C6	latin capital letter AE	+	+	+
A6	ligature oe - uppercase	+	0152	latin capital ligature OE	+	+	+
A7	miagkii znak	+	02B9	modifier letter prime	+	+	+		2
A8	middle dot	+	00B7	middle dot	+	+	+
A9	musical flat		266D	music flat sign	+	+	+
AA	patent mark		00AE	registered sign	+		+
AB	plus-or-minus		00B1	plus-minus sign	+	+	+
AC	hook o - uppercase		01A0	latin capital letter O with horn	+		+
AD	hook u - uppercase		01AF	latin capital letter U with horn	+		+
AE	alif	+	02BC	modifier letter apostrophe	?	-	-		3
B0	ayn	+	02BB	modifier letter turned comma	?	-	-		4
B1	slash l - lowercase	+	0142	latin small letter L with stroke	+		+
B2	slash o - lowercase	+	00F8	latin small letter O with stroke	+		+
B3	slash d - lowercase	+	0111	latin small letter D with stroke	+		+		5
B4	thorn - lowercase	+	00FE	latin small letter thorn	+	+	+
B5	ligature ae - lowercase	+	00E6	latin small letter AE	+	+	+
B6	ligature oe - lowercase	+	0153	latin small ligature OE	+	+	+
B7	hard sign (tverdyi znak)		02BA	modifier letter double prime	+	+	+		6
B8	dotless i - lowercase	+	0131	latin small letter dotless i	+	+	+
B9	british pound	+	00A3	pound sign	+	+	+
BA	eth	+	00F0	latin small letter eth	+	+	+		5
BC	hook o - lowercase		01A1	latin small letter O with horn	+		+
BD	hook u - lowercase		01B0	latin small letter U with horn	+		+
C0	degree sign		00B0	degree sign	+	+	+
C1	script l		2113	script small L	+	+	+
C2	phonograph copyright mark		2117	sound recording copyright	+		+
C3	copyright symbol	+	00A9	copyright sign	+	+	+
C4	musical sharp		266F	music sharp sign	+	+	+
C5	inverted question mark	+	00BF	inverted question mark	+	+	+
C6	inverted exclamation mark	+	00A1	inverted exclamation mark	+	+	+
CF	es zet	+	00DF	latin small letter sharp S	+	-	+		7
E0	low rising tone mark		0309	combining hook above	+	-	o	+
E1	grave accent	+	0300	combining grave accent	+	+	+	+
E2	acute accent	+	0301	combining acute accent	+	+	+	+
E3	circumflex accent	+	0302	combining circumflex accent	+	+	+	+
E4	tilde	+	0303	combining tilde	+	+	+	+
E5	macron	+	0304	combining macron	+	+	+	+
E6	breve	+	0306	combining breve	+	+	+	+
E7	dot above	+	0307	combining dot above	+	+	+	+
E8	umlaut (dieresis)	+	0308	combining diaeresis	+	+	+	+
E9	hacek (caron)	+	030C	combining caron	+	+	+	+	8
EA	circle above (angstrom)	+	030A	combining ring above	+		+	+
EB	ligature, left half		FE20	combining ligature left half	?	+	+	-
EC	ligature, right half		FE21	combining ligature right half	?	+	+	-
ED	high comma, off center	+	0315	combining comma above right	+		+	-
EE	double acute accent	+	030B	combining double acute accent	+	+	+	+
EF	candrabindu		0310	combining candrabindu	+	+	+	-
F0	cedilla	+	0327	combining cedilla	+	+	+	+
F1	right hook	+	0328	combining ogonek	+	-	o	+
F2	dot below		0323	combining dot below	+	+	+	+
F3	double dot below		0324	combining diaeresis below	+		+	+
F4	circle below		0325	combining ring below	+		+	+
F5	double underscore		0333	combining double low line	+		+	-
F6	underscore	+	0332	combining low line	?		o	?	9
F7	left hook	+	0326	combining comma below	?		o	-	10
F8	right cedilla		031C	combining left half ring below	+	-	o	-	11
F9	half circle below		032E	combining breve below	+	-	o	+
FA	double tilde, left half		FE22	combining double tilde left half	+	+	+	-
FB	double tilde, right half		FE23	combining double tilde right half	+	+	+	-
FE	high comma, centered	+	0313	combining comma above	+		+	-

These data are also summarized in a machine-readable file.

Explanation

1st Column

ANSEL code value (hexadecimal). Values below 80 (hex) are identical to the corresponding ASCII values. All values are taken from a GEDCOM 5.5 description, which appears to contain an updated ANSEL appendix. Values not listed here are either undefined or have an LDS-specific meaning.
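A minimal Python sketch of the per-byte rule just described: bytes below 80 (hex) pass through as ASCII, every other byte is looked up in the table, and a value that is not in the table is rejected instead of guessed. The dictionary holds only a few rows of the table above, and the name `ansel_byte_to_char` is hypothetical; a real converter would load every row and would also have to reorder diacritics (see column C).

```python
# Hypothetical subset of the table above (ANSEL byte -> Unicode code point).
# A real converter would load every row.
ANSEL_TO_UNICODE = {
    0xA1: 0x0141,  # slash l - uppercase
    0xB1: 0x0142,  # slash l - lowercase
    0xB9: 0x00A3,  # british pound
    0xC3: 0x00A9,  # copyright symbol
    0xCF: 0x00DF,  # es zet (probably a GEDCOM extension, remark 7)
    0xE2: 0x0301,  # acute accent (a combining character)
}


def ansel_byte_to_char(b: int) -> str:
    """Map a single ANSEL byte to one Unicode character.

    Diacritics are NOT reordered here; that is a separate step (column C).
    """
    if b < 0x80:
        # Values below 80 (hex) are identical to ASCII.
        return chr(b)
    if b not in ANSEL_TO_UNICODE:
        # Undefined or LDS-specific value: reject instead of guessing.
        raise ValueError(f"ANSEL byte {b:02X} is not in the table")
    return chr(ANSEL_TO_UNICODE[b])
```

For example, `ansel_byte_to_char(0xB1)` returns "ł", while `0xBB`, which does not appear in the table, raises `ValueError`.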

2nd Column

Character name according to the updated ANSEL specification

3rd Column

O = Old ANSEL appendix. '+': this character also appears in the standard GEDCOM 5.5 description.

4th Column

Unicode code point (in hex). All values except one (CF->00DF) are taken from speccharlatin.html.

5th Column

Unicode Name: the name of this Unicode code point, as given on the Unicode web page.
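Because the names in this column were copied by hand, they can be checked mechanically against the Unicode Character Database that ships with Python's standard `unicodedata` module. The helper `name_matches` below is a hypothetical sketch. The database bundled with a current Python is much newer than the 1997 name list mentioned under column C, but Unicode character names are stable once assigned, so a mismatch points to a typo in the table rather than a renamed character.

```python
import unicodedata


def name_matches(code_point: int, listed_name: str) -> bool:
    """True if a column-5 name equals the official Unicode character name.

    unicodedata.name() returns upper-case names and the table uses mixed
    case, so the listed name is upper-cased before comparing. A code point
    without a name yields "" and therefore never matches.
    """
    official = unicodedata.name(chr(code_point), "")
    return official == listed_name.upper()
```

For example, `name_matches(0x0141, "latin capital letter L with stroke")` is true, while a misspelled name such as "modified letter prime" for 02B9 is reported as a mismatch.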

6th Column

V = Visual comparison between the ANSEL character as published in ANSEL (ANSI Z39.47-1993) and the Unicode character as published on the Unicode web page.
'+': good matching
'?': slight differences
'-': significant differences

7th Column

N = Name comparison: Comparison of the ANSEL character name (according to the GEDCOM specification) in column 2 and the Unicode code point name (according to the Unicode web page) in column 5.
'+': identical or good matching
'-': no matching
empty: decide for yourself

8th Column

S = Summary: my opinion of the reliability of this row, combining columns 6, 7 and 10.
'+': very reliable
'o': probably OK
'-': questionable
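If you only want to use the rows you trust, the S column can be applied mechanically. The sketch below assumes that each table row is a tab-separated line in the column order of the header and that trailing empty cells may be missing. `parse_row` and `select` are hypothetical helpers, not part of any published tool, and the layout of the separate machine-readable file may differ.

```python
from typing import NamedTuple

# Column order of the table header (rows assumed to be tab-separated).
COLUMNS = ("ansel", "ansel_name", "o", "unicode", "unicode_name",
           "v", "n", "s", "c", "r")


class Row(NamedTuple):
    ansel: int    # ANSEL byte value
    unicode: int  # Unicode code point
    s: str        # summary: '+', 'o' or '-'
    c: str        # combining flag, '' if not a combining character


def parse_row(line: str) -> Row:
    cells = line.rstrip("\n").split("\t")
    # Trailing empty cells are often omitted, so pad to the full width.
    cells += [""] * (len(COLUMNS) - len(cells))
    fields = dict(zip(COLUMNS, cells))
    return Row(int(fields["ansel"], 16), int(fields["unicode"], 16),
               fields["s"], fields["c"])


def select(rows, accept=("+",)):
    """Keep only rows whose S (summary) rating is in `accept`."""
    return [row for row in rows if row.s in accept]
```

By default `select` keeps only the "very reliable" rows; passing `accept=("+", "o")` also admits the "probably OK" ones.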

9th Column

C = Combining character. This entry is important for building parsers. In ANSEL a diacritic precedes the character it modifies, whereas in Unicode a combining character follows its base character. According to this table, the ANSEL code sequence E2 41 therefore has to be converted to 0041 0301 (latin capital letter A + combining acute accent), which will be displayed as Á.
'+': the name of the character in this row also appears as part of the names of other code points: in this example 00C1 (latin capital letter A with acute), displayed as Á (if this character exists in the installed font).
'-': the name does not appear as part of the name of any European letter (at least not in the name list published at www.unicode.org at the end of 1997).
empty: does not apply (not a combining character).
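The reordering described above can be sketched as a small decoder: combining diacritics are held back until their base character arrives and are then emitted after it. Optionally, Unicode normalization form NFC (Python's `unicodedata.normalize`) merges the sequence into a precomposed character where Unicode defines one, which is the situation the '+' entries point to. The function `decode_ansel` and its two lookup tables are hypothetical and cover only a handful of rows; several diacritics before one base are kept in their original order, which is an assumption and not something this table specifies.

```python
import unicodedata

# Hypothetical subsets of the table above.
SPACING = {
    0xB1: "\u0142",  # slash l - lowercase
    0xCF: "\u00DF",  # es zet
}
COMBINING = {
    0xE1: "\u0300",  # grave accent
    0xE2: "\u0301",  # acute accent
    0xE8: "\u0308",  # umlaut (dieresis)
    0xF0: "\u0327",  # cedilla
}


def decode_ansel(data: bytes, compose: bool = True) -> str:
    """Decode ANSEL bytes, moving each diacritic after its base character."""
    out = []
    pending = []  # diacritics seen before their base character
    for b in data:
        if b in COMBINING:
            # ANSEL order: the diacritic comes first, so hold it back.
            pending.append(COMBINING[b])
            continue
        if b < 0x80:
            base = chr(b)  # ASCII range passes through unchanged
        elif b in SPACING:
            base = SPACING[b]
        else:
            raise ValueError(f"ANSEL byte {b:02X} is not in the table")
        # Unicode order: base character first, then its combining marks.
        out.append(base)
        out.extend(pending)
        pending.clear()
    if pending:
        raise ValueError("diacritic at end of input has no base character")
    text = "".join(out)
    # NFC merges base + mark into a precomposed character where one exists.
    return unicodedata.normalize("NFC", text) if compose else text
```

For example, `decode_ansel(b"\xE2A")` returns "Á" (00C1), while `decode_ansel(b"\xE2A", compose=False)` returns "A" followed by 0301.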

10th Column

R = Remarks:
1 The code point 00D0 (latin capital letter eth) has the same appearance. Another MARC to UCS conversion table (DP73.DOC, found on another web page) maps A3 to 00D0. The mapping to 0110 appears to be more logical (see remark 5).
2 The comment to Unicode code point 02B9 says: transliteration of mjagkij znak (Cyrillic soft sign: palatalization). Therefore the name matches.
3 From the visual comparison (see column V) a mapping to 02BE or 0027 is also possible.
4 From the visual comparison (see column V) a mapping to 02BD is also possible.
5 DP73.DOC (see remark 1) maps B3 to 00F0 (small eth), which does not match in the visual comparison (column V) and collides with BA. The mapping to 0111 appears to be more logical.
6 The comment to Unicode code point 02BA says: transliteration of tverdyj znak (Cyrillic hard sign: no palatalization). Therefore the name matches.
7 Not given in ANSEL (ANSI Z39.47-1993). This is probably a GEDCOM extension of ANSEL. Therefore it is kept here.
8 The comment to Unicode code point 030C says: = hacek, V above.
9 The comment to Unicode code point 0332 says: = underline, underscore. From the visual comparison, a mapping to 0331 (COMBINING MACRON BELOW) is also possible. Furthermore, LOW LINE does not occur in the names of precomposed characters, but LINE BELOW does, and there is no code point named COMBINING LINE BELOW. Therefore I guess that LOW LINE and LINE BELOW are two names for the same character.
10 ANSEL (ANSI Z39.47-1993), table B1: left hook .... Latvian, Romanian. The comment to Unicode code point 0326 says: Romanian, Latvian, Livonian.
11 A mapping to 0321 (combining palatalized hook below) might also be possible.
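Remark 9 can be checked against the canonical decompositions in the Unicode Character Database, for instance with Python's standard `unicodedata` module. The sketch below looks at three letters whose names contain LINE BELOW (1E07, 1E3B and 1E49, chosen here only as examples). In the Unicode data each of them decomposes to its base letter followed by 0331 COMBINING MACRON BELOW, not 0332 COMBINING LOW LINE, which supports the alternative mapping mentioned in remark 9. The helper `line_below_mark` is hypothetical.

```python
import unicodedata


def line_below_mark(ch: str) -> str:
    """Return the combining mark from the canonical decomposition of ch.

    Assumes a two-part decomposition (base letter + one combining mark),
    which holds for the letters checked below.
    """
    _base_hex, mark_hex = unicodedata.decomposition(ch).split()
    return chr(int(mark_hex, 16))


# Three example letters whose Unicode names contain "LINE BELOW".
for letter in ("\u1E07", "\u1E3B", "\u1E49"):
    mark = line_below_mark(letter)
    print(f"{unicodedata.name(letter)}: U+{ord(mark):04X} {unicodedata.name(mark)}")
```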

Last modification: 2007-01-16