Unicode™ : Java Glossary

2008-08-20 by Roedy Green ©1996-2008 Canadian Mind Products
Unicode
Unicode Glyphs
What Is Unicode?
Symbols
Arrows
Viewing Glyphs
Creating Unicode Documents
BOMs: Byte Order Marks
What’s Missing From Unicode?
Unicode Editors
Books
Links

Unicode Glyphs

Unicode 16 and Unicode 32 Glyphs in Downloadable Acrobat PDF Format

Code Description († = 32 bit: code points beyond FFFF, which Java stores as a surrogate pair)
0000 Basic Latin
0080 Latin-1 Supplement: accented letters, basic symbols
0100 Latin Extended-A: Esperanto accented letters
0180 Latin Extended-B: African
0250 IPA (International Phonetic Alphabet) Extensions
02B0 Spacing Modifier Letters
0300 Combining Diacritical Marks
0370 Greek
0400 Cyrillic
0500 Cyrillic Supplement
0530 Armenian
0590 Hebrew
0600 Arabic
0700 Syriac
0780 Thaana
0900 Devanagari (Hindi)
0980 Bengali
0A00 Gurmukhi
0A80 Gujarati
0B00 Oriya
0B80 Tamil
0C00 Telugu
0C80 Kannada
0D00 Malayalam
0D80 Sinhala
0E00 Thai
0E80 Lao
0F00 Tibetan
1000 Myanmar
10A0 Georgian
1100 Hangul Jamo
1200 Ethiopic
13A0 Cherokee
1400 Canadian Aboriginal Syllabics
1680 Ogham
16A0 Runic
1700 Tagalog
1720 Hanunoo
1740 Buhid
1760 Tagbanwa
1780 Khmer
1800 Mongolian
1900 Limbu
1950 Tai Le
19E0 Khmer Symbols
1D00 Phonetic Extensions
1E00 Latin Extended Additional: dotted letters, letters with two accents
1F00 Greek Extended
2000 General Punctuation
2070 Superscripts and Subscripts
20A0 Currency Symbols
20D0 Combining Marks for Symbols
2100 Letterlike Symbols
2150 Number Forms: Roman numerals and fractions
2190 Arrows
2200 Mathematical Operators: del, grad, element of, there exists, for all, union, intersection, contains, dot product, cross product, therefore, square root, logical and, logical or, summation, product
2300 Miscellaneous Technical: APL operators
2400 Control Pictures: for displaying unprintable ASCII control characters
2440 Optical Character Recognition
2460 Enclosed Alphanumerics
2500 Box Drawing
2580 Block Elements
25A0 Geometric Shapes
2600 Miscellaneous Symbols: chess, astrology, I-ching, telephones, hazards, religious symbols, hammer and sickle
2700 Dingbats: asterisks, ornaments, hands, right-pointing arrows, pencils, scissors, pens
27C0 Miscellaneous Mathematical Symbols-A, including SQL left, right and full joins
27F0 Supplemental Arrows-A
2800 Braille Patterns
2900 Supplemental Arrows-B
2980 Miscellaneous Mathematical Symbols-B
2A00 Supplemental Mathematical Operators, including variants of + - × ÷
2B00 Miscellaneous Symbols and Arrows
2C00 Glagolitic: pre-Cyrillic Bulgarian
2E80 CJK Radicals Supplement (Chinese Japanese Korean)
2F00 Kangxi Radicals
2FF0 Ideographic Description Characters
3000 CJK Symbols and Punctuation (Chinese Japanese Korean)
3040 Hiragana (Japanese): used when no Kanji character exists
30A0 Katakana (Japanese): mainly for foreign names
3100 Bopomofo: phonetic script for Mandarin
3130 Hangul Compatibility Jamo
3190 Kanbun: used by Japanese to annotate classical Chinese
31A0 Bopomofo Extended
31F0 Katakana Phonetic Extensions
3200 Enclosed CJK Letters and Months (Chinese Japanese Korean)
3300 CJK Compatibility (Chinese Japanese Korean)
3400 CJK Unified Ideographs Extension A (Chinese Japanese Korean)
4DC0 Yijing Hexagram Symbols
4E00 CJK Unified Ideographs (Chinese Japanese Korean), huge download
A000 Yi Syllables
A490 Yi Radicals
AC00 Hangul Syllables
D800 High Surrogates
DC00 Low Surrogates
E000 Private Use Area
F900 CJK Compatibility Ideographs (Chinese Japanese Korean)
FB00 Alphabetic Presentation Forms: ligatures, including Hebrew
FB50 Arabic Presentation Forms-A
FE00 Variation Selectors: non-printing control characters
FE20 Combining Half Marks
FE30 CJK Compatibility Forms (Chinese Japanese Korean)
FE50 Small Form Variants
FE70 Arabic Presentation Forms-B (the byte order mark FEFF is the last character of this block)
FF00 Halfwidth and Fullwidth Forms
FFF0 Specials, such as the replacement character FFFD
†0001 0000 Linear B Syllabary
†0001 0080 Linear B Ideograms
†0001 0100 Aegean Numbers
†0001 0300 Old Italic
†0001 0330 Gothic
†0001 0380 Ugaritic Cuneiform
†0001 0400 Deseret: Mormon
†0001 0450 Shavian
†0001 0480 Osmanya: Somali
†0001 0800 Cypriot Syllabary
†0001 D000 Byzantine Musical Symbols
†0001 D100 Musical Symbols
†0001 D300 Tai Xuan Jing Symbols: look like I-Ching hexagrams truncated to four lines
†0001 D400 Mathematical Alphanumeric Symbols
†0002 0000 CJK Unified Ideographs Extension B (Chinese Japanese Korean), huge download
†0002 F800 CJK Compatibility Ideographs Supplement (Chinese Japanese Korean)
†000E 0000 Tags: control characters
†000E 0100 Variation Selectors Supplement: non-printing control characters
†000F 0000 Supplementary Private Use Area-A
†0010 0000 Supplementary Private Use Area-B
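If you have a character in hand and want to know which of the blocks above it belongs to, Java can tell you. Here is a minimal sketch, assuming Java 5 or later; the class BlockLookup and its blockOf method are hypothetical names for illustration. Java names its blocks with upper-case identifiers such as CURRENCY_SYMBOLS rather than the titles used in the table.

```java
// Hypothetical helper: report which Unicode block a code point falls in.
public class BlockLookup {

    /** Returns Java's name for the block holding codePoint, or "(no block)". */
    static String blockOf(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        // of() returns null for code points not assigned to any block
        return block == null ? "(no block)" : block.toString();
    }

    public static void main(String[] args) {
        int[] samples = {0x0041, 0x00E9, 0x20AC, 0x2713, 0x4E00, 0x1D4E1};
        for (int cp : samples) {
            // %04X pads to four hex digits; 32-bit code points print five
            System.out.printf("%04X %s%n", cp, blockOf(cp));
        }
    }
}
```

Because UnicodeBlock.of takes an int, it handles the † 32-bit code points as well as the 16-bit ones.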

What Is Unicode?

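A character encoding used in Java. Java’s char is 16 bits, which was enough for the original Unicode, but Unicode has since grown to code points as high as 10FFFF, so a character beyond FFFF takes two chars (a surrogate pair). See the glyphs, in PDF format (requires Adobe Acrobat to view). They are also available as an ASCII text file describing the glyphs, with cross references to similar glyphs.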
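To see the difference between a 16-bit char and a Unicode character, here is a small sketch, assuming Java 5 or later (for Character.toChars and codePointCount). SurrogateDemo and javaEscape are hypothetical names; the code point 1D4E1 is taken from the Mathematical Alphanumeric Symbols block in the table.

```java
// Sketch: a Java char is one 16-bit UTF-16 code unit, so characters
// beyond FFFF occupy two chars (a surrogate pair).
public class SurrogateDemo {

    /** Hypothetical helper: the Java escape sequence(s) for one code point. */
    static String javaEscape(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {   // yields 1 or 2 chars
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String script = new String(Character.toChars(0x1D4E1));
        System.out.println("chars:       " + script.length());
        System.out.println("code points: " + script.codePointCount(0, script.length()));
        System.out.println("escape:      " + javaEscape(0x1D4E1));
        System.out.println("escape of A: " + javaEscape('A'));
    }
}
```

It should report 2 chars but only 1 code point, with the escape \ud835\udce1 (the surrogate pair you would have to write in a string literal), while 'A' is the single escape \u0041.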

Unicode is kept in step with ISO 10646, whose character set is called the UCS (Universal Character Set). Unicode allows Java to handle international characters for most of the world’s living languages, including Arabic, Armenian, Bengali, Bopomofo, Chinese (via unified Han), Cyrillic, English, Georgian, Greek, Gujarati, Gurmukhi, Hebrew, Hindi (Devanagari), Japanese (Kanji via unified Han, plus Hiragana and Katakana), Kannada, Korean (Hangul, plus Hanja via unified Han), Lao, Malayalam, Oriya, Thai, Tamil, Telugu, Tibetan… Unicode makes it much easier for non-English speaking programmers to write programs for English speaking users and vice versa.

In Java, you get at the exotic characters by encoding them in hex in your strings like this: "\u00f7\u2713" to produce ÷✓. See String literals for more details.
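The compiler translates those escapes before it does anything else, so the result is an ordinary two-char string. A minimal sketch; EscapeDemo and fromCodePoints are hypothetical names. It prints through a UTF-8 stream because the platform default encoding may turn the characters into '?'; whether the glyphs then show up still depends on your console and fonts.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Sketch: a string written with Unicode escapes equals one
// built from numeric code points at run time.
public class EscapeDemo {

    static final String ESCAPED = "\u00f7\u2713";   // division sign, check mark

    /** Builds a string at run time from code point numbers. */
    static String fromCodePoints(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(ESCAPED);
        out.println(ESCAPED.equals(fromCodePoints(0xF7, 0x2713)));   // true
        for (char c : ESCAPED.toCharArray()) {
            out.printf("%04x%n", (int) c);                          // 00f7, then 2713
        }
    }
}
```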

In HTML, you get at the exotic characters by encoding them as numeric character references such as &#xf7;&#x2713; (or named entities such as &divide;) to produce ÷✓.
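If you generate HTML from Java, you can let the program do that encoding. A minimal sketch, assuming Java 8 or later (for String.codePoints); HtmlEntities and toEntities are hypothetical names. It writes every non-ASCII character as a hexadecimal numeric reference and leaves ASCII alone, so you still need to escape &, < and > separately.

```java
// Hypothetical helper: encode non-ASCII characters as HTML numeric references.
public class HtmlEntities {

    static String toEntities(String s) {
        StringBuilder sb = new StringBuilder();
        // codePoints() yields whole code points, so a surrogate pair
        // becomes one reference rather than two broken halves
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);   // plain ASCII passes through
            } else {
                sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEntities("\u00f7\u2713"));     // &#xf7;&#x2713;
        System.out.println(toEntities("Price: 5\u20ac"));   // Price: 5&#x20ac;
    }
}
```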

Unicode Symbols

There are even codes for:
apple '\uf8ff' unofficial, private use area
British pound sign £ '\u00a3'
checkmark ✓ '\u2713'
copyright © '\u00a9'
degree ° '\u00b0'
dharma wheel ☸ '\u2638'
division ÷ '\u00f7'
bullet • '\u2022'
euro € '\u20ac'
female ♀ '\u2640'
funeral urn ⚱ '\u26b1'
heart ♥ '\u2665'
bullet (as mathematical operator) ∙ '\u2219'
infinity ∞ '\u221e'
integral ∫ '\u222b'
male ♂ '\u2642'
pi π '\u03c0'
PI Π '\u03a0'
registered trade mark ® '\u00ae'
sun ☀ '\u2600'
telephone ☎ '\u260e'
trademark ™ '\u2122'
This does not mean your fonts will support all these wonders, of course.

In addition there are all kinds of interesting special characters, such as: Alphabetic Presentation Forms, APL, Arrows, Bengali, Block Elements, Box Drawing, Braille Patterns, Byzantine Musical Symbols, Combining Diacritical Marks, Combining Half Marks, Combining Marks for Symbols, Control Pictures (icons for control chars), Currency Symbols, Dingbats, Enclosed Alphanumerics, General Punctuation, Geometric Shapes, Halfwidth and Fullwidth Forms, High Surrogates, Ideographic Description Characters, IPA Extensions, Letterlike Symbols, Low Surrogates, Mathematical Alphanumeric Symbols (32-bit Unicode), Mathematical Operators (del, grad, integral), Mathematical Symbols, Miscellaneous Symbols (astrology, chess, playing cards), Miscellaneous Technical (APL and other technical symbols), Musical Symbols, Number Forms (e.g. Roman numerals), OCR (Optical Character Recognition: OCR-A characters and the MICR symbols used in magnetic ink cheque encoding), Old Italic, Runic, Small Form Variants, Spacing Modifier Letters, Specials, Superscripts and Subscripts, Tags (invisible characters originally intended for marking the language of text), Unified Canadian Aboriginal Syllabics and Variation Selectors.

Unicode Arrows

There are also arrows:
\u2190 ←
\u2191 ↑
\u2192 →
\u2193 ↓
\u2194 ↔
\u2195 ↕
\u21a2 ↢
\u21ac ↬
\u21ad ↭
\u21b0 ↰
\u21b6 ↶
\u21c5 ⇅
\u21ce ⇎
\u21d0 ⇐
\u21d1 ⇑
\u21d2 ⇒
\u21d3 ⇓
\u21d4 ⇔
\u21d5 ⇕
\u21dc ⇜
There are even more arrows defined in Unicode, covering the whole range 2190 to 21FF, plus the Supplemental Arrows blocks at 27F0 and 2900. To use these characters in HTML, you need to code them as &… entities.
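Rather than copying arrows one at a time, you can generate the whole 2190 to 21FF range. A minimal sketch; ArrowsBlock and arrows are hypothetical names, and whether every glyph appears depends on your console font.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical example: list every character in the Arrows block, 16 per row.
public class ArrowsBlock {

    static final int FIRST = 0x2190;
    static final int LAST = 0x21FF;

    /** Returns all code points FIRST..LAST as one string. */
    static String arrows() {
        StringBuilder sb = new StringBuilder();
        for (int cp = FIRST; cp <= LAST; cp++) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        String all = arrows();   // 112 chars, all 16-bit
        for (int i = 0; i < all.length(); i += 16) {
            // label each row with the hex code of its first arrow
            out.printf("%04X %s%n", FIRST + i, all.substring(i, Math.min(i + 16, all.length())));
        }
    }
}
```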

Viewing Unicode Glyphs

Nic Fulton of Reuters has written a Java Test Applet that can display all 64K 16-bit Unicode characters, including the Chinese/Korean Han. How many of them actually display on your screen depends on the font handling ability of your browser and operating system, and which fonts you have installed. In Java programs, characters you can’t type directly are written in the form '\uffff', with exactly four hex digits; a character beyond FFFF needs two such escapes (a surrogate pair). Ordinary characters like 'A' are actually 16-bit Unicode too: 'A' is the same as '\u0041'.
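If you don’t have such an applet handy, a text-mode listing gets you part of the way. A minimal sketch, assuming Java 7 or later (Character.getName arrived in Java 7); GlyphLister and describe are hypothetical names. It prints each defined code point with its official Unicode name, which tells you what a character is even when your font shows only a box.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical text-mode glyph viewer: hex code, the character, its Unicode name.
public class GlyphLister {

    static List<String> describe(int from, int to) {
        List<String> lines = new ArrayList<>();
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isDefined(cp)) {
                continue;   // skip unassigned code points
            }
            String glyph = new String(Character.toChars(cp));
            lines.add(String.format("%04X %s %s", cp, glyph, Character.getName(cp)));
        }
        return lines;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        // start of Miscellaneous Symbols: sun, cloud, umbrella, ... telephone
        for (String line : describe(0x2600, 0x260F)) {
            out.println(line);
        }
    }
}
```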

Creating Unicode Documents

How do you create and edit the various flavours of Unicode documents? You can create them in some specific encoding, then convert them. To write a little utility to do that, read up on encoding and ask the File I/O Amanuensis for sample code. In Windows NT/W2K/XP you can use lowly Notepad to edit existing Unicode documents, but not in earlier Windows versions. To get started with a new document, you would have to acquire an almost empty Unicode document. Notepad is even clever enough to deal with byte order (endian) marks. Recent versions of MS Word in Windows NT/W2K/XP/W2K3 also work.
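Such a conversion utility can be quite short. Here is a minimal sketch, assuming Java 7 or later for java.nio.file; Recode and recode are hypothetical names. It reads the whole file into memory, so it suits modest files. Characters the target encoding cannot represent are silently replaced (typically with '?'), so convert into UTF-8 or UTF-16 when in doubt.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical utility: re-encode a text file from one encoding to another.
public class Recode {

    /** Decodes raw bytes with one charset and re-encodes them with another. */
    static byte[] recode(byte[] raw, Charset from, Charset to) {
        String text = new String(raw, from);   // bytes -> chars
        return text.getBytes(to);              // chars -> bytes
    }

    public static void main(String[] args) throws IOException {
        // usage: java Recode in.txt ISO-8859-1 out.txt UTF-8
        if (args.length != 4) {
            System.err.println("usage: java Recode in fromEncoding out toEncoding");
            return;
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        byte[] converted = recode(raw, Charset.forName(args[1]), Charset.forName(args[3]));
        Files.write(Paths.get(args[2]), converted);
    }
}
```

For example, the four Latin-1 bytes 63 61 66 E9 ("café") come out as the five UTF-8 bytes 63 61 66 C3 A9.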

Byte Order Marks

There are two different layers, both defined in the Unicode Standard: the character set, which assigns characters to numbers (code points), and the UTF encodings (UTF-8, UTF-16, UTF-32), which describe how you encode these numbers as bytes in a file. Byte order marks are part of the UTF encoding layer, not the character assignments. See more on BOMs (Byte Order Marks).
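Because the BOM is just a few bytes at the front of the file, you can sniff it yourself. A minimal sketch, assuming Java 7 or later (for StandardCharsets); BomSniffer and detectBom are hypothetical names. It only checks the UTF-8 and UTF-16 marks, so a UTF-32LE mark (FF FE 00 00) would be misreported as UTF-16LE. Note that Java’s own "UTF-16" encoder writes a big-endian BOM, while its UTF-8 encoder writes none.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guess the encoding from a leading byte order mark.
public class BomSniffer {

    /** Returns "UTF-8", "UTF-16BE" or "UTF-16LE" if b starts with that BOM, else null. */
    static String detectBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";   // a UTF-32LE BOM (FF FE 00 00) also lands here
        }
        return null;             // no BOM: the encoding must be known some other way
    }

    public static void main(String[] args) {
        // Java's "UTF-16" encoder writes big-endian with a BOM (FE FF) in front
        System.out.println(detectBom("x".getBytes(StandardCharsets.UTF_16)));       // UTF-16BE
        // Java's UTF-8 encoder writes no BOM
        System.out.println(detectBom("x".getBytes(StandardCharsets.UTF_8)));        // null
        // U+FEFF encoded in UTF-8 is the UTF-8 BOM EF BB BF
        System.out.println(detectBom("\uFEFFx".getBytes(StandardCharsets.UTF_8)));  // UTF-8
    }
}
```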

What’s Missing From Unicode?

There are no Unicode glyphs for the following:

Unicode is not concerned with typesetting, just with raw text. In other words, it is about characters (logical letters), not glyphs (the precise shapes of letters). Unicode has various flavours of digits that look much the same but are intended to be used in different contexts.
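Java treats those digit flavours as distinct characters that happen to share numeric values. A minimal sketch; DigitFlavours and digitValues are hypothetical names. Character.digit and Integer.parseInt accept any Unicode decimal digit, not just 0 to 9, which is worth knowing when you validate input.

```java
import java.util.Arrays;

// Hypothetical demo: different digit characters, same numeric values.
public class DigitFlavours {

    /** Numeric value of each char in s, or -1 where it is not a decimal digit. */
    static int[] digitValues(String s) {
        int[] values = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            values[i] = Character.digit(s.charAt(i), 10);
        }
        return values;
    }

    public static void main(String[] args) {
        String ascii = "123";
        String arabicIndic = "\u0661\u0662\u0663";   // ARABIC-INDIC DIGIT ONE, TWO, THREE
        String fullwidth = "\uFF11\uFF12\uFF13";     // FULLWIDTH DIGIT ONE, TWO, THREE
        for (String s : new String[] {ascii, arabicIndic, fullwidth}) {
            // parseInt accepts whatever Character.digit accepts, so all three parse alike
            System.out.println(Integer.parseInt(s) + " " + Arrays.toString(digitValues(s)));
        }
        System.out.println(ascii.equals(arabicIndic));   // false: different characters
    }
}
```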

To typeset, you need separate fonts to handle such variants, with each letter encoded as the same Unicode character in every font. The word processor automatically selects the appropriate variant. I don’t know the mechanism by which a word processor can tell which fonts are related, and which styles and font weights each supports. Presumably it is encoded somehow in the font files.

To a large extent ligatures are handled outside Unicode by automatically combining Unicode characters, though there are a few ligatures that rate a special Unicode character.
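Java’s java.text.Normalizer (Java 6 and later) can undo those special ligature characters. A minimal sketch; Ligatures and expandLigatures are hypothetical names. NFKC compatibility normalization turns the fi ligature FB01 back into the two letters f and i, but be aware it also folds other compatibility characters, such as fullwidth forms and superscripts, so it is a blunt tool.

```java
import java.text.Normalizer;

// Hypothetical helper: expand ligature characters via NFKC normalization.
public class Ligatures {

    /** Compatibility-normalizes s, which splits ligatures such as fi into letters. */
    static String expandLigatures(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String word = "\uFB01nd";   // U+FB01 LATIN SMALL LIGATURE FI, then "nd"
        System.out.println(word.length() + " -> " + expandLigatures(word).length());   // 3 -> 4
        System.out.println(expandLigatures(word));                                       // find
    }
}
```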

Unicode Editors

Where do Unicode files come from? You can create them with:

You can edit or create UTF-8 or UTF-16 files with Windows Notepad.

Books

Recommended book: The Unicode 5.0 Standard (hardcover)
ISBN13: 978-0-321-48091-0
ISBN10: 0-321-48091-0
publisher: Addison-Wesley
published: 2006-11-19
by: The Unicode Consortium
Unicode 5.0 adds the following:
  • Security mechanisms.
  • A standard collation algorithm for various national orderings.
  • A common locale data repository.
  • Improvements to the encoding model for UTF-8.
  • Rigorous stability of case folding.
  • A systematic framework covering combining characters, Unicode strings, line breaking and segmentation.

You can get a fresh copy of this page from http://mindprod.com/jgloss/unicode.html or possibly from your local J: drive (Java virtual drive/mindprod.com website mirror) at J:\mindprod\jgloss\unicode.html