Unicode and HTML

Web pages authored using HyperText Markup Language (HTML) may contain multilingual text represented with the Unicode universal character set. Central to the relationship between Unicode and HTML is the distinction between the "document character set", which defines the set of characters that may be present in an HTML document and assigns numbers to them, and the "external character encoding", or "charset", used to encode a given document as a sequence of bytes.

Date
3 November 2007
Has abstract
In RFC 1866, the HTML 2.0 standard, the document character set was defined as ISO-8859-1 (later HTML standards default to the Windows-1252 encoding instead). RFC 2070 extended it to ISO 10646, whose character repertoire is essentially the same as Unicode's. The document character set does not vary between documents in different languages or documents created on different platforms. The external character encoding, by contrast, is chosen by the author of the document (or by the software the author uses to create it) and determines how the bytes used to store or transmit the document map to characters from the document character set. Characters not present in the chosen external character encoding may be represented by character references, either numeric or named (character entity references). The relationship between Unicode and HTML tends to be difficult for computer professionals, document authors, and web users alike. Accurately representing text from different natural languages and writing systems in web pages is complicated by the details of character encoding, markup language syntax, fonts, and varying levels of browser support.
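The split between the document character set and the external character encoding can be illustrated with a minimal Python sketch. The sample string and all variable names are assumptions chosen for illustration; they do not come from the article. The sketch shows that the code points stay fixed while the bytes depend on the chosen encoding, and that characters missing from an encoding such as ISO-8859-1 can still be carried as numeric character references.

```python
import html

# Sample text (an assumption for illustration): Greek Delta, Cyrillic Short I,
# and the German sharp s, separated by spaces.
text = "Δ Й ß"

# The document character set assigns one number (code point) to each character,
# regardless of how the document is later stored.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# The external character encoding decides which bytes represent those numbers.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# ISO-8859-1 has no Delta or Short I; Python's "xmlcharrefreplace" error handler
# substitutes decimal numeric character references for them.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# A reader decodes the bytes with the declared charset, then resolves the
# character references to recover the original characters.
decoded = html.unescape(latin1_bytes.decode("iso-8859-1"))

print(code_points)
print(utf8_bytes)
print(utf16_bytes)
print(latin1_bytes)
print(decoded == text)
```

Here `html.unescape` stands in for the reference-resolution step an HTML parser performs; a real browser does considerably more, so this is a sketch of the idea, not of any browser's implementation.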
Is primary topic of
Unicode and HTML
Label
Unicode and HTML
Link from a Wikipage to an external page
www.hotpeachpages.net/a/characters.html
www.pinyin.info/tools/converter/chars2uninumbers.html
www.alanwood.net/unicode/cjk_compatibility_ideographs.html
www.w3.org/TR/REC-html40/HTMLlat1.ent
www.w3.org/TR/REC-html40/HTMLspecial.ent
www.w3.org/TR/REC-html40/HTMLsymbol.ent
unicode.coeurlumiere.com/
www.alanwood.net/unicode/
www.unicode.org/charts/
www.unicodemap.org/
www.w3.org/TR/unicode-xml/
web.archive.org/web/20071103125951/http:/unicode.coeurlumiere.com/
www.phon.ucl.ac.uk/home/wells/ipa-unicode.htm
scripts.sil.org/cms/scripts/page.php%3Fsite_id=nrsi
web.archive.org/web/20110924073701/http:/www.w3.org/TR/html5/semantics.html%23charset
Link from a Wikipage to another Wikipage
7 (number)
A
Abstraction
Arabic alphabet
ASCII
Basic Multilingual Plane
Bit
Byte
Byte order mark
Category:HTML
Category:Unicode
Character (computing)
Character encoding
Character encodings in HTML
Character reference
Charset detection
CJK Unified Ideographs
Code2000
Comparison of Unicode encodings
Computer font
Computer network
Computer storage
Cyrillic script
Decimal
Delta (letter)
Document Type Definition
Em dash
Endianness
Face with Tears of Joy emoji
Fe (rune)
File system
Ge'ez alphabet
Google
Grapheme
Greek alphabet
Háček
Hangul
Hebrew alphabet
Hexadecimal
Hiragana
HTML
HTML5
HTML email
HTTP
IEC 8859-1
Internet Explorer
Internet Explorer 6
ISO 10646
ISO 8859-1
Latin alphabet
List of typefaces
List of XML and HTML character entity references
Malayalam alphabet
Markup language
Mem
Meta:Help:Special characters
MIME
Mozilla Firefox
Natural language
Netscape Navigator
Numerical digit
Numeric character reference
Numeric character references
Octet (computing)
Opera (web browser)
Operating system
Programming language
Qha
Qoph
Runic alphabet
Safari (web browser)
Short I
Simplified Chinese characters
ß
Syllable
Thai alphabet
Thorn (letter)
Traditional Chinese characters
Unicode
Unicode block
Unicode Transformation Format
Universal Character Set
UTF-16
UTF-16BE
UTF-16LE
UTF-32BE
UTF-32LE
UTF-8
Web browser
Web page
Windows-1251
Windows-1252
World Wide Web
Writing system
XHTML
XML
SameAs
3GmQD
m.07vv9
Q3549946
Subject
Category:HTML
Category:Unicode
Url
https://web.archive.org/web/20071103125951/http:/unicode.coeurlumiere.com/
WasDerivedFrom
Unicode and HTML?oldid=1116218032&ns=0
WikiPageLength
22301
Wikipage page ID
31985
Wikipage revision ID
1116218032
WikiPageUsesTemplate
Template:Citation needed
Template:Essay-like
Template:Html series
Template:IETF RFC
Template:Main
Template:Multiple issues
Template:Primary sources
Template:Refimprove
Template:Reflist
Template:Rewrite
Template:Short description
Template:Snd
Template:SpecialChars
Template:Toomanylinks
Template:U+
Template:Unicode navigation
Template:Webarchive