UTF


Encoding Problem: Treating UTF-8 Bytes as Windows-1252 or ISO-8859-1

Symptom

Instead of an expected character, a sequence of Latin characters is shown, typically starting with Ã or Â. For example, instead of "è" these characters occur: "Ã¨".

Explanation

A common problem is for characters encoded as UTF-8 to have their individual bytes interpreted as ISO-8859-1 or Windows-1252. For example:

A Web page is encoded as UTF-8. The Web server mistakenly declares the charset to be ISO-8859-1 in the HTTP headers that accompany the page. The browser then displays each UTF-8 byte in the page as a separate Latin-1 character.

A file encoded as UTF-8, such as a Java properties file, is converted incorrectly as it is imported. Java reads it as if it were ISO-8859-1, so each UTF-8 byte becomes a separate Latin-1 character.

A character such as è (e-Grave, U+00E8) consists of two bytes in UTF-8: 0xC3 and 0xA8. If each of these bytes is treated as an ISO-8859-1 or Windows-1252 code point, the displayed characters will be Ã and ¨.

Table 1. Example: Treating UTF-8 Bytes as Windows-1252 or ISO-8859-1

Character    UTF-8 Bytes    Bytes viewed in Latin-1
è            0xC3, 0xA8     Ã, ¨
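The mis-conversion for è can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original page; the variable names are illustrative only. It encodes è as UTF-8 and then decodes the resulting bytes as ISO-8859-1 and as Windows-1252 (Python's `cp1252` codec).

```python
# Encode the character as UTF-8, then mis-decode the bytes
# with a single-byte charset, as a misconfigured reader would.
text = "\u00e8"  # è (e-Grave, U+00E8)

utf8_bytes = text.encode("utf-8")
print([hex(b) for b in utf8_bytes])  # ['0xc3', '0xa8']

# Each byte is now interpreted as one character.
as_latin1 = utf8_bytes.decode("iso-8859-1")
as_cp1252 = utf8_bytes.decode("cp1252")

print(as_latin1)  # Ã¨
print(as_cp1252)  # Ã¨ (both charsets agree on 0xC3 and 0xA8)
```

For these two bytes the two charsets give the same result; they differ only in the range 0x80 to 0x9F.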

You can use the Encoding Debug Table to look up any erroneous sequence of Latin characters and find out the UTF-8 character that it corresponds to and that generated it.
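The same lookup can be approximated in code by reversing the mistake: turn the Latin characters back into bytes, then decode those bytes as UTF-8. The sketch below is an illustration under stated assumptions, not the method used by the Encoding Debug Table; `recover_utf8` is a hypothetical helper name, and it assumes the text was mis-decoded exactly once as Windows-1252 or ISO-8859-1.

```python
from typing import Optional


def recover_utf8(garbled: str) -> Optional[str]:
    """Try to undo a single UTF-8-read-as-Latin mis-conversion.

    Returns the recovered text, or None if the input does not
    map back to valid UTF-8 under either charset.
    """
    # Windows-1252 is tried first because it covers characters such
    # as U+201A that ISO-8859-1 cannot encode.
    for codec in ("cp1252", "iso-8859-1"):
        try:
            return garbled.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return None


print(recover_utf8("\u00c3\u00a8"))  # è
```

Plain ASCII text passes through unchanged, so a successful return value does not by itself prove the input was garbled.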

References

Encoding Debug Table
Encoding Problem: Treating UTF-8 Bytes as Latin-1
Encoding Problem: Double Mis-Conversion
Encoding Problem: ISO-8859-1 vs Windows-1252
Comparing ISO-8859-1 vs. ISO-8859-15
Comparing Windows-1252 vs. ISO-8859-1

Copyright © 2011 Tex Texin. All rights reserved.
