Common charset-related notes on MP3 players

Many MP3 files include metadata containing the song title, album and artist name. Such information is located in a so-called ID3 tag. There are two types of ID3 tags: ID3v1 (not formally standardized) and ID3v2.x (see the formal standard).

Audio players that show ID3 tags to the user must interpret bytes that form the tags as characters. Results depend upon the character encoding used for such interpretation. E.g., the sequence of bytes 0xC3 0xA5 means the character å in the UTF-8 encoding, the characters Ã¥ in ISO-8859-1, ĂĽ in ISO-8859-2 and so on. For the user to be able to read the strings in the tags, the application that originally created the tag and the program used for its display must agree upon the same encoding.
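The byte sequence from the example above can be decoded under each of the three encodings to see the difference. The following is only an illustrative sketch; the helper name `describe` is made up for this page, and the codec names are the ones shipped with the Python standard library.

```python
import unicodedata

# The two bytes discussed above.
RAW = bytes([0xC3, 0xA5])

def describe(data: bytes, encoding: str) -> list:
    """Decode `data` with `encoding` and return the Unicode name of each character."""
    return [unicodedata.name(ch) for ch in data.decode(encoding)]

if __name__ == "__main__":
    for encoding in ("utf-8", "iso-8859-1", "iso-8859-2"):
        # One character under UTF-8, two characters under the ISO-8859 encodings.
        print(encoding, describe(RAW, encoding))
```

The same two bytes yield one character under UTF-8 and two different pairs of characters under ISO-8859-1 and ISO-8859-2, which is why the writer and the reader of a tag must agree on the encoding.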

The de-facto standard encoding for ID3v1 tags is the character encoding used by MS Windows in the relevant country, because there is no formal standard and because WinAMP, one of the most popular MP3 players, cannot display anything else. This de-facto standard is also honoured by some hardware MP3 players, e.g., the HanBIT XDRUM XD-405. Most MP3 players for Linux, however, assume the character set of the current locale by default.

In ISO-8859-1 based locales, this assumption is harmless because the CP1252 code page, used by Windows in those countries, is a superset of ISO-8859-1 (i.e., for every byte to which ISO-8859-1 assigns a printable character, CP1252 assigns the same character). In other countries, where Windows and Linux use very different character sets (e.g., in Poland, where Windows uses CP1250 and Linux uses ISO-8859-2), or on Linux systems that use UTF-8 locales, this assumption leads to incorrect results.
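Both claims can be checked with a short sketch. The helper names `agree_on` and `misread`, the constant `PRINTABLE_LATIN1` and the sample word are assumptions chosen for illustration. The comparison covers only the byte values to which ISO-8859-1 assigns printable characters (0x20-0x7E and 0xA0-0xFF).

```python
# Byte values that carry printable characters in ISO-8859-1.
PRINTABLE_LATIN1 = list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))

def agree_on(byte_values, first: str, second: str) -> bool:
    """Return True if both codecs decode every given byte value to the same character."""
    return all(
        bytes([value]).decode(first) == bytes([value]).decode(second)
        for value in byte_values
    )

def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode `text` the way the tag writer did, then decode it the way the player does."""
    return text.encode(written_as).decode(read_as, errors="replace")

if __name__ == "__main__":
    # Western European case: Windows (CP1252) and Linux (ISO-8859-1) agree.
    print(agree_on(PRINTABLE_LATIN1, "iso-8859-1", "cp1252"))
    # Polish case: Windows (CP1250) and Linux (ISO-8859-2) do not.
    print(agree_on(PRINTABLE_LATIN1, "iso-8859-2", "cp1250"))
    # A tag written on Windows, displayed under an ISO-8859-2 or UTF-8 locale.
    print(misread("Łąka", "cp1250", "iso-8859-2"))
    print(misread("Łąka", "cp1250", "utf-8"))
```

With `errors="replace"`, bytes that are invalid in the assumed encoding are shown as U+FFFD instead of raising an exception. This resembles what many players do, although the exact behaviour varies from player to player.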

Patched XMMS, Audacious Media Player and MOC allow the user to configure the character encoding of ID3v1 tags and can therefore display them correctly, according to the de-facto standard. Windows-based players may also work under WINE.
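What such a configurable player does can be sketched roughly as follows. This is not the code of any of the players named above. The function names are hypothetical, and the default of `cp1252` is only an assumption that a real player would replace with the user's setting. The field layout follows the common description of ID3v1: the last 128 bytes of the file, starting with `TAG`, followed by a 30-byte title, 30-byte artist, 30-byte album, 4-byte year, 30-byte comment and a 1-byte genre.

```python
from typing import Optional

def parse_id3v1(tag: bytes, encoding: str = "cp1252") -> Optional[dict]:
    """Decode a 128-byte ID3v1 block using a user-chosen encoding."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        return None  # no ID3v1 tag present

    def text(field: bytes) -> str:
        # Fields are padded with NUL bytes (sometimes spaces); cut at the first NUL.
        return field.split(b"\x00", 1)[0].decode(encoding, errors="replace").rstrip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # numeric genre index, not text
    }

def read_id3v1(path: str, encoding: str = "cp1252") -> Optional[dict]:
    """Read the trailing 128 bytes of a file and parse them as an ID3v1 tag."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)            # seek to the end to learn the file size
        if handle.tell() < 128:
            return None
        handle.seek(-128, 2)         # the tag occupies the last 128 bytes
        return parse_id3v1(handle.read(128), encoding)
```

For a file tagged on a Polish Windows system, a call such as `read_id3v1("song.mp3", encoding="cp1250")` would follow the de-facto standard regardless of the locale in use on the Linux machine; the file name here is only a placeholder.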

For ID3v2.x and OGG tags, the problem described above usually doesn't exist (although some MP3 files with broken ID3v2 tags can be found on the Internet). In ID3v2.x tags, the character encoding used is specified in the tag itself, and the specification for OGG tags allows only UTF-8. Players usually follow the specifications, convert encodings as necessary and display the text correctly.
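In ID3v2.x the encoding is indicated by the first byte of each text frame body, so a player can pick the codec without guessing. The sketch below is a simplified illustration that handles only the body of a single text frame; the names `ID3V2_TEXT_ENCODINGS` and `decode_text_frame_body` are made up for this page, and parsing of the tag and frame headers is assumed to have been done elsewhere.

```python
# Encoding byte -> Python codec. Values 2 and 3 are defined only in ID3v2.4.
ID3V2_TEXT_ENCODINGS = {
    0: "iso-8859-1",
    1: "utf-16",      # UTF-16 with a byte order mark
    2: "utf-16-be",   # UTF-16 big-endian without a byte order mark
    3: "utf-8",
}

def decode_text_frame_body(body: bytes) -> str:
    """Decode the body of an ID3v2 text frame (encoding byte followed by text)."""
    if not body:
        return ""
    codec = ID3V2_TEXT_ENCODINGS.get(body[0])
    if codec is None:
        raise ValueError("unknown ID3v2 text encoding byte: %d" % body[0])
    # The text may carry a trailing terminator, which decodes to NUL characters.
    return body[1:].decode(codec).rstrip("\x00")
```

The broken tags mentioned above are often ones where the encoding byte says ISO-8859-1 but the text was actually written in a local Windows code page; this sketch decodes them exactly as declared and makes no attempt to detect that case.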


Last modified on 08/29/2006 12:57:08 AM