
UTF-8 to Shift_JIS in C

Name: Anonymous 2010-09-01 19:33

Hello /prog/,

Since most of you are EXPERT C-PROGRAMMERS I thought some of you might have an idea of how best to implement this.
In short, I need to convert (wchar_t) UTF-8 input to Shift_JIS, but I am a very incompetent C programmer.

My current implementation converts the entire map located here: http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT to a huge string of the form ("u%s%s", unicode, sjiscode) and then searches it for the hex form of the unicode input.
This is obviously very slow, so I ask if you could be so kind as to tell me how one is supposed to go about doing this sort of thing in C, as I am still learning.

Thank you for your attention.
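
One common alternative to scanning a huge string is a table sorted by code point plus a binary search. The following is only a minimal sketch: the names sjis_map, sjis_table, cmp_unicode and unicode_to_sjis are made up for illustration, and the table holds just five entries, where a real one would be generated from SHIFTJIS.TXT and kept sorted by the unicode field.

```c
#include <stdint.h>
#include <stdlib.h>

/* One row of the mapping: Unicode code point -> Shift_JIS code. */
struct sjis_map {
    uint32_t unicode;
    uint16_t sjis;
};

/* Tiny excerpt for illustration only. A full table would be generated
 * from SHIFTJIS.TXT and MUST stay sorted by the unicode field. */
static const struct sjis_map sjis_table[] = {
    { 0x0041, 0x0041 }, /* LATIN CAPITAL LETTER A */
    { 0x3042, 0x82A0 }, /* HIRAGANA LETTER A */
    { 0x30A2, 0x8341 }, /* KATAKANA LETTER A */
    { 0x4E00, 0x88EA }, /* CJK ideograph "one" */
    { 0xFF71, 0x00B1 }, /* HALFWIDTH KATAKANA LETTER A */
};

/* bsearch comparator: key is a code point, elem is a table row. */
static int cmp_unicode(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t u = ((const struct sjis_map *)elem)->unicode;
    return (k > u) - (k < u);
}

/* Returns the Shift_JIS code, or 0 if the code point is not in the
 * table (so a caller must treat U+0000 separately). */
uint16_t unicode_to_sjis(uint32_t cp)
{
    const struct sjis_map *hit = bsearch(&cp, sjis_table,
        sizeof sjis_table / sizeof sjis_table[0],
        sizeof sjis_table[0], cmp_unicode);
    return hit ? hit->sjis : 0;
}
```

With roughly 7000 entries in the full mapping, a lookup like unicode_to_sjis(0x3042) needs about 13 comparisons instead of a scan over the whole string.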

Name: Anonymous 2010-09-02 15:38

>>12
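6 actually, at least in the original design. A character could take up to 6 bytes in UTF-8 as first specified, which covers 31-bit code points; RFC 3629 later capped it at 4 bytes, enough for everything up to U+10FFFF. This is because for extended characters only 6 bits out of each continuation byte are available, and the first byte holds even fewer (1 bit in the 6-byte form, so 1 + 5*6 = 31 bits). The top two bits of the first byte are 11, and for the rest of the bytes in the character they are 10.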
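
To make the bit layout concrete, here is a rough sketch of a decoder for one character. It assumes the modern RFC 3629 limit of 4 bytes rather than the older 5- and 6-byte forms, and it does not reject overlong encodings or surrogates, so treat it as an illustration and not a validator. The name utf8_decode is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 on malformed input.
 * Sketch only: overlong forms and surrogates are not rejected. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;

    /* The lead byte says how long the sequence is and holds the top bits. */
    if (s[0] < 0x80)                { c = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
    else return 0; /* continuation byte or invalid lead byte */

    if (len > n) return 0; /* sequence is cut off */

    /* Each continuation byte is 10xxxxxx and contributes 6 bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    *cp = c;
    return len;
}
```

For example, the bytes E3 81 82 decode to U+3042: the lead byte gives 0x3, and the two continuation bytes add 0x01 and 0x02 six bits at a time.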

The reason for this is so that you can resynchronize a broken UTF-8 stream. If you lose some data, you just wait until a byte has the top bit 0 or the top two bits 11, and you've found the start of a character.
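
The resynchronization rule fits in a few lines of C. This is a sketch with a made-up name, utf8_resync; it only skips continuation bytes (top two bits 10) and does not check that whatever follows is a valid character.

```c
#include <stddef.h>

/* Starting at pos, skip continuation bytes (10xxxxxx) and return the
 * index of the first byte that can start a character: either top bit 0
 * or top two bits 11. Returns len if no such byte is left. */
size_t utf8_resync(const unsigned char *buf, size_t len, size_t pos)
{
    while (pos < len && (buf[pos] & 0xC0) == 0x80)
        pos++;
    return pos;
}
```

After a read error or a dropped packet, calling this on the remaining buffer loses at most the one character that was cut in half.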
