
UTF-8 to Shift_JIS in C

Name: Anonymous 2010-09-01 19:33

Hello /prog/,

Since most of you are EXPERT C-PROGRAMMERS I thought some of you might have an idea of how best to implement this.
In short, I need to convert (wchar_t) UTF-8 input to Shift_JIS, but I am a very incompetent C programmer.

My current implementation converts the entire map located here: http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT into one huge string of the form ("u%s%s", unicode, sjiscode) and then searches it for the hex form of the Unicode input.
This is obviously very slow, so I ask if you could be so kind as to tell me how one is supposed to go about doing this sort of thing in C, as I am currently in a learning process.

Thank you for your attention.

Name: Anonymous 2010-09-01 19:49

>>1
On Windows/MSVC I do it like this:
1) Using libc (with MS extensions, but I think some might work with gcc, check appropriate documentation):
 a) Set the locale you'll be using for multibyte strings:
    setlocale(LC_ALL, "lang_JPN.932"); // from SJIS
    For more details see: http://msdn.microsoft.com/en-us/library/hzz3tw78.aspx
 b) Use the appropriate functions, which operate on either Unicode strings or multibyte strings (use the right prefixes). In this case, conversion is done using mbstowcs (multibyte->widechar) and wcstombs (widechar->multibyte). When dealing with more varied encodings, it's not uncommon to do MB->UTF16->MB. UTF16 or "widechar" is common because of its simple in-memory representation: a ushort (WORD/16-bit value) for each Unicode character in the Basic Multilingual Plane (anything outside it takes a surrogate pair). Multiple setlocale calls may be needed.
2) Using WinAPI. Less portable than 1 (which may be supported outside of MSVC), but fully supported on Windows. I find this nicer than the libc variant in that it doesn't depend on global state, and you can just specify the code pages as parameters:
MultiByteToWideChar and WideCharToMultiByte. It's the same *->UTF16->* conversion, except simpler. This is my conversion method of choice for non-portable code. I've done UTF8->SJIS before using this and have some code lying around for it. I've only used the libc method for sjis->utf16->sjis, so I don't know for sure whether utf8 is properly supported there, but I believe it is; I just never tried it.
3) Portably using iconv. It's the more "bloated" method, but if you don't want to tie your tool/application to an OS, it's a common choice. There are also other libraries around for doing it.
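
For option 3, a minimal sketch of what the iconv route might look like. The helper name utf8_to_sjis and its buffer convention are made up for illustration, and the encoding name "SHIFT_JIS" is an assumption: accepted names vary between iconv implementations, and Windows code page 932 is usually a separate entry ("CP932") with slightly different mappings.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): converts a NUL-terminated
 * UTF-8 string to Shift_JIS with iconv.
 * Returns the number of bytes written to out (not counting the NUL),
 * or (size_t)-1 on any failure (unknown encoding, invalid input,
 * unmappable character, or output buffer too small). */
size_t utf8_to_sjis(const char *in, char *out, size_t out_size)
{
    if (out_size == 0)
        return (size_t)-1;

    /* Note the argument order: (to, from). The name "SHIFT_JIS" is an
     * assumption; check what your iconv implementation accepts. */
    iconv_t cd = iconv_open("SHIFT_JIS", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in_ptr = (char *)in;      /* iconv wants char **, input is not modified */
    size_t in_left = strlen(in);
    char *out_ptr = out;
    size_t out_left = out_size - 1; /* keep one byte for the terminator */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != (size_t)-1) {
        /* Flush any pending shift state (a no-op for Shift_JIS, but
         * harmless and correct for stateful encodings). */
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);
    }
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;

    *out_ptr = '\0';
    return (size_t)(out_ptr - out);
}
```

Calling utf8_to_sjis("\xE3\x81\x82", buf, sizeof buf) with a 16-byte buf should leave the two bytes 0x82 0xA0 in buf. No hand-built lookup string is needed, since the mapping table lives inside the library.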
