
Fuck this character encoding shit.

Name: Anonymous 2008-12-12 15:26

Let's say I have an 8-bit C string in an unknown character encoding. This string represents a local file path. The encoding is unknown because the string comes from some .zip file routines, which do not require one encoding or another, so basically everyone can encode his filenames however he goddamn pleases.
Now, another library requires that all file names be encoded in UTF-8 before they are passed to it.

So how the fuck do I convert from some encoding to UTF-8 if I do not fucking know which source encoding was used? This is driving me crazy.

Name: Anonymous 2008-12-12 18:28

There are many solutions to this problem. And by solutions I mean shitty workarounds.

>>9 is not one of them, since you're very unlikely to find BOMs in filenames (but perhaps UTF-8 in ZIP does use them - I don't know, I'm not an EXPERT on ZIP FILES).

Your best bets are:

1. Just read the character encoding the host OS is using and always assume that. This is what most Windows applications do. The downside is that users have to change said encoding or use tools like AppLocale when a filename comes out wrong.

2. Try to guess. This is the most realistic option.

3. If 2 fails, or if it can return scores and the winner is not very clear, prompt the user with the best candidates, showing an example of how the filename looks in each. Microsoft Word does this for text files.
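
A rough sketch of how 1-3 can fit together, in Python since it's short. This is not a real detector: the candidate list and the function names (to_utf8, previews) are made up for illustration, and the only actual "guess" is that a name which decodes as strict UTF-8 probably is UTF-8. Everything after that is a fallback to the OS encoding and then to whatever candidates you list.

```python
import locale

# Hypothetical candidate list; pick whatever your archives tend to contain.
CANDIDATES = ["shift_jis", "cp1252", "cp437"]


def to_utf8(raw, fallbacks=None):
    """Return (utf8_bytes, encoding_used) for a raw filename.

    Step 2 (guess): valid UTF-8 is kept as-is, since random legacy
    text rarely happens to be well-formed UTF-8.
    Step 1 (host OS): otherwise try the locale's preferred encoding,
    then the candidates, taking the first that decodes cleanly.
    """
    try:
        raw.decode("utf-8")
        return raw, "utf-8"
    except UnicodeDecodeError:
        pass
    if fallbacks is None:
        fallbacks = [locale.getpreferredencoding(False)] + CANDIDATES
    for enc in fallbacks:
        try:
            return raw.decode(enc).encode("utf-8"), enc
        except (UnicodeDecodeError, LookupError):
            continue
    # latin-1 maps every byte value, so this last resort cannot fail
    # (it may still be the wrong guess).
    return raw.decode("latin-1").encode("utf-8"), "latin-1"


def previews(raw, encodings=CANDIDATES):
    """Step 3: (encoding, decoded_name) pairs to show the user."""
    out = []
    for enc in encodings:
        try:
            out.append((enc, raw.decode(enc)))
        except UnicodeDecodeError:
            pass  # not valid in this encoding, so not worth offering
    return out
```

Note that "first encoding that decodes without error" is a weak test: single-byte code pages accept almost any byte string, so when more than one candidate survives, showing the previews() output to the user is the safer move.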

Hopefully now you'll understand why ADVANCED ENTERPRISE LANGUAGES such as PYTHON 3000 enforce a universal character encoding to avoid this utter bullshit.

ONE WORD: THE FORCED UNICODE 8-BIT USE. THREAD OVER
