Characters and encoding bullshit

Name: Anonymous 2012-11-07 21:55

So I've been trying to make a little C# application to parse a file name and its path to automatically tag my extensive library of music and have come upon TagLib to help me.

Now everything was going well until I fell upon some Japanese characters in my library. During parsing, it all appears to be fine, but when it executes the .Save() method, all characters become question marks (?). For example: "アールグレイ" will appear as such during debug, but on save, the actual tag will be "??????"

I have no idea if this is due to the library assuming the MP3 is in ID3v1 format, due to the .Save() method malfunctioning or something else. Has anyone used this library before and come upon a similar problem? Any idea how to fix this?
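For what it's worth, the symptom can be reproduced without TagLib at all. The sketch below is only a guess at the mechanism, not a diagnosis of the library: it assumes the writer ends up encoding the title into a legacy single-byte charset (Latin-1, which is what ID3v1 fields conventionally hold) and substituting "?" for anything that does not fit. The function names are made up for illustration.

```python
def simulate_legacy_save(title, codec="latin-1"):
    """Encode the way a lossy tag writer might, then read the bytes back."""
    # errors="replace" substitutes b"?" for every character the codec lacks.
    return title.encode(codec, errors="replace").decode(codec)


def survives_round_trip(title, codec):
    """Return True if the codec can store the title without loss."""
    try:
        return title.encode(codec).decode(codec) == title
    except UnicodeEncodeError:
        return False


title = "アールグレイ"
print(simulate_legacy_save(title))            # one "?" per character
print(survives_round_trip(title, "latin-1"))  # False
print(survives_round_trip(title, "utf-16"))   # True
print(survives_round_trip(title, "utf-8"))    # True
```

If that is what is happening, the thing to check is which text encoding the ID3v2 frames are written with: ID3v2.3 allows UTF-16 and ID3v2.4 adds UTF-8, while ID3v1 has no Unicode option.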

Name: Anonymous 2012-11-07 21:58

earl grey-san~

Name: Anonymous 2012-11-07 22:02

You can fix that in D.

Name: Anonymous 2012-11-07 22:18

If it were up to me, I'd abolish all languages except English and Hebrew.

Name: Anonymous 2012-11-07 22:30

If it were up to me, I'd abolish all languages except Lojban.

Name: Anonymous 2012-11-07 22:31

>>4
Shalom, Ahmed Goldberg!

Name: Anonymous 2012-11-07 22:33

"GRUNNUR"

Name: Anonymous 2012-11-07 22:34

Use FIOC, Bash or Perl instead of C Shit

Name: phonetic langs 2012-11-07 22:39

>>4
No Latin, Korean, or Greek? (QAQ)

Name: Anonymous 2012-11-07 22:48

open with UTF-8 encoding, stop using shitty notepad

Name: Anonymous 2012-11-07 23:40

>>8
Actually FIOC is even worse. Last time I tried it, print "アールグレイ" failed with a hurr durr I can't figure out the TTY encoding error.
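A rough sketch of how that kind of failure can be worked around, written for Python 3 and assuming the error is a UnicodeEncodeError raised because the output stream's encoding cannot represent the text. `print_robust` is a hypothetical helper, not anything from the standard library.

```python
import io
import sys


def print_robust(text, stream=None):
    """Write text to a stream, escaping characters the stream cannot encode."""
    stream = sys.stdout if stream is None else stream
    try:
        stream.write(text + "\n")
    except UnicodeEncodeError:
        # Fall back to \uXXXX escapes so the line is still printed.
        encoding = getattr(stream, "encoding", None) or "ascii"
        safe = text.encode(encoding, errors="backslashreplace").decode(encoding)
        stream.write(safe + "\n")


# Simulate a terminal that only speaks ASCII.
ascii_tty = io.TextIOWrapper(io.BytesIO(), encoding="ascii", newline="\n")
print_robust("アール", ascii_tty)
ascii_tty.flush()
```

On a stream that can encode the text, the first write succeeds and nothing is escaped.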

Name: Anonymous 2012-11-08 5:44

>>11
FIOC is a lot worse than you might think right off the bat. You can get it to recognize UTF-8 string input by using the PERFECT solution of inserting a comment
# -*- coding: utf-8 -*-
But you see, it doesn't actually use UTF-8 internally. It doesn't use UCS-4 either, as you might expect. No, it uses UTF-16, and this affront to nature is something it will readily remind you of. See, now you need to prefix all your strings with a nice u"" so Python knows it needs to use UTF-16 internally for THAT string (but only that string).

After you're done wasting memory and time debugging all the exceptions thrown because of encoding differences, you want to output. BUT WAIT, you say, didn't you say my data was all UTF-16? That's right, so now you need to re-re-encode your data back to UTF-8, using the .encode("utf-8") method.

Conclusion: FIOC has implemented THE PERFECT SOLUTION to the problem of handling UNICODE.

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2012-11-08 6:03

Character encoding related stuff shouldn't be done except at the very ends of the processing, e.g. input and output. The filesystem should treat filenames as opaque strings with possibly reserved path characters, and otherwise perform straight byte matches on filenames.
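A small sketch of the "opaque filename" idea in Python 3, assuming a platform where passing bytes paths to the os functions is supported (it is on Linux). The helper names are invented for the example. Names are listed and compared as raw bytes and never decoded.

```python
import os
import tempfile


def raw_names(directory):
    """Return directory entries as bytes, without decoding them."""
    # Passing a bytes path makes os.listdir return bytes names.
    return sorted(os.listdir(directory))


def has_exact_name(directory, wanted):
    """Straight byte comparison: no case folding, no normalization."""
    return wanted in os.listdir(directory)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        directory = os.fsencode(tmp)
        name = "アール.mp3".encode("utf-8")
        with open(os.path.join(directory, name), "wb"):
            pass
        return raw_names(directory) == [name] and has_exact_name(directory, name)
```

Decoding would only be needed at the point where a name is shown to a user.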

Name: Anonymous 2012-11-08 7:24

>>12
Unicode width is implementation and platform specific in FIOC. I agree that encoding is pretty hard to get right with FIOC, but that doesn't mean that FIOC's way of doing it is bad, it just means that encoding problems are fucking fucked up and hard to deal with to begin with.
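The build-dependent width can be checked from inside the interpreter. A minimal sketch: `sys.maxunicode` is 0xFFFF on old narrow builds and 0x10FFFF on wide builds and on every Python from 3.3 onward, where the narrow/wide split no longer exists. The function names are just for illustration.

```python
import sys


def unicode_build():
    """Report whether this interpreter stores the full code point range."""
    if sys.maxunicode == 0xFFFF:
        return "narrow (astral characters are surrogate pairs)"
    return "full range"


def astral_length():
    """Length of one character outside the BMP (U+1D11E, a musical clef)."""
    # 1 on full-range interpreters, 2 on a narrow build.
    return len("\U0001D11E")


print(unicode_build(), astral_length())
```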

Name: Anonymous 2012-11-08 7:42

>>12
You clearly do not understand what the different encodings are for, or how the encoding of the buffer with the fucking program text differs from encoding of other text files.

You are a fucking moron.  You are so fucking dumb and arrogant that I want you to die in an incinerator to remove the risk of accidental spreading of your defective genes.

Is it that hard to do .decode("utf-whatever") every time you read text and .encode("utf-of-your-choice") when you .write it back?

Kill yourself and stop wasting our valuable air.

>>14
Platform-specific or not is irrelevant because you do not fucking write the internal representation anywhere.

The general rule is UTF-8 (which normal form to default to is debatable) for IO and UTF-16 or UTF-32 with host endianness for internal representation.
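A minimal Python 3 sketch of that rule, decoding once at the input edge and encoding once at the output edge. NFC is picked here purely as an assumption, since the normal form is the debatable part, and `clean_tag` is a made-up name.

```python
import unicodedata


def clean_tag(raw, encoding="utf-8"):
    """Decode at the input edge, work on str, encode at the output edge."""
    text = raw.decode(encoding)                # bytes -> str, exactly once
    text = unicodedata.normalize("NFC", text)  # assumption: NFC as the normal form
    text = text.strip()                        # all processing happens on str
    return text.encode("utf-8")                # str -> bytes, exactly once


# A decomposed KU + combining dakuten comes out as the single code point GU.
decomposed = " \u30af\u3099 ".encode("utf-8")
print(clean_tag(decomposed) == "\u30b0".encode("utf-8"))
```

The internal representation never leaks out of the function, which is the point being made about it not mattering.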

Name: Anonymous 2012-11-08 7:59

>>15
I'm glad you think UTF-16 is
1) A valid internal representation
2) A good thing to expose the programmer to when you're writing abstract string handling data structures
But you see, some of us don't live in the '90s and simply enjoy the benefit of byte arrays for which we don't give a flying fuck about encoding and more importantly: a sane default encoding so morons like FIOC programmers don't go around littering UTF-16 encoded text everywhere.
Take Haskell for example. UTF-8 everything. No need to re-encode shit unless some retard like you sends UTF-16 over the network because you forgot to waste CPU cycles going from one variable length encoding to another. Then of course there's the byte ordering thing to take care of, but I assume that FIOC will automatically use a healthy mixture of different endianness, you know, because sane and consistent internal representation doesn't matter right?
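To put numbers on the variable-length and byte-order points, here is a short Python 3 sketch. It only measures encoded sizes and says nothing about how any particular runtime stores strings internally; `encoded_sizes` is a hypothetical helper.

```python
def encoded_sizes(text):
    """Byte length of the same text in three encodings (no BOM)."""
    return {codec: len(text.encode(codec))
            for codec in ("utf-8", "utf-16-le", "utf-32-le")}


# BMP katakana: 3 bytes each in UTF-8, 2 bytes each in UTF-16.
print(encoded_sizes("アールグレイ"))

# A character outside the BMP takes 4 bytes in UTF-8 and a surrogate
# pair (also 4 bytes) in UTF-16, so both are variable length.
print(encoded_sizes("\U0001D11E"))

# Byte order only matters for the multi-byte code units.
print("A".encode("utf-16-be"), "A".encode("utf-16-le"))
```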

Name: Anonymous 2012-11-08 8:18

Damn, nigga ( >>15 ), you got served!

Name: Anonymous 2012-11-08 8:32

Nothing more delicious than seeing a butthurt ``Pythonista''.

Name: Anonymous 2012-11-08 9:13

>>16
Go back to reddit, ``faggot''

Name: Anonymous 2012-11-08 9:26

>>16
What makes you think FIOC writes its unicode data type anywhere, or even in UTF-16? It has implicit type conversion everywhere. In fact, it is often annoying to find out that your string was converted somewhere along the line and is now fucked up to hell and back.

In fact, I have yet to see a Python program that does not use the host's preferred encoding when writing files. Why am I even responding? Just fuck off, academia scum.
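For comparison, a sketch of what pinning the file encoding looks like in Python 3. As far as I know, text-mode open() without an encoding argument falls back to the locale's preferred encoding unless UTF-8 mode is enabled, which is the host-dependent behaviour described above. The function names are invented for the example.

```python
import locale
import os
import tempfile


def host_text_encoding():
    """The encoding the host locale prefers for text files."""
    return locale.getpreferredencoding(False)


def write_titles(path, titles):
    """Write one title per line, always as UTF-8 regardless of the host."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for title in titles:
            handle.write(title + "\n")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "titles.txt")
        write_titles(path, ["アールグレイ"])
        with open(path, "rb") as handle:
            return handle.read()
```

Reading the file back in binary mode shows the bytes on disk are UTF-8 whatever `host_text_encoding()` reports.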

Name: Anonymous 2012-11-08 9:53

>>11,12
Use FIOC 3, problem solved.

Name: Anonymous 2012-11-08 11:52

>>21
FIOC 3 cannot resolve bugs in DNA

Name: Anonymous 2012-11-08 13:29

>recommending FIOC
now you have two problems

Name: Anonymous 2012-11-08 13:35

>>23
epin misuse of the quote function /b/ro

Name: Anonymous 2012-11-08 14:26

>>24
>tfw old people can't into modern electronic communication

Name: Anonymous 2012-11-08 14:50

>>25
LOL I JUST LITERALLY

PEED
MY
PANTS

JUST A LITTE THOUGH

I MEAN ITS A LITTLE SPOT NOT LIKE IT RUINED MY CHAIR R NYTHING LOL BUT FOR REAL EPIC LULZ *HIGH FIVES* XDDDDDDDDDDDDDD


U FRUSTRATED U FRUSTRATED BRO U SO MAD WHY ARE YOU SO MAAAAD I CAN POST ANYTHING I WANT THAT IS HOW IT SAYS IN THE RULES I DONT CARE ABOUT YOUR FAGGOTRY RULES Y SO MAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD


WHATA FUCK MAN xD i just fall of my chair kuz i couldnt and i CANT stop laugh xDXDXDXDXDDDDDDDDDDDDXXXXXXXXXXXXXXXXXXXDDDDDDDDDDDDDDDDDDD OMGOSH DDDDDXXXXXXXXXXXXXXXXXXXXXXXDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDDDDD LOOOOOOOOOLLLLL THIS IS A SHIT XDDDDDDDDDDDDDDDDDDDDXDDDDDDDDDDDDDDDDDDDDD A BIG ONE XDDDDDDDD A GRAT ONE XXXXXXDDDD CONGRATS MAN XD

HOOOOOOOOLLLLLLYYYYY SHIT

whatr the HELL

WHATA FUCK MAN xD

i just fall of my chair kuz i couldnt and i CANT stop laugh

xDXDXDXDXDDDDDDDDDDDDXXXXXXXXXXXXXXXXXXXDDDDDDDDDDDDDDDDDDD

OMGOSH

DDDDDXXXXXXXXXXXXXXXXXXXXXXXDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDDDDD LOOOOOOOOOLLLLL

THIS IS A SHIT

XDDDDDDDDDDDDDDDDDDDDXDDDDDDDDDDDDDDDDDDDDD

A BIG ONE

XDDDDDDDD

A GRAT ONE

XXXXXXDDDD

Name: Anonymous 2012-11-08 14:53

Name: Anonymous 2012-11-08 15:00

Name: Anonymous 2012-11-08 15:18

>>27
>alt.comp.old-beards
>rec.cancer.homebrew
>alt.religion.shalom
>di-di-dah-dit di-dah dah-dah-dit dah-dah-dit dah-dah-dah dah

Name: Anonymous 2012-11-08 15:54

>>29
let's go back to reddit, bro

these old farts are boring and don't like le memes XD

Name: Anonymous 2012-11-08 17:12

>>30
What really rustles my Jim%!$*%& [NO CARRIER]

Name: Anonymous 2012-11-08 17:16

>>31
%!$*%& [NO CARRIER]
Is that a Reddit ``meme''?

Name: Anonymous 2012-11-08 21:50

>>32
It's kind of like saying candleja*!&#@<"$*[VALID PERL CODE]

Name: Anonymous 2012-11-08 22:29

Ancient slashdot ``meme''^H^H^H^H^H^H^H^HANUS I'd say.
