Scripting Language with Unicode Support

Name: Anonymous 2006-06-05 20:44

I want to write a script that reorganizes the directory structure of my music collection. The filenames and ID3 tags contain Unicode. Which language has both full Unicode string support and an ID3 library with Unicode support that I can use?

Name: Anonymous 2006-06-05 20:53

Haskell ^__________________^;; (The implementations (GHC) do not seem to support Unicode in any way, but support is in the standard.)

Name: Anonymous 2006-06-05 21:41

perl?

Name: Anonymous 2006-06-05 22:23

Perl has unicode

Name: Anonymous 2006-06-05 22:23

use unicode;
Everything is unicode from then on.

Name: Anonymous 2006-06-05 23:11

In my experience, Perl's support for unicode isn't all that hot; it's there, but broken in some parts. Last time I wanted to run a bunch of regular expressions on a unicode file I first chose Perl, gave up, and used Python.

Sorry guys. This isn't a plug for Python, since I dislike that language, but unicode support in Perl needs work.

Name: Anonymous 2006-06-05 23:45

>>The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope (allow UTF-EBCDIC on EBCDIC based platforms). The no utf8 pragma tells Perl to switch back to treating the source text as literal bytes in the current lexical scope.
Source: http://perldoc.perl.org/utf8.html

This essentially means that you can have actual UTF-8 characters in your source code without having to worry about typing stuff like \x{30B8}\x{30A7}

Internally, perl keeps text strings in a UTF-8-based representation, but there are many Unicode encodings (UTF-8, UTF-16, and so on). Once you've found out your input file's encoding, you can use the Encode module to decode the input into perl's internal representation, and then encode it back to the file's encoding when writing it out again.
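For comparison, here is a rough sketch of the same decode, process, re-encode round trip in Python rather than Perl. The helper name transform_encoded and the choice of UTF-16-LE as the file encoding are assumptions made for illustration only; the characters are the two from the \x{30B8}\x{30A7} example.

```python
def transform_encoded(raw: bytes, encoding: str, transform) -> bytes:
    """Decode raw bytes to text, apply a text-level transform, re-encode.

    transform_encoded is a hypothetical helper, not part of any library.
    """
    text = raw.decode(encoding)        # bytes -> str (Unicode code points)
    return transform(text).encode(encoding)  # str -> bytes, same encoding


# U+30B8 U+30A7 stored as UTF-16-LE: two bytes per character here.
raw = "\u30b8\u30a7".encode("utf-16-le")
out = transform_encoded(raw, "utf-16-le", lambda s: s + "!")
```

The point is only that all string work happens on decoded text, and the file's encoding matters just at the two edges.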

As for regexp, perl's regexp engine supports what it calls polymorphic regexp. When matching against unicode data, regexp operators have character semantics, and when matching against non-unicode data, the same regexp operators have byte semantics. No change to your code is needed to make regexp do the right thing in each context.

What this means is that operators such as . do not just match a single byte, but an entire Unicode character. For example, ç is stored in UTF-8 as the two bytes C3 A7; with character semantics, . matches the whole character \x{E7} instead of only the first byte.
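A rough Python analogue of character versus byte semantics (not Perl's engine, just the same idea): the standard re module matches code points when given a str pattern and single bytes when given a bytes pattern. The variable names are made up for this sketch.

```python
import re

text = "\u00e7"               # the character ç as one code point
data = text.encode("utf-8")   # the same character as UTF-8 bytes

# str pattern on str input: '.' consumes one whole character.
char_matches = re.findall(r".", text)

# bytes pattern on bytes input: '.' consumes one byte at a time.
byte_matches = re.findall(rb".", data)
```

Here char_matches holds a single one-character string, while byte_matches holds the two separate bytes of the UTF-8 encoding.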

This is as much as I can recall offhand. Some parts might not be totally accurate, as I hardly work with Unicode files these days.

As for MP3 and ID3 tags modules, a quick search on search.cpan.org for MP3 and ID3 should point you in the right direction.

Name: Anonymous 2006-06-06 6:20

Under Linux, Python will work fine, PHP with the mbstring extension set to your locale (recommended UTF-8) will work too (add the u modifier to Perl-compatible regexps), and Perl should work too.

If you are under Win32, Perl works as far as I can remember (I believe I did something like this too); don't know about PHP; and Python should work the best, especially if you use Win32 API calls (PyWin32) like WriteConsoleW.

Name: Anonymous 2006-06-06 18:10

I'm OP. I tried Perl and Python. I've never used either before. I've used PHP extensively though.

Perl wouldn't work. I tried for an hour with much Googling, but it would never read or write the Unicode characters properly. Even when it could write Unicode strings, it had trouble with Unicode filenames, globbing, etc.

In 10 minutes I had Python doing what I wanted.
I'm officially impressed with Python and will continue to use it.
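The script itself isn't posted, so the following is only a guess at its general shape, written for Python 3: work out where each file should go from its tags, keeping every path as a Unicode str. The names plan_moves and sanitize, the Artist/Album/Title layout, and the read_tags callable are all hypothetical; read_tags stands in for whatever ID3 library is actually used.

```python
from pathlib import PurePosixPath


def sanitize(part: str) -> str:
    """Make a tag value usable as a single path component."""
    cleaned = part.replace("/", "_").replace("\\", "_").strip()
    return cleaned or "Unknown"


def plan_moves(files, read_tags, root="Music"):
    """Return (source, destination) pairs without touching the filesystem.

    read_tags is a hypothetical callable mapping a path to a dict with
    'artist', 'album' and 'title' keys (assumed layout, not from the thread).
    """
    moves = []
    for src in files:
        tags = read_tags(src)
        filename = sanitize(tags.get("title", "")) + PurePosixPath(src).suffix
        dest = (PurePosixPath(root)
                / sanitize(tags.get("artist", ""))
                / sanitize(tags.get("album", ""))
                / filename)
        moves.append((src, str(dest)))
    return moves


# Stand-in tag data instead of a real ID3 reader.
fake_tags = {"in/01.mp3": {"artist": "Björk", "album": "Homogenic", "title": "Jóga"}}
moves = plan_moves(list(fake_tags), fake_tags.get)
```

Planning the moves first and performing them in a second step makes it easy to print and check the list before anything is renamed.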

Name: Anonymous 2006-06-07 20:22 (sage)

Perl is probably using the non-unicode Win32 API calls in its readdir emulation...

Name: Anonymous 2009-01-14 5:20

JScript, and it's built into Windows.
