Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

UTF-8 validator

Name: Anonymous 2012-01-28 9:20

I wrote a little UTF-8 'validator' that checks stdin for a correct UTF-8 stream, and reports any errors along the way. It is very stringent; it even reports overlong forms as an error condition, in addition to the usual unexpected byte errors and such. One problem is that it is ``SLOW AS FUCK''; it can only check about 1.5 MB/s of random bytes on my netbook. Could you, the experts of optimisation, help me, /prog/?

Note: get rid of the ``inline''s if it fails to compile. I was just being retarded there.

http://pastebin.com/e5RrL6nq

Name: Anonymous 2012-01-28 9:56

The msz function can be replaced with inline assembly using the bsr (bit-scan reverse) instruction if you're on x86(_64).

Instead of switching on the state, use an inner loop.

And there's probably a ton of bit-twiddling optimizations you could do, but I can't be bothered to think about it. (Read "Hacker's delight" for more info.)

Also, you've of course profiled this, right?

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List