
UTF-8 validator

Name: Anonymous 2012-01-28 9:20

I wrote a little UTF-8 'validator' that checks stdin for a correct UTF-8 stream, and reports any errors along the way. It is very stringent; it even reports overlong forms as an error condition, in addition to the usual unexpected byte errors and such. One problem is that it is ``SLOW AS FUCK''; it can only check about 1.5 MB/s of random bytes on my netbook. Could you, the experts of optimisation, help me, /prog/?

Note: get rid of the ``inline''s if it fails to compile. I was just being retarded there.

http://pastebin.com/e5RrL6nq
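The pastebin code isn't reproduced in the thread, so what follows is only a rough sketch of the same kind of strict checker, not the OP's implementation. It follows the Unicode table of well-formed UTF-8 byte sequences, so it rejects overlong forms, surrogates (U+D800 to U+DFFF) and anything above U+10FFFF, as well as stray continuation bytes and truncated sequences. It also adds an ASCII fast path that tests eight bytes per iteration. The name utf8_first_error is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first byte of the first
   ill-formed sequence in s[0..len), or len if the buffer is valid UTF-8.
   Byte ranges follow the Unicode "well-formed UTF-8" table. */
size_t utf8_first_error(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        /* ASCII fast path: skip 8 bytes at a time while no high bit is set. */
        while (i + 8 <= len) {
            uint64_t word;
            memcpy(&word, s + i, sizeof word);
            if (word & UINT64_C(0x8080808080808080))
                break;
            i += 8;
        }
        if (i >= len)
            break;

        unsigned char c = s[i];
        if (c < 0x80) {                      /* plain ASCII byte */
            i++;
            continue;
        }

        size_t n = 0;                        /* continuation bytes needed */
        unsigned char lo = 0x80, hi = 0xBF;  /* valid range for 2nd byte */
        if (c >= 0xC2 && c <= 0xDF)      { n = 1; }
        else if (c == 0xE0)              { n = 2; lo = 0xA0; } /* no overlong 3-byte */
        else if (c == 0xED)              { n = 2; hi = 0x9F; } /* no surrogates */
        else if (c >= 0xE1 && c <= 0xEF) { n = 2; }
        else if (c == 0xF0)              { n = 3; lo = 0x90; } /* no overlong 4-byte */
        else if (c == 0xF4)              { n = 3; hi = 0x8F; } /* nothing above U+10FFFF */
        else if (c >= 0xF1 && c <= 0xF3) { n = 3; }
        else
            return i; /* stray continuation, C0/C1 (overlong 2-byte) or F5..FF */

        if (len - i - 1 < n)
            return i;                        /* sequence truncated by end of buffer */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return i;
        for (size_t k = 2; k <= n; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return i;
        i += n + 1;
    }
    return len;
}
```

The fast path only pays off on mostly-ASCII text. On random bytes almost every 8-byte word has a high bit set, so a benchmark like the 1.5 MB/s one above mostly measures the per-byte branches.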

Name: Anonymous 2012-01-28 15:28

>>13

that wasn't me.

updated:


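#include <stdio.h>

#define BUFFER_LENGTH 1024
char buffer[BUFFER_LENGTH];
for(;;) {
  // fread() takes (ptr, size, nmemb, stream); POSIX read() takes
  // (fd, buf, count) and returns -1 on error, which ret == 0 would miss.
  size_t ret = fread(buffer, sizeof(char), BUFFER_LENGTH, stdin);
  if(ret == 0) {
    // end of file reached, or an error occurred.
    // In either case, just break out.
    break;
  } else {
    // ret is the number of characters successfully read into buffer.
    check_buffer_for_utf8_errors(utf8_scanner_state, buffer, ret);
  }
}
if(ferror(stdin))
  perror("stdin");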
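The loop above only works if the scanner state survives between calls, because a multi-byte sequence can be split across two 1024-byte reads. Here is a hedged sketch of what such a state and checker could look like. struct utf8_state, utf8_init, utf8_feed and utf8_finish are hypothetical names, not the poster's check_buffer_for_utf8_errors, whose definition isn't shown in the thread.

```c
#include <stddef.h>

/* Hypothetical scanner state. It must persist between buffers because a
   multi-byte sequence can be split across two reads. */
struct utf8_state {
    unsigned need;          /* continuation bytes still expected */
    unsigned char lo, hi;   /* valid range for the next continuation byte */
    long long offset;       /* stream offset of the byte being examined */
    long long first_error;  /* offset of the first error, or -1 if none */
    unsigned long errors;   /* number of errors counted so far */
};

void utf8_init(struct utf8_state *st)
{
    st->need = 0;
    st->lo = 0x80;
    st->hi = 0xBF;
    st->offset = 0;
    st->first_error = -1;
    st->errors = 0;
}

static void utf8_error(struct utf8_state *st)
{
    if (st->first_error < 0)
        st->first_error = st->offset;
    st->errors++;
}

/* Feed one chunk; the state carries over to the next call. */
void utf8_feed(struct utf8_state *st, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++, st->offset++) {
        unsigned char c = buf[i];
        if (st->need > 0) {
            if (c >= st->lo && c <= st->hi) {
                st->need--;
                st->lo = 0x80;   /* later continuation bytes: full 80..BF range */
                st->hi = 0xBF;
                continue;
            }
            /* Sequence cut short: count it, then treat c as a new lead byte. */
            utf8_error(st);
            st->need = 0;
            st->lo = 0x80;
            st->hi = 0xBF;
        }
        if (c < 0x80)
            continue;
        else if (c >= 0xC2 && c <= 0xDF) st->need = 1;
        else if (c == 0xE0)              { st->need = 2; st->lo = 0xA0; }
        else if (c == 0xED)              { st->need = 2; st->hi = 0x9F; }
        else if (c >= 0xE1 && c <= 0xEF) st->need = 2;
        else if (c == 0xF0)              { st->need = 3; st->lo = 0x90; }
        else if (c == 0xF4)              { st->need = 3; st->hi = 0x8F; }
        else if (c >= 0xF1 && c <= 0xF3) st->need = 3;
        else
            utf8_error(st);  /* stray continuation, C0/C1 or F5..FF */
    }
}

/* Call once at end of input: a pending sequence means truncation. */
void utf8_finish(struct utf8_state *st)
{
    if (st->need > 0) {
        utf8_error(st);
        st->need = 0;
    }
}
```

With this sketch the loop body would become utf8_feed(&st, (const unsigned char *)buffer, ret), and a single utf8_finish(&st) after the loop catches a stream that ends in the middle of a sequence.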


mind telling us how the code would break, so that readers of the thread would actually learn something from your responses?
