
JS1K 2013

Name: Anonymous 2013-03-18 17:49

http://js1k.com/
Post submissions, code golfing, and pointless shitposts ITT.

Name: Anonymous 2013-03-20 1:45

>>40 I'd be inclined to think that they're both not very complicated, and not very readable... or at least 1/2

Name: Anonymous 2013-03-20 2:07

>>41
Have you read your SICP today?

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2013-03-20 9:04

>>10,23,24
http://clang.llvm.org/doxygen/ConvertUTF_8c_source.html

Converting UTF-16 to UTF-32 is one 53-line function (somewhat bloated, but it's consistent with the others in this file).
Converting UTF-8 to UTF-32 is a 58-line function + a 23-line auxiliary function + a 256-byte table + a 24-byte table.

Converting UTF-32 to UTF-16 is one 45-line function.
Converting UTF-32 to UTF-8 is a 49-line function + a 7-byte table.

Total code to handle UTF-16: 98 lines of functions
Total code to handle UTF-8: 130 lines of functions + 287 bytes of tables

Now that there's evidence, we can have no more bitching about the complexity of UTF-16.
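
To make the case-count difference concrete, here's a rough sketch of single-code-point decoders (names invented for this post, not the ConvertUTF.c API; validation of truncated or malformed input is omitted):

```c
#include <stdint.h>
#include <stddef.h>

/* UTF-16 -> UTF-32: two cases (BMP code unit, surrogate pair).
   Assumes well-formed input. */
static uint32_t utf16_decode_one(const uint16_t *s, size_t *consumed) {
    uint16_t w = s[0];
    if (w >= 0xD800 && w <= 0xDBFF) {   /* high surrogate: pair follows */
        *consumed = 2;
        return 0x10000 + (((uint32_t)(w - 0xD800) << 10) | (s[1] - 0xDC00));
    }
    *consumed = 1;                      /* everything else is one unit */
    return w;
}

/* UTF-8 -> UTF-32: four length cases plus continuation-byte handling.
   Assumes well-formed input. */
static uint32_t utf8_decode_one(const uint8_t *s, size_t *consumed) {
    uint8_t b = s[0];
    if (b < 0x80) { *consumed = 1; return b; }
    if (b < 0xE0) { *consumed = 2; return ((uint32_t)(b & 0x1F) << 6)
                                        | (s[1] & 0x3F); }
    if (b < 0xF0) { *consumed = 3; return ((uint32_t)(b & 0x0F) << 12)
                                        | ((uint32_t)(s[1] & 0x3F) << 6)
                                        | (s[2] & 0x3F); }
    *consumed = 4;  return ((uint32_t)(b & 0x07) << 18)
                         | ((uint32_t)(s[1] & 0x3F) << 12)
                         | ((uint32_t)(s[2] & 0x3F) << 6)
                         | (s[3] & 0x3F);
}
```

Even this stripped-down version shows the asymmetry: the UTF-16 side has one branch, the UTF-8 side has three plus per-byte masking.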

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2013-03-20 9:08

Now if your text is mostly ASCII, the ~50% smaller data size of UTF-8 outweighs the codec complexity; but if it's CJK, you get the additional complexity of UTF-8 plus ~50% expansion of the data. Different encodings have their uses.
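
The trade-off can be checked per code point; a minimal sketch (helper names invented for this post):

```c
#include <stdint.h>

/* Bytes needed to store one code point in each encoding. */
static int utf8_bytes(uint32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}
static int utf16_bytes(uint32_t cp) {
    return cp < 0x10000 ? 2 : 4;   /* one code unit vs a surrogate pair */
}
```

For U+0041 ('A') that's 1 byte vs 2, so UTF-8 halves mostly-ASCII text; for U+6F22 (漢) it's 3 vs 2, the ~50% CJK expansion mentioned above.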

Name: Anonymous 2013-03-20 14:46

>>43
That doesn't say anything at all. The 256-byte table, for instance, is quite clearly in lieu of an integer log2 operation.
You can't make any assumptions based on one single implementation. If you could, then by the same logic, if I showed you a system with hardware UTF-8 conversion but only a Java program for UTF-16, you'd have to concede that UTF-8 was better.
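
Sketch of what that table replaces: the trailing-byte count of a lead byte is just its count of leading one bits minus one, an integer-log2-style operation (hypothetical helper, not the actual ConvertUTF.c code):

```c
#include <stdint.h>

/* Number of continuation bytes that follow a UTF-8 lead byte,
   computed instead of looked up in a 256-byte table. */
static int trailing_bytes(uint8_t lead) {
    if (lead < 0xC0) return 0;          /* ASCII, or a continuation byte */
    int ones = 0;
    while (lead & 0x80) { ones++; lead <<= 1; }
    return ones - 1;                    /* 110xxxxx -> 1, 1110xxxx -> 2 */
}
```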

IHBT.

Name: Anonymous 2013-03-20 16:24

UTF-8 has one huge problem: it has O(n) access time by character index, while plain ASCII has O(1).
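
i.e. to reach the i-th character you have to walk every character before it, since code point widths vary. A rough sketch (invented name; assumes well-formed UTF-8):

```c
#include <stdint.h>
#include <stddef.h>

/* Pointer to the i-th character of a UTF-8 string: O(i) walk.
   Contrast ASCII, where the answer is just s + i. */
static const uint8_t *utf8_index(const uint8_t *s, size_t i) {
    while (i--) {
        s++;                              /* step off the lead byte */
        while ((*s & 0xC0) == 0x80) s++;  /* skip continuation bytes */
    }
    return s;
}
```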

Name: Anonymous 2013-03-20 16:30

>>47
http://lists.gnu.org/archive/html/guile-devel/2011-03/msg00122.html
The simplest solution is to break the stringbuf into chunks, where each
chunk holds exactly CHUNK_SIZE characters, except for the last chunk
which might have fewer.  To find a character at index i, we scan forward
(i % CHUNK_SIZE) characters from the beginning of chunk[i / CHUNK_SIZE].
Ideally, CHUNK_SIZE is a power of 2, which allows us to use bitwise
operations instead of division.  The number of characters to scan is
bounded by CHUNK_SIZE, and therefore takes O(1) time.

String-set! in general requires us to resize a chunk.  Although chunks
contain a constant number of characters, that of course maps to a
variable number of bytes.  There are various tricks we can do to reduce
the number of reallocations done, but even in the worst case, the time
spent is bounded by CHUNK_SIZE, and thus O(1).
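
The index arithmetic from the quoted scheme, assuming a power-of-two CHUNK_SIZE, would look something like this (sketch only; the value 128 and the names are invented, and the chunk storage itself is elided):

```c
#include <stddef.h>

#define CHUNK_SIZE  128                /* characters per chunk (assumed) */
#define CHUNK_SHIFT 7                  /* log2(CHUNK_SIZE) */
#define CHUNK_MASK  (CHUNK_SIZE - 1)

/* Which chunk holds character i, and how far to scan into it. */
static size_t chunk_of(size_t i)  { return i >> CHUNK_SHIFT; }  /* i / CHUNK_SIZE */
static size_t offset_in(size_t i) { return i & CHUNK_MASK;   }  /* i % CHUNK_SIZE */
```

Finding character i then means scanning at most CHUNK_SIZE - 1 characters from the start of chunk chunk_of(i), which is the bounded O(1) scan the post describes.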

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2013-03-22 4:27

>>46
Even without the table, the UTF-8 decode/encode requires more code. It's fundamentally more complex, as there are more cases.
>You can't make any assumptions based on one single implementation.
This is not an assumption. It is a comparison of UTF-8 and UTF-16 decode/encode implementations in one project, so the two will have been written in the same style and to the same standards.

>if I showed you a system with hardware utf8 conversion but only a Java program for utf16, you'd have to concede that utf8 was better.
Wrong. UTF-8 is better for some applications, UTF-16 for others. UTF-8 evangelism is stupid. That's all.

Name: Anonymous 2013-03-22 9:16

Why use any of the UTFs when there's SJIS, which is far more efficient for encoding Japanese web pages?
