META tags

Name: Anonymous 2009-06-06 10:00

Is declaring character encoding and such within the HTML of the page considered harmful?

Name: Anonymous 2009-06-06 10:09

Maybe.

Name: Anonymous 2009-06-06 10:13

considered harmful considered harmful

Name: Anonymous 2009-06-06 10:15

>>3 considered harmful

Name: Anonymous 2009-06-06 10:39

The W3C validator thinks it's not considered harmful and encourages it.

Name: Anonymous 2009-06-06 10:51

>>5
It's preferable to use the HTTP headers though.

Name: Anonymous 2009-06-06 10:54

It'll force a restart of the parsing and a reflow if the initial guess was wrong. Also, while it works in practice, it's an undecidable problem; you're just lucky that the common character encodings work in a way that lets this hack work. Typical WEB design quality. I chuckle when people ask for the "correct" way to do stuff in the HTML-and-friends cesspool. Just test it in the three or four browsers that matter, or copy the code from a popular page so it's already tested for you.

Name: Anonymous 2009-06-06 12:03

>>7
What in God's name are you going on about?

Name: Anonymous 2009-06-06 12:06

>>8
What part did you not understand? The encoding is needed to read the content, and the encoding is specified as part of the content. It's a recipe for disaster.

Name: Anonymous 2009-06-06 12:14

>>9
Did you mean: the encoding is specified as part of the content, and the encoding is needed to read other parts of the content

Name: Anonymous 2009-06-06 12:38

How can you parse a page if you ignore its encoding? There might be some important Arabic or Chinese mention before the character encoding, so if you must parse the document to attempt to find the character encoding, it is still necessary to parse it again from the start once you know its encoding.  It is a colossal waste of CPU cycles, which, at the scale of the web, produces as much carbon as a country like Portugal.

Name: Anonymous 2009-06-06 12:41

>>7
>>9
is a web developer (idiot)

Name: Anonymous 2009-06-06 12:44

>>11
The syntax of a web page does not depend on its encoding, so long as it contains ASCII.  There is no need to reparse anything.
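
A minimal sketch of that point, assuming an ASCII-compatible encoding: the declaration can be found by scanning the raw bytes, and only then is the page decoded. The helper name, the regex and the 1024-byte limit are made up for illustration; real browsers follow a more involved prescan.

```python
import re

# Hypothetical byte-level pattern: works only because "<meta ... charset=" is
# plain ASCII in every ASCII-compatible encoding.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)

def sniff_charset(raw: bytes, limit: int = 1024):
    """Return the declared charset (lowercased) from the first `limit` bytes, or None."""
    match = META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else None

page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=Shift_JIS"><title>日本語</title></head></html>')
raw = page.encode("shift_jis")   # bytes as they would arrive from disk or network
charset = sniff_charset(raw)     # found without decoding anything first
text = raw.decode(charset)       # a single decode once the name is known
```

The scan never interprets the non-ASCII bytes, so nothing has to be decoded twice in this case.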

Name: Anonymous 2009-06-06 12:52

That meta tag is, if you follow proper web standards, the very first thing that needs to come right after the [code]<html>[/code] tag.

Name: Anonymous 2009-06-06 12:59

>>14 =~ s/proper/invalid/;

Name: Anonymous 2009-06-06 13:04

>>10,13
Nice reliance on the technicality that, by chance, ASCII stuff happens to be the same across ISO-8859-1, UTF-8, Shift_JIS and Big5.

Now, try again with UTF-16LE.

It's good to know people still don't understand character encodings in this day and age.
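
A rough illustration of the UTF-16LE objection, in Python. The marker string and the BOM helper are hypothetical names; the helper checks only three byte order marks and is not a full detection scheme.

```python
import codecs

MARKER = "<meta"  # the ASCII text a byte-level scanner would look for

# Encodings named in this post: ASCII text keeps its ASCII bytes in all of them.
ASCII_COMPATIBLE = ["iso-8859-1", "utf-8", "shift_jis", "big5"]
same_bytes = {name: MARKER.encode(name) == MARKER.encode("ascii")
              for name in ASCII_COMPATIBLE}

# UTF-16LE puts a zero byte after every ASCII character, so the byte search fails.
utf16 = MARKER.encode("utf-16-le")
found_in_utf16 = MARKER.encode("ascii") in utf16

def guess_from_bom(raw: bytes):
    """Hypothetical helper: map a leading byte order mark to a codec name, else None."""
    for bom, name in ((codecs.BOM_UTF8, "utf-8"),
                      (codecs.BOM_UTF16_LE, "utf-16-le"),
                      (codecs.BOM_UTF16_BE, "utf-16-be")):
        if raw.startswith(bom):
            return name
    return None
```

So a scanner that looks for ASCII bytes needs some out-of-band hint (a BOM or an HTTP header) before it can even find the declaration in a UTF-16 page.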

Name: Anonymous 2009-06-06 13:11

>>16
All engineering relies on things that ``happen to occur''.  That's the way reality works.

Name: Anonymous 2009-06-06 13:15

>>17
UTF-16LE happens to occur in the real world.

Name: Anonymous 2009-06-06 13:31

>>17
All English relies on things that "should be proper." That's the way reality works.

Name: Anonymous 2009-06-06 13:43

You are imbeciles, do you even know what you're angry at?

Name: Anonymous 2009-06-06 13:46

>>21

WE HATE PYTHON! DEATH TO FIOC!!!!

Name: Anonymous 2009-06-06 13:54

>>20
I'm angry at YOU.

Name: Anonymous 2009-06-06 14:02

>>1
No. You should do this so character set and encoding information is preserved when someone saves your html file to disk.

Name: Anonymous 2009-06-06 14:14

Just wait for HTML5.

Name: Anonymous 2009-06-06 15:01

>>23
The browser can do that itself. Or it could just use xattrs.

Name: Anonymous 2009-06-06 16:39

The problem was solved by >>14. >>11 is a retard. And whoever uses any other encoding pointed out by >>16 other than UTF-8 deserves to be ASSRAPED OUTRAGEOUSLY.

Name: Anonymous 2009-06-06 17:52

>>26
What about Shift-JIS?

Name: Anonymous 2009-06-06 19:20

All UTF variants are by definition a strict superset of all other character sets - in fact, UTF goes as far as having multiple representations for many characters.

Name: Anonymous 2009-06-06 19:46

>>28
You're thinking of the Universal Character Set, not UTF-*.

Name: Anonymous 2009-06-06 19:50

>>27
Who cares?

>>28
...which are invalid unless they're the shortest ones.
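
For the shortest-form rule, a small Python check (a sketch that leans on Python's strict UTF-8 decoder; the helper name is made up). The overlong byte sequences are the textbook 2- and 3-byte forms of "/".

```python
def is_valid_utf8(raw: bytes) -> bool:
    """True if `raw` decodes under Python's strict UTF-8 decoder."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

shortest = b"/"               # U+002F in its only valid form, one byte
overlong_2 = b"\xc0\xaf"      # the same code point padded out to two bytes
overlong_3 = b"\xe0\x80\xaf"  # and to three bytes

results = [is_valid_utf8(b) for b in (shortest, overlong_2, overlong_3)]
```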

Name: Anonymous 2009-06-06 19:51

GB 18030 > UTF

Name: Anonymous 2009-06-06 19:53

Where can I download a free, complete Unicode font?

Name: Anonymous 2009-06-06 20:12

>>27
There is no point to SJIS where UTF-8 can be used.

Name: Anonymous 2009-06-07 7:32

>>33
There's no point to Japanese where English can be used either.

Name: Anonymous 2009-06-07 8:37

>>34
True, but that's not the point. You don't need SJIS to render nip characters. I don't know why they insist on using SJIS. Maybe because they want to feel unique? Who knows.

Name: Anonymous 2009-06-07 9:46

>>35
It's actually the other way around. The world -should- be using SJIS for everything. If your alphabet can't be found in SJIS, then perhaps you're from a country which, tbqh, doesn't really matter much.

Name: Anonymous 2009-06-07 10:37

>>36
SJIS is PIG DISGUSTING Microsoft proprietary encoding. Please to use free and technology above EUC-JP.

Name: Anonymous 2009-06-07 11:11

>>36
No, get with the times, the world is encoded in UTF8.

Name: Anonymous 2009-06-07 13:34

>>36
SJIS fails to encode seven of the world's eight most widely spoken languages.

>>38
Actually the world is encoded in UTF-16LE and arguably transmitted in a mix of UTF-8 and local "whatever Windows 9x shipped with" encodings. When your toy OS reaches 85% market share we'll talk.

Name: Anonymous 2009-06-07 14:23

>>39
Nobody uses UTF-16 except as a pretend-seekable in-memory representation of a UTF-8 document.
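
On the "pretend-seekable" part, a quick Python sketch (helper name made up): anything outside the BMP takes two UTF-16 code units, so indexing by code unit is not indexing by character.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

samples = {
    "A": "A",                   # ASCII
    "kanji": "日",              # inside the BMP
    "emoji": "\U0001F600",      # outside the BMP, needs a surrogate pair
}
units = {name: utf16_code_units(ch) for name, ch in samples.items()}
utf8_bytes = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
```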

Name: Anonymous 2009-06-07 17:12

>>40
Butthurt Linux fanatics: When 85% of the world's software (Win32) and 95% of the world's documents (MS Office) are nobody.

Name: Anonymous 2009-06-08 10:28

>>41
We're talking about web pages in this thread. The World Wide Web is, arguably, mostly encoded in UTF-8.

Kindly fuck off, this isn't about OS bullfaggotry.

Name: Anonymous 2009-06-08 10:42

>>41
``Anyone can make up statistics to prove their point. 30% of all people know that''

but yeah, what >>42 said is correct

Name: Anonymous 2009-06-08 10:45

>>41
This is what Windows Users actually believe, also back to /g/ please. /prog/ is Unix country

Name: Anonymous 2009-06-08 11:51

Which is why everything is done like it's still 1982. But you wouldn't know that because there isn't any documentation.

Name: Anonymous 2009-06-08 12:17

>>44
WinNT is kinda Unix

Name: Anonymous 2009-06-08 12:24

>>16
it's invalid to use the meta tag to declare the encoding in documents where the entire document up to that point is not the same as it would be in ascii.

Name: Anonymous 2009-06-08 13:06

>>46
Anything that uses backslashes as path dividers can't be Unix or even close.

Name: Anonymous 2009-06-08 14:05

>>48
Forward slashes work just fine on NT.

Name: Anonymous 2009-06-08 14:10

>>6
You should use both: HTTP headers for when the page is transferred over HTTP, meta tags for when the page is saved and loaded from disk.

And you should be using UTF-8, always. China and Japan need to get with the fucking program. UTF-16 is only used for Windows bullshite, because NT was designed before UTF-8 existed.
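
Something like this, as a sketch only: the function name and page skeleton are invented for the example, and a real server would set the header through whatever framework it uses.

```python
from html import escape

CHARSET = "utf-8"

def build_response(title: str, body: str):
    """Return (headers, payload) with the charset declared in both places."""
    markup = (
        "<!DOCTYPE html>\n"
        "<html><head>"
        # In-document declaration: survives "Save As" when the headers are gone.
        f'<meta http-equiv="Content-Type" content="text/html; charset={CHARSET}">'
        f"<title>{escape(title)}</title></head>"
        f"<body><p>{escape(body)}</p></body></html>"
    )
    # HTTP declaration: available before a single byte of the body is parsed.
    headers = {"Content-Type": f"text/html; charset={CHARSET}"}
    return headers, markup.encode(CHARSET)

headers, payload = build_response("META tags", "日本語 & friends")
```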

Name: Anonymous 2009-06-08 14:14

>>49
Too bad Windows NT limits the characters you can put in a file name, even when NTFS will happily allow any character except '\' and U+0000. The character I'd really like to use is '?', but there's no excuse for barring any of them: those who care about getting their DOS apps to work can stick with DOS filename restrictions, while the rest of us move on, but NO!

FUCKING MICROSOFT!

Name: FrozenVoid 2009-06-08 14:17

>>51
Ah well, in that case, when you have "absolute freedom" you get undeletable directories and files (with -) and other funny junk when your "happily working" program encounters them for the first time.
 

_______________________________________________
http://xs135.xs.to/xs135/09042/av922.jpg
orbis terrarum delenda est

Name: Anonymous 2009-06-08 14:18

>>51
Thinking too much of backwards compatibility is Microsoft's biggest problem. Though I suppose that's what you have to do if you don't want to lose your market share.

Name: Anonymous 2009-06-08 14:22

>>51
Also, I'd like to be able to use ?s too, for example when saving images whose source I'm not sure of, but since I can't use question marks I settle for "perhaps".

Name: Anonymous 2009-06-08 14:50

> China and Japan need to get with the fucking program. UTF-16 is only used for Windows bullshite, because NT was designed before UTF-8 existed.
GB 18030 is better than UTF-8.

Name: Anonymous 2009-06-08 16:16

>>55
Wait, China mandates a Unicode-based encoding? I thought the whole reason China/Japan weren't on Unicode was because of Han unification. Why did China find it necessary to create yet another Unicode encoding? I've glanced over the Wikipedia article, but couldn't find a reason.

Name: Anonymous 2009-06-08 16:23

>>56
Typical chinese bullshit. They need to have their own shit or else the white man wins.

Name: Anonymous 2009-06-08 16:41

>>57
> They need to have their own shit or else the white man wins.
How disconnected from reality do you suppose you are?

Name: Anonymous 2009-06-08 17:36

>>56
GB 18030 encodes a SUPERset of Unicode.

Name: Anonymous 2009-06-08 17:57

>>59
Oh, so they've made new code points for the characters they really want but Unicode combined? I guess that makes sense, then.

Name: Anonymous 2009-06-08 18:22

>>59
Also, all characters that are 1 byte in UTF-8 are 1 byte in GB 18030, and GB 18030 has more characters that are 2 bytes than UTF-8 has that are 2 or 3 bytes, and no character in GB 18030 is more than 4 bytes.
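
The byte counts are easy to check with Python's built-in codecs. A small sketch with hand-picked sample strings (the helper name is made up):

```python
def sizes(text: str):
    """Return (UTF-8 byte count, GB 18030 byte count) for `text`."""
    return len(text.encode("utf-8")), len(text.encode("gb18030"))

ascii_sizes = sizes("A")            # 1 byte in both
hanzi_sizes = sizes("中")           # 3 bytes in UTF-8, 2 in GB 18030
astral_sizes = sizes("\U0001F600")  # 4 bytes in both
phrase_sizes = sizes("汉字编码")    # four common simplified characters
```

For the four-character phrase that is 12 bytes against 8, which is where a saving of one third for plain Chinese text would come from.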

Name: Anonymous 2009-06-08 18:44

>>60
The only reason to use GB 18030 is to reduce your memory usage for Chinese text (in RAM and on disk) by one third. Can't imagine why anyone would want that, though. It's not like Chinese people can read.

Name: Anonymous 2009-06-08 18:48

The only encoding I use is Latin-1. You can all go fuck yourselfs'.

Name: Anonymous 2009-06-08 18:50

>>59
It encodes a superset of the BMP, which is 65,536 code points. The total number of code points is over 9000, more like 1.1 million.

Name: Anonymous 2009-06-08 18:56

>>64
This gives a total of 1,587,600 (126×10×126×10) possible 4 byte sequences, which is easily sufficient to cover Unicode's 1,111,998 (17×65536 − 2048 surrogates − 66 noncharacters) assigned and reserved code points.
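
The arithmetic, spelled out in Python for anyone who wants to check it (variable names are just labels for the figures quoted above):

```python
# Four-byte GB 18030 sequences: 126 x 10 x 126 x 10 combinations.
four_byte_sequences = 126 * 10 * 126 * 10

# Unicode code space: 17 planes of 65,536 code points each.
BMP_SIZE = 65536
total_code_points = 17 * BMP_SIZE

SURROGATES = 2048
NONCHARACTERS = 66
needed = total_code_points - SURROGATES - NONCHARACTERS

headroom = four_byte_sequences - needed
```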

Name: Anonymous 2009-06-08 19:13

Chinese and Japanese are huge wastes of space. How many code points have we ceded to their bloated ideograms?

Learn to use a reasonably sized alphabet and like it, fags.

Name: Anonymous 2009-06-08 20:02

>>66
Their literacy rates are better than a lot of countries that use 'reasonably sized alphabets' since each ideogram encodes meaning. If you compare Chinese Twitter to English Twitter, they can have tweets with real content[1]. There is something to be said for their ``alphabets''. That said, just shut up and drink the Unicode kool-aid already.

--
1. Disclaimer: This post is in no way, shape or form an endorsement of twitter(tm) or other web 2.0 faggotry

Name: Anonymous 2009-06-08 20:23

>>67
Chinese is cool for that reason, but Japanese seems to have the worst of both worlds: text is longer than English[1], but they still use thousands of characters.

1. My conclusion based on my four years of high-school Japanese.

Name: Anonymous 2009-06-08 20:26

>>42
Actually, while some behemoths use UTF-8 (mostly the big USA megacorps, also 4chan), I still see plenty of pages using local encodings, especially small sites done by non-EXPERT WEB DEVELOPERS.

Also, while I don't visit a lot of [i]weaboo[/i] or Chinese sites, I have yet to find one that uses UTF-8 as opposed to ShiftJIS or BIG5 respectively.

Name: Anonymous 2009-06-08 20:43

>>68
日本語's verbose.

Name: Anonymous 2009-06-08 20:50

>>67
I really dislike when people say "drink the X kool-aid." Just letting you know.

Name: Anonymous 2009-06-08 21:23

>>67
There may be something to be said for the Chinese ``alphabet'', but that something is not that it has a convenient or sane digital representation.

Name: Anonymous 2009-06-08 21:26

We should introduce a moderate number of word glyphs into English. While far Eastern languages clearly take it too far, surely a few hundred glyphs for the most common words could compact our language immensely.

Name: Anonymous 2009-06-08 21:48

>>73
Just take the language as it is and use order 14 prediction by partial matching with a 512MB adaptive context model initialized with a few megabytes of carefully selected text. Use an arithmetic range coder to write the probability choices using a 4096-symbol alphabet (for example each glyph could be a braille-style 3x4 rectangle).

This should give between 10 and 20 characters per symbol for natural English text, depending on how well the context is able to predict it.

Don't expect high acceptance rates though, as it happens to be a bit human-unfriendly.
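
Back-of-the-envelope check of that range, as a sketch: a 4096-symbol alphabet carries 12 bits per symbol, so the characters per symbol depend only on how many bits per character the model manages. The bits-per-character figures below are assumed inputs, not measurements.

```python
import math

ALPHABET_SIZE = 4096
bits_per_symbol = math.log2(ALPHABET_SIZE)  # information capacity of one glyph

def chars_per_symbol(bits_per_char: float) -> float:
    """Characters of text packed into one glyph at a given compression rate."""
    return bits_per_symbol / bits_per_char

# Assumed compression rates that would give the quoted 10 to 20 range.
low_end = chars_per_symbol(1.2)
high_end = chars_per_symbol(0.6)
```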

Name: Anonymous 2009-06-08 21:58

>>73
You /prog/ers already whine and bitch that people use too many symbols and not enough WORDS.

Name: Anonymous 2009-06-08 22:28

>>75
Show me where.

Name: Anonymous 2009-06-17 12:04

</html>

O SHI

Name: Anonymous 2009-06-17 12:07

>>76
(  ≖‿≖)

Name: Anonymous 2009-06-17 13:26

>>51
You can use characters which are deemed illegal in filenames through NT-specific APIs, but don't blame me if it breaks a lot of other applications and creates 'undeletable/inaccessible' files. There are already plenty of applications compiled in ANSI mode which just plain fail at reading files with Unicode filenames (they do work (partially) if you switch your system locale, or use Applocale, or some other wrapper, so that the converted filename's characters match the locale's charset).

tl;dr: Those limitations are mostly for your own good, but they're not enough to stop various breakages due to bad coding. You can bypass the limitations if you really desire to.
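
The ANSI-mode failure can be mimicked from the encoding side alone. This is only a sketch: it calls no Windows API, the helper name is made up, and cp1252 and cp932 stand in for a Western and a Japanese system locale.

```python
def representable(name: str, code_page: str) -> bool:
    """True if every character of `name` exists in the given ANSI code page."""
    try:
        name.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

filename = "日本語.txt"
western_locale = representable(filename, "cp1252")  # Western European code page
japanese_locale = representable(filename, "cp932")  # Japanese code page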

Name: Anonymous 2011-02-04 12:12

