>>1
Here's the schema dump for 4scrape (yes, I realize I'm an idiot for using MySQL, but I prefer its FULLTEXT search over PostgreSQL's):
mysql> DESC s_posts;
+--------------+------------------+------+-----+---------+----------------+
| Field        | Type             | Null | Key | Default | Extra          |
+--------------+------------------+------+-----+---------+----------------+
| post_id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| thread_id    | int(10) unsigned | YES  | MUL | NULL    |                |
| board_id     | int(10) unsigned | YES  | MUL | NULL    |                |
| img_id       | int(10) unsigned | YES  | MUL | NULL    |                |
| post_no      | int(10) unsigned | YES  | MUL | NULL    |                |
| post_name    | tinytext         | YES  | MUL | NULL    |                |
| post_email   | varchar(128)     | YES  | MUL | NULL    |                |
| post_trip    | varchar(64)      | YES  |     | NULL    |                |
| post_subject | tinytext         | YES  | MUL | NULL    |                |
| post_date    | datetime         | YES  | MUL | NULL    |                |
| post_comment | text             | YES  | MUL | NULL    |                |
| post_img     | varchar(128)     | YES  |     | NULL    |                |
| post_origimg | varchar(128)     | YES  | MUL | NULL    |                |
| post_legacy  | int(10) unsigned | YES  |     | NULL    |                |
+--------------+------------------+------+-----+---------+----------------+
14 rows in set (0.02 sec)
mysql> DESC s_images;
+----------------+---------------------+------+-----+---------+----------------+
| Field          | Type                | Null | Key | Default | Extra          |
+----------------+---------------------+------+-----+---------+----------------+
| img_id         | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| img_hash       | varchar(48)         | YES  | UNI | NULL    |                |
| img_path       | mediumtext          | YES  |     | NULL    |                |
| img_thumb      | mediumtext          | YES  |     | NULL    |                |
| img_legacy     | int(10) unsigned    | YES  |     | 0       |                |
| img_w          | int(11)             | YES  | MUL | NULL    |                |
| img_h          | int(11)             | YES  | MUL | NULL    |                |
| img_aspect     | float               | YES  | MUL | NULL    |                |
| img_nsfw       | tinyint(3) unsigned | YES  | MUL | 0       |                |
| img_nsfw_by    | varchar(20)         | YES  |     | NULL    |                |
| img_animated   | tinyint(1)          | YES  | MUL | 0       |                |
| img_color      | int(10) unsigned    | YES  | MUL | NULL    |                |
| img_hits       | int(11)             | YES  | MUL | 0       |                |
| img_hits_total | int(11)             | YES  | MUL | 0       |                |
+----------------+---------------------+------+-----+---------+----------------+
14 rows in set (0.00 sec)
mysql> DESC s_boards;
+--------------+------------------+------+-----+---------+----------------+
| Field        | Type             | Null | Key | Default | Extra          |
+--------------+------------------+------+-----+---------+----------------+
| board_id     | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| board_sname  | varchar(16)      | YES  |     | NULL    |                |
| board_name   | varchar(128)     | YES  |     | NULL    |                |
| board_host   | varchar(32)      | YES  |     | NULL    |                |
| board_parser | varchar(128)     | YES  |     | NULL    |                |
| board_sfw    | tinyint(1)       | YES  |     | NULL    |                |
| board_scrape | tinyint(1)       | YES  |     | NULL    |                |
| board_view   | tinyint(1)       | YES  |     | NULL    |                |
+--------------+------------------+------+-----+---------+----------------+
8 rows in set (0.00 sec)
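If it helps to see how the three tables hang together, here's a rough sketch using Python's built-in sqlite3 instead of MySQL. It assumes, going only by the column names (the dump shows no declared foreign keys), that s_posts.board_id points at s_boards and s_posts.img_id points at s_images. Only some columns are kept, the types are approximated for SQLite, and the sample rows are made up.

```python
import sqlite3

# Rough SQLite approximation of part of the 4scrape schema (assumed, not the
# real DDL): only some columns, simplified types, relationships inferred
# from column names.
SCHEMA = """
CREATE TABLE s_boards (
    board_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    board_sname VARCHAR(16),
    board_name  VARCHAR(128),
    board_sfw   TINYINT
);
CREATE TABLE s_images (
    img_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    img_hash VARCHAR(48) UNIQUE,
    img_w    INTEGER,
    img_h    INTEGER
);
CREATE TABLE s_posts (
    post_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    thread_id    INTEGER,
    board_id     INTEGER REFERENCES s_boards(board_id),
    img_id       INTEGER REFERENCES s_images(img_id),
    post_no      INTEGER,
    post_comment TEXT,
    post_origimg VARCHAR(128)
);
CREATE INDEX idx_posts_board ON s_posts(board_id);
CREATE INDEX idx_posts_img   ON s_posts(img_id);
"""

def build_demo_db() -> sqlite3.Connection:
    """In-memory DB with one made-up board, image and post."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO s_boards (board_sname, board_name, board_sfw) "
                 "VALUES ('demo', 'Demo board', 1)")
    conn.execute("INSERT INTO s_images (img_hash, img_w, img_h) "
                 "VALUES ('abc123', 1920, 1080)")
    conn.execute("INSERT INTO s_posts (thread_id, board_id, img_id, post_no, "
                 "post_comment, post_origimg) "
                 "VALUES (1, 1, 1, 1000, 'nice wallpaper', 'sunset.jpg')")
    conn.commit()
    return conn

def posts_with_images(conn: sqlite3.Connection, board_sname: str):
    """(post_no, post_origimg, img_w, img_h) for every post on one board."""
    return conn.execute(
        """
        SELECT p.post_no, p.post_origimg, i.img_w, i.img_h
        FROM s_posts p
        JOIN s_boards b ON b.board_id = p.board_id
        JOIN s_images i ON i.img_id = p.img_id
        WHERE b.board_sname = ?
        ORDER BY p.post_no
        """,
        (board_sname,),
    ).fetchall()

conn = build_demo_db()
print(posts_with_images(conn, "demo"))  # [(1000, 'sunset.jpg', 1920, 1080)]
```

Posts carry the per-post stuff (name, comment, original filename), while image metadata lives once per hash in s_images, so reposts of the same image can share a row.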
The original filename (post_origimg) could really stand to be a TINYTEXT (255 bytes): there are filenames longer than 128 characters, and about once a week I get warnings in my logs about fields getting truncated.
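A quick way to see why bytes vs. characters matters for that column: MySQL counts VARCHAR(n) in characters, but TINYTEXT's limit is 255 bytes, so in a UTF-8 column it holds fewer than 255 characters once multibyte characters show up. The helper names below (fits_varchar, fits_tinytext, truncate_utf8) are hypothetical, not anything 4scrape actually has.

```python
# Hypothetical helpers; limits assume a UTF-8 column charset.
VARCHAR_LIMIT_CHARS = 128   # VARCHAR(128): counted in characters
TINYTEXT_LIMIT_BYTES = 255  # TINYTEXT: counted in bytes

def fits_varchar(name: str, limit: int = VARCHAR_LIMIT_CHARS) -> bool:
    return len(name) <= limit

def fits_tinytext(name: str, limit: int = TINYTEXT_LIMIT_BYTES) -> bool:
    return len(name.encode("utf-8")) <= limit

def truncate_utf8(name: str, limit: int = TINYTEXT_LIMIT_BYTES) -> str:
    """Cut name to at most `limit` UTF-8 bytes without splitting a character."""
    cut = name.encode("utf-8")[:limit]
    # The input was valid UTF-8, so the only bad bytes can be a partial
    # character at the very end; errors="ignore" drops exactly those.
    return cut.decode("utf-8", errors="ignore")

ascii_name = "a" * 200 + ".jpg"        # 204 chars, 204 bytes
cjk_name = "x" + "漢" * 100 + ".jpg"   # 105 chars, 305 bytes

print(fits_varchar(ascii_name), fits_tinytext(ascii_name))  # False True
print(fits_varchar(cjk_name), fits_tinytext(cjk_name))      # True False
short = truncate_utf8(cjk_name)
print(len(short), len(short.encode("utf-8")))               # 85 253
```

Truncating on the client side like this avoids the server chopping a multibyte character in half, though whether you'd rather truncate or just widen the column is your call.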
The important thing about Unicode is that you tell the user's browser that you're using it. You can throw UTF-8 data into a Latin-1 encoded database without damaging the data at all (as long as the client connection is also set to Latin-1, so MySQL doesn't try to convert anything): the database will just interpret the UTF-8 bytes as Latin-1 characters, which affects sorting order (but do you give a shit how Chinese characters are ordered? I sure don't). Oh, and multibyte characters will take up multiple bytes, so you may not be able to fit as many of them into the fields as you think. 4scrape uses a UTF-8 character set for the MySQL database for that latter reason.
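Here's a small Python simulation of the Latin-1 trick (it's not MySQL itself, just the same byte-level idea): the UTF-8 bytes get read back as Latin-1 characters, which looks like garbage but round-trips losslessly because Python's latin-1 codec maps all 256 byte values. It also shows the "fewer characters fit" effect, since each UTF-8 byte counts as one Latin-1 character.

```python
def store_as_latin1(text: str) -> str:
    # What a latin1 column "sees": every UTF-8 byte becomes one Latin-1 character
    return text.encode("utf-8").decode("latin-1")

def read_back(stored: str) -> str:
    # Undo it: Latin-1 characters -> original bytes -> decode as UTF-8
    return stored.encode("latin-1").decode("utf-8")

original = "café 漢字"
stored = store_as_latin1(original)

print(len(original))                  # 7 real characters
print(len(stored))                    # 12 "characters" as far as latin1 is concerned
print(read_back(stored) == original)  # True: nothing was damaged
print(store_as_latin1("é"))           # Ã©
```

That 7-vs-12 gap is exactly why a VARCHAR(n) in a Latin-1 column holds fewer real characters than n once the text isn't plain ASCII.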
Realistically, as long as you send the charset=UTF-8 parameter in the Content-Type header (Content-Type: text/html; charset=UTF-8), it doesn't make a shit of difference whether your language or database supports it.
Any other questions?