
imageboards

Name: Anonymous 2008-10-06 16:35

Hey EXPERT PROGRAMMERS
I want to make something that'll archive the imageboards I want. I know the text field is limited to 2000 characters, but what about the name/mail/subject ones? Oh, and how does it work with Unicode characters?

Name: Anonymous 2008-10-06 21:58

>>1
Here's the schema dump for 4scrape (yes, I know, MySQL, but I prefer its FULLTEXT search over PostgreSQL's) --

mysql> DESC s_posts;
+--------------+------------------+------+-----+---------+----------------+
| Field        | Type             | Null | Key | Default | Extra          |
+--------------+------------------+------+-----+---------+----------------+
| post_id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| thread_id    | int(10) unsigned | YES  | MUL | NULL    |                |
| board_id     | int(10) unsigned | YES  | MUL | NULL    |                |
| img_id       | int(10) unsigned | YES  | MUL | NULL    |                |
| post_no      | int(10) unsigned | YES  | MUL | NULL    |                |
| post_name    | tinytext         | YES  | MUL | NULL    |                |
| post_email   | varchar(128)     | YES  | MUL | NULL    |                |
| post_trip    | varchar(64)      | YES  |     | NULL    |                |
| post_subject | tinytext         | YES  | MUL | NULL    |                |
| post_date    | datetime         | YES  | MUL | NULL    |                |
| post_comment | text             | YES  | MUL | NULL    |                |
| post_img     | varchar(128)     | YES  |     | NULL    |                |
| post_origimg | varchar(128)     | YES  | MUL | NULL    |                |
| post_legacy  | int(10) unsigned | YES  |     | NULL    |                |
+--------------+------------------+------+-----+---------+----------------+
14 rows in set (0.02 sec)

mysql> DESC s_images;
+----------------+---------------------+------+-----+---------+----------------+
| Field          | Type                | Null | Key | Default | Extra          |
+----------------+---------------------+------+-----+---------+----------------+
| img_id         | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| img_hash       | varchar(48)         | YES  | UNI | NULL    |                |
| img_path       | mediumtext          | YES  |     | NULL    |                |
| img_thumb      | mediumtext          | YES  |     | NULL    |                |
| img_legacy     | int(10) unsigned    | YES  |     | 0       |                |
| img_w          | int(11)             | YES  | MUL | NULL    |                |
| img_h          | int(11)             | YES  | MUL | NULL    |                |
| img_aspect     | float               | YES  | MUL | NULL    |                |
| img_nsfw       | tinyint(3) unsigned | YES  | MUL | 0       |                |
| img_nsfw_by    | varchar(20)         | YES  |     | NULL    |                |
| img_animated   | tinyint(1)          | YES  | MUL | 0       |                |
| img_color      | int(10) unsigned    | YES  | MUL | NULL    |                |
| img_hits       | int(11)             | YES  | MUL | 0       |                |
| img_hits_total | int(11)             | YES  | MUL | 0       |                |
+----------------+---------------------+------+-----+---------+----------------+
14 rows in set (0.00 sec)

mysql> DESC s_boards;
+--------------+------------------+------+-----+---------+----------------+
| Field        | Type             | Null | Key | Default | Extra          |
+--------------+------------------+------+-----+---------+----------------+
| board_id     | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| board_sname  | varchar(16)      | YES  |     | NULL    |                |
| board_name   | varchar(128)     | YES  |     | NULL    |                |
| board_host   | varchar(32)      | YES  |     | NULL    |                |
| board_parser | varchar(128)     | YES  |     | NULL    |                |
| board_sfw    | tinyint(1)       | YES  |     | NULL    |                |
| board_scrape | tinyint(1)       | YES  |     | NULL    |                |
| board_view   | tinyint(1)       | YES  |     | NULL    |                |
+--------------+------------------+------+-----+---------+----------------+
8 rows in set (0.00 sec)


The original filename (post_origimg) could really stand to be a TINYTEXT (up to 255 bytes) -- there are filenames longer than 128 characters, and I get warnings in my logs once a week or so about fields getting truncated.
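
If you'd rather catch those before the database does, a scraper could check the length itself. A rough Python sketch, not 4scrape code: the helper names are made up, and it assumes VARCHAR(128) counts characters while TINYTEXT counts bytes of UTF-8.

```python
VARCHAR_LIMIT_CHARS = 128    # post_origimg as declared in the dump above
TINYTEXT_LIMIT_BYTES = 255   # TINYTEXT stores at most 255 bytes


def fits_varchar(name, limit=VARCHAR_LIMIT_CHARS):
    """True if the filename fits a VARCHAR(limit), counted in characters."""
    return len(name) <= limit


def fits_tinytext(name, limit=TINYTEXT_LIMIT_BYTES):
    """True if the UTF-8 encoded filename fits a TINYTEXT, counted in bytes."""
    return len(name.encode("utf-8")) <= limit


# Hypothetical over-long filename: 204 ASCII characters.
long_name = "a" * 200 + ".jpg"
print(fits_varchar(long_name))   # False
print(fits_tinytext(long_name))  # True
```

Note that the byte limit bites sooner for non-ASCII names, since each such character takes more than one byte in UTF-8.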

The important thing about Unicode is that you tell the user's browser that you're using Unicode. You can throw UTF-8 data into a Latin-1 encoded database and not damage the data at all, as long as nothing converts it between character sets on the way in or out. The database will just interpret the UTF-8 bytes as Latin-1 characters, which will affect sorting order (but do you give a shit how Chinese characters are ordered? I sure don't). Oh, and multibyte characters will take up multiple bytes, so you may not be able to fit as many of them into the fields as you think. 4scrape uses a UTF-8 character set for the MySQL database for the latter reason.
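
To make the round trip concrete, here's a quick Python sketch. It's not 4scrape code: Python's latin-1 codec is standing in for the database column, and the sample string is made up.

```python
# Sample post text containing multibyte characters (made up for illustration).
text = "日本語 thread"

# What the scraper actually holds: UTF-8 encoded bytes.
utf8_bytes = text.encode("utf-8")

# What a Latin-1 column "sees": every byte becomes one character.
as_latin1 = utf8_bytes.decode("latin-1")

# Reading it back: same bytes out, reinterpreted as UTF-8.
recovered = as_latin1.encode("latin-1").decode("utf-8")

print(recovered == text)  # True
print(len(text))          # 10
print(len(as_latin1))     # 16
```

The data comes back intact, but the ten characters occupy sixteen slots when each byte is counted as a character, which is the field-size problem mentioned above.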

Realistically, as long as you send the charset=UTF-8 parameter in the Content-Type header, it doesn't much matter whether your language/database supports it or not.
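
For example, a bare-bones WSGI app in Python could send that header like this. Just a sketch: the app and the sample body are invented for illustration, and any language or framework that lets you set response headers will do the same job.

```python
def app(environ, start_response):
    """Serve a UTF-8 encoded page and declare the charset in the header."""
    # Sample body (made up); encoded to UTF-8 bytes before sending.
    body = "名無しさん: hello".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, you could pass app to wsgiref.simple_server.make_server from the standard library and call serve_forever() on the result.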

Any other questions?
