
imageboards

Name: Anonymous 2008-10-06 16:35

Hey EXPERT PROGRAMMERS
I want to write something that will archive the imageboards I choose. I know the text field is limited to 2000 characters, but what are the limits on the name/mail/subject fields? Oh, and how do those limits work with Unicode characters?

Name: Anonymous 2008-10-07 15:31

>>23
> Can somebody explain how I can use regexes?

Sure.

    $text=~m!   # group 1: post number
                <td \s id="(\d+)"[^>]*> \s*
                # group 2: subject
                <input[^>]*><span \s class="replytitle">(?>(.*?)</span>) \s*
                # groups 3, 4: mailto address (if any) and poster name
                <span \s class="commentpostername">(?:<span [^>]*>)?(?:<a \s href="mailto:([^"]*)"[^>]*>)?([^<]*?)(?:</a>)?(?:</span>)?</span>
                # group 5: tripcode (optional)
                (?: \s* <span \s class="postertrip">(?:<span [^>]*>)?([a-zA-Z0-9\.\+/\!]+)(?:</a>)?(?:</span>)?</span>)?
                # group 6: first character after the double hash in a capcode (optional)
                (?: \s* <span \s class="commentpostername"><span [^>]*>\#\# \s (.?)[^<]*</span></span>)?
                # group 7: date, then skip the span that follows it
                \s ([^>]*) \s \s* <span[^>]*> \s*
                (?>.*?</span>) \s*
                # optional file block: a file line, or a "File deleted." placeholder
                (?:
                    <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \s*
                    <span \s class="filesize">File \s :
                    # group 8: file URL
                    <a \s href="([^"]*/src/\d+\.\w+)"[^>]*>[^<]*</a> \s*
                    # groups 9-13: spoiler flag, file size, width, height, title attribute text
                    \- \s* \((Spoiler \s Image,)?([\d\sGMKB\.]+)\, \s (\d+)x(\d+)(?:, \s* <span \s title="([^"]*)">[^<]*</span>)?\)
                    </span> \s*
                    (?:
                        # groups 14-16: img width, img height, md5
                        <br>\s*<a[^>]*><img \s+ src=\S* \s+ border=\S* \s+ align=\S* \s+ (?:width=(\d+) \s height=(\d+))? [^>]*? md5="?([\w\d\=\+\/]+)"? [^>]*? ></a> \s*
                        |
                        <a[^>]*><span \s class="tn_reply"[^>]*>Thumbnail \s unavailable</span></a>
                    )
                    |
                    <br> \s*
                    <img [^>]* alt="File \s deleted\." [^>]* > \s*
                )?
                # group 17: comment body
                <blockquote>(?>(.*?)</blockquote>)</td></tr></table>
    !xs or $self->troubles("error parsing post\n------\n$text\n------\n") and return;


And that's how you parse a post using the power of [o][u][b]regexps[/b][/u][/o].
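
If the full expression is too much to take in at once, the same idea (a verbose-mode regex with one capture per field) can be sketched on a much smaller scale. This is a rough illustration in Python rather than the Perl above; the sample markup, the `POST_RE` pattern and the `parse_post` helper are all made up for the example and handle far fewer cases than the real thing.

```python
import re

# Hypothetical, heavily simplified post markup (an assumption for this sketch);
# real board HTML has many more optional parts.
SAMPLE = (
    '<td id="12345">'
    '<span class="replytitle">hello</span> '
    '<span class="commentpostername">Anonymous</span> '
    '<blockquote>first line<br>second line</blockquote></td>'
)

# re.VERBOSE ignores unescaped whitespace in the pattern, so literal
# whitespace has to be written as \s, much like Perl's /x flag.
# re.DOTALL lets "." match newlines, like Perl's /s flag.
POST_RE = re.compile(
    r"""
    <td \s id="(?P<num>\d+)"[^>]*> \s*
    <span \s class="replytitle">(?P<title>.*?)</span> \s*
    <span \s class="commentpostername">(?P<name>[^<]*)</span> \s*
    <blockquote>(?P<comment>.*?)</blockquote>
    """,
    re.VERBOSE | re.DOTALL,
)


def parse_post(html):
    """Return a dict of the captured fields, or None if nothing matches."""
    match = POST_RE.search(html)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    print(parse_post(SAMPLE))
```

Named groups stand in for the numbered captures here, which makes it harder to lose track of which group is which once optional parts get added.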
