
Authenticate before WebRequest

Name: Anonymous 2008-07-17 3:40

Okay, I know this is probably the worst place to look for programming advice, but here goes.

I'm using IronPython (in b4 forced indentation of code) to write a web scraper.

I've written many a web scraper in my day, but this is the first time I've tried to scrape content that requires authentication. And not just basic authentication like accessing a shared drive or something (so it's not as easy as adding a NetworkCredential to the request object); this one uses forms authentication.

I'm trying to scrape a forum that requires you to be logged in before you can read any posts. I do have an active account on said forum, and can view it in FF/IE just fine.

I've been toying around with the LiveHTTPHeaders extension for FF, trying to get the auth cookie from the login page by replaying the POST content, but I'm stuck right now because I have to wait an hour due to too many login attempts.
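For reference, the POST-then-reuse-the-cookie idea can be sketched roughly like this. It is a minimal sketch in plain Python 3 (standard library only), not IronPython-specific; the URL and form field names are made up and would have to be replaced with whatever LiveHTTPHeaders shows in the captured login POST. On the .NET side the same idea would presumably go through a shared CookieContainer on each HttpWebRequest, but that is not shown here.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical values: the real URL and field names come from the forum's
# login form (they are visible in the POST captured by LiveHTTPHeaders).
LOGIN_URL = "http://forum.example.com/login.php"
FORM_FIELDS = {"username": "my_user", "password": "my_pass"}


def make_session():
    """Return (opener, jar): the opener stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, fields):
    """Encode the form fields as a URL-encoded POST body, like a form submit."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def login_and_fetch(login_url, fields, page_url):
    """Log in once, then fetch a protected page through the same opener."""
    opener, jar = make_session()
    with opener.open(build_login_request(login_url, fields)) as resp:
        resp.read()  # any Set-Cookie headers in the response land in the jar
    with opener.open(page_url) as resp:
        return resp.read()
```

The point of the design is that the login only happens once per run: every later request goes through the same opener, so the auth cookie is sent automatically and the lockout for too many login attempts is less likely to trigger. Some forums also expect hidden form fields or a Referer header, which this sketch does not handle.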

Does anyone have any direction or code examples (any language is fine) on how to do this? Or on how to bind my FF cookies to my programmatic web requests?
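As for binding browser cookies to programmatic requests, one possible sketch, again in plain Python 3, loads a cookie file into a jar and attaches it to an opener. It assumes the cookies are available in the Netscape/Mozilla cookies.txt format (for example via an export); browsers that keep cookies in another store would need a conversion step first. The function name and the file path are hypothetical.

```python
import http.cookiejar
import urllib.request


def opener_from_cookie_file(path):
    """Build an opener that sends the cookies found in a cookies.txt file.

    Assumes `path` points to a file in Netscape/Mozilla cookies.txt format.
    """
    jar = http.cookiejar.MozillaCookieJar()
    # Keep session cookies and already-expired entries too, since an auth
    # cookie is often a session cookie.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Usage would look like `opener, jar = opener_from_cookie_file("cookies.txt")` followed by `opener.open(page_url)`. This only works while the browser session is still valid on the server, and some sites also tie the session to the User-Agent or IP address.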

Thanks!
in b4 read SICP
