
Circumvent block on page downloading

Name: Anonymous 2013-12-13 16:39

I'm trying to use PHP to let users enter URLs of a specific domain and return data about the content of that page, but the domain blocks programs that try to "download pages from this site automatically." Any ideas on how I might circumvent this?

Name: Anonymous 2013-12-13 16:48

CIRCUMVENT MY ANUS

Name: Anonymous 2013-12-13 16:56

Easy.

Fake Googlebot's user agent. Webmasters of shitty websites don't block it.

Name: Anonymous 2013-12-13 17:05

>>3
hmmm I'll look into this. thanks

Name: Anonymous 2013-12-13 18:01

The data I'm trying to use is numerical, and it looks like Googlebot's user agent overlooks it because it's not words. Any other ideas?

Name: Anonymous 2013-12-13 18:04

install gentoo

Name: Anonymous 2013-12-13 18:14

>>6
wat. how would that help

Name: Anonymous 2013-12-14 5:07

OP, use the cURL bindings for your programming language. They let you set up an HTTP client that behaves almost like a real browser (cookies, etc.), except without JavaScript support.
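A minimal sketch of the kind of client described above, using PHP's cURL extension. The function names make_client and fetch, the cookie file location and the timeout are hypothetical choices for illustration; the CURLOPT_* constants are the extension's real options. One handle is reused so that cookies set by earlier responses are sent back on later requests.

```php
<?php
// Hypothetical helper: one reusable cURL handle that keeps cookies,
// follows redirects and sends a User-Agent of your choosing.
function make_client(string $cookieFile, string $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow HTTP redirects
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from this file
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here when the handle is closed
        CURLOPT_USERAGENT      => $userAgent,  // the User-Agent header sent with each request
        CURLOPT_TIMEOUT        => 30,          // give up after 30 seconds (arbitrary choice)
    ]);
    return $ch;
}

// Hypothetical helper: fetch one URL with an existing handle.
function fetch($ch, string $url): string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return $body;
}
```

Usage would be something like $ch = make_client('/tmp/cookies.txt', 'MyScraper/1.0'); followed by fetch($ch, $url) for each page, then curl_close($ch) so the cookie jar gets written.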

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2013-12-14 7:05

>>8
And don't forget to switch the bloody user-agent!

Name: Anonymous 2013-12-14 7:11

User-Agent: HAXMYANUS

Name: Anonymous 2013-12-14 8:38

> The data I'm trying to use is numerical, and it looks like Googlebot's user agent overlooks it because it's not words. Any other ideas?

I don't know what you did. I was just advicing you to write a simple scraper and change the user agent in the requests to "Googlebot". Something like: curl_setopt($ch, CURLOPT_USERAGENT, "Googlebot/2.1 (+http://www.googlebot.com/bot.html)");

https://en.wikipedia.org/wiki/HTTP_request
https://en.wikipedia.org/wiki/List_of_HTTP_header_fields
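To make the point of those links concrete: the user agent is nothing more than one header line in the request the client sends, so "faking Googlebot" does not change what the server returns to you, only what it thinks you are. A sketch of roughly what goes over the wire, assuming a plain HTTP/1.1 GET (build_get_request is a hypothetical helper, not part of any library; real cURL also adds headers such as Accept):

```php
<?php
// Hypothetical helper: builds the text of a minimal HTTP/1.1 GET request.
// The User-Agent is just one header line among the others.
function build_get_request(string $host, string $path, string $userAgent): string
{
    $lines = [
        "GET {$path} HTTP/1.1",
        "Host: {$host}",
        "User-Agent: {$userAgent}",
        "Connection: close",
    ];
    // Each header line ends with CRLF; an empty line ends the header block.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

echo build_get_request("example.com", "/", "Googlebot/2.1 (+http://www.googlebot.com/bot.html)");
```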

Name: Anonymous 2013-12-14 8:39

*advising

Name: 2013-12-16 15:23

