I'm trying to use PHP to let users enter URLs of a specific domain and return data about the content of that page, but the domain blocks programs that try to "download pages from this site automatically." Any ideas on how I might circumvent this?
Name:
Anonymous 2013-12-13 16:48
CIRCUMVENT MY ANUS
Name:
Anonymous 2013-12-13 16:56
Easy.
Fake Googlebot's user agent. Webmasters of shitty websites don't block it.
OP, use the cURL binding in your programming language. It lets you set up an HTTP client that behaves almost like a real browser (with cookies etc.), except without JavaScript support.
>>8
And don't forget to switch the bloody user-agent!
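A minimal sketch of what the posts above describe — a cURL handle configured like a browser, with a cookie jar and a non-default user agent. The URL, cookie-file path, and UA string here are placeholders, not anything from the original thread:

```php
<?php
// Sketch of a browser-like cURL client: cookies persisted across requests,
// redirects followed, and a custom user agent. URL and paths are placeholders.
$ch = curl_init("http://example.com/");
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,                // return the body as a string
    CURLOPT_FOLLOWLOCATION => true,                // follow redirects like a browser
    CURLOPT_COOKIEJAR      => "/tmp/cookies.txt",  // write cookies here on close
    CURLOPT_COOKIEFILE     => "/tmp/cookies.txt",  // send them back on later requests
    CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36",
));
$html = curl_exec($ch);
curl_close($ch);
```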
Name:
Anonymous 2013-12-14 7:11
User-Agent: HAXMYANUS
Name:
Anonymous 2013-12-14 8:38
The data I'm trying to use is numerical, and it looks like Googlebot's user agent overlooks it because it's not words. Any other ideas?
I don't know what you did. I was just advising you to write a simple scraper and change the user agent in the requests to "Googlebot". Something like:
curl_setopt($ch, CURLOPT_USERAGENT, "Googlebot/2.1 (+http://www.googlebot.com/bot.html)");
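To make the suggestion concrete: a hedged sketch of such a scraper. The user agent only affects whether the server blocks the request; it has no effect on what is in the page, so numerical data comes back like any other text and you extract it yourself. The URL and the regex below are placeholders, not part of the original advice:

```php
<?php
// Sketch: fetch a page while presenting Googlebot's user agent, then pull
// the numbers out of the returned HTML. URL and regex are placeholders.
$ch = curl_init("http://example.com/stats.html");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return body instead of printing
curl_setopt($ch, CURLOPT_USERAGENT,
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)");
$html = curl_exec($ch);
curl_close($ch);

if ($html !== false) {
    // Grab every integer or decimal number in the page.
    preg_match_all('/\d+(?:\.\d+)?/', $html, $matches);
    print_r($matches[0]);
}
```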