
Bot Spoofing

Name: Anonymous 2010-09-01 14:26

Hi,
I've been experimenting with bypassing basic website security, and I've come across a fairly simple method of spoofing as a botnet.

I configured Firefox to show as a Googlebot by creating general.useragent.override with the simple string "Googlebot/2.1 Compatible: http://www.googlebot.com/bot.html".

Great; it works! Checked to make sure, then enjoyed browsing as a Googlebot.
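For anyone without Firefox handy, the same spoof is trivial outside the browser too; all of it amounts to setting one request header. A quick Python sketch (the UA string here is the form Google itself publishes, which differs slightly from the one I used above):

```python
from urllib.request import Request

# "Spoofing" Googlebot is nothing more than setting the User-Agent header.
# This is Google's published Googlebot string (assumed current as of writing):
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"

req = Request("http://example.com/", headers={"User-Agent": GOOGLEBOT_UA})
print(req.get_header("User-agent"))  # the header the server will actually see
```

Pass `req` to `urlopen()` and the server sees you as Googlebot, same as the browser trick.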

Now I want to try bypassing sites' robots.txt restrictions.
Is there any shortcut for this that I can try while still posing as Googlebot? And if not, is there another crawler whose string I can use like this one, which doesn't respect robots.txt in the first place?

Thanks.

Name: Anonymous 2010-09-01 15:49

You're a special kind of idiot.

Name: Anonymous 2010-09-01 15:54

>>1
Back to the imageboards, ``faggot''.

Name: Anonymous 2010-09-01 15:57

Because?

Name: Anonymous 2010-09-01 16:23

What the fucking fuck?

Name: Anonymous 2010-09-01 18:28

>>3
Fuck off, ``please''.

Name: Anonymous 2010-09-01 23:26

I never thought anyone could think changing a user agent string would magically force your browser to obey an optional directive meant for a completely different class of internet user. People always manage to be dumber than I anticipate.
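To spell it out: robots.txt is just a text file that a *crawler* chooses to fetch, parse, and obey; nothing on the server or in your browser enforces it. A sketch with Python's stdlib parser (the sample robots.txt is made up):

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt: Googlebot may go anywhere, everyone else
# is asked to stay out of /private/. "Asked" is the key word.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Googlebot", "http://example.com/private/page"))  # True
print(rp.can_fetch("MyBot/1.0", "http://example.com/private/page"))  # False
```

Note that `can_fetch` only reports what the file *requests*; a crawler that never calls it fetches whatever it likes. That is why changing a browser's user agent string has nothing to do with robots.txt.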

Name: not >>1 2010-09-02 0:01

>>7
It's actually the case. I've seen sites deliver different CSS and JavaScript, and even entirely different pages, depending on the user agent. Sometimes the differences aren't just between browser families but between versions (Windows vs. *nix, Firefox 2.x vs. 3.x). Trivial user agents aside, faking Googlebot is genuinely useful: some sites serve crawlers a different version of a page than regular users get. For example, it's not uncommon for some sites (I won't name names, but I'm sure some of you know which ones I mean) to put premium content, normally restricted to registered or even paying(!!!) users, online for Googlebot but not for normal visitors; this ranges from the full text of academic papers to technical "pay" forums (how stupid!). They want their content indexed and searchable, but they also want people to pay. There was something in Google's ToS against this kind of fakery, but it seems Google has sided with at least some of the bastards, at least when it comes to academic searches: getting at the cache is no longer trivial for some of the sites they have "extended" support for.

The real question is whether sites do any identification beyond the user agent for such crawlers (reverse DNS or IP checks?).
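For what it's worth, Google does document a check along exactly these lines: reverse-resolve the connecting IP, confirm the hostname ends in googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. A server-side sketch (the function names are mine, and the full check obviously needs network access):

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def host_is_google(hostname: str) -> bool:
    # Google's documented criterion: the reverse-DNS name must be a
    # subdomain of googlebot.com or google.com.
    return hostname.rstrip(".").endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str) -> bool:
    """Reverse DNS, then forward-confirm the name resolves back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)          # reverse lookup
    except socket.herror:
        return False
    if not host_is_google(hostname):
        return False
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]  # forward-confirm
    except socket.gaierror:
        return False
```

A site doing this would see straight through the user-agent trick, since your IP won't reverse-resolve into Google's domains. Whether the sites serving Googlebot the premium content actually bother is another matter.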

Name: ​​​​​​​​​​ 2010-10-24 8:14

<

Name: Anonymous 2011-01-31 20:51

<-- check em dubz
