
Robots.txt

Name: Anonymous 2013-06-03 14:19

Any reason respecting it?

The only reliable way to protect a page from crawling is to bury it inside an infinitely large set of randomly generated data.
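A minimal sketch of that idea, a "crawler trap": every URL under a path returns a fresh page of random text whose links all lead deeper into the trap, so a naive crawler recurses forever. All names here (`trap_page`, the `/trap/` path) are made up for illustration; seeding from the URL keeps each page stable across revisits while the set of pages stays unbounded.

```python
import random
import string

def trap_page(seed: str, links: int = 5) -> str:
    """Build one page of a crawler trap.

    The page body is random filler text; each anchor points at another
    random URL under /trap/, so following links never terminates.
    Seeding the RNG from the URL makes a given page deterministic.
    """
    rng = random.Random(seed)
    word = lambda: "".join(rng.choices(string.ascii_lowercase, k=8))
    body = " ".join(word() for _ in range(50))
    anchors = "".join(
        f'<a href="/trap/{word()}">{word()}</a>\n' for _ in range(links)
    )
    return f"<html><body><p>{body}</p>\n{anchors}</body></html>"
```

Wire this behind any URL prefix and the "infinitely large set" costs you only CPU per request, not storage; the bandwidth objection below still applies.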

Name: Anonymous 2013-06-03 14:24

If you don't want your web page to be downloaded, don't put it on the internet.

Name: Anonymous 2013-06-03 14:25

>>2
That isn't a challenging solution.

Name: Anonymous 2013-06-03 14:26

>>2
If you don't want to see Game Over, don't play Super Mario Bros.

Name: Anonymous 2013-06-03 16:29

>>1
robots.txt is about interacting with search engines that play nice, not "hiding" pages, dipshit.  If you don't want a page found, don't mention it in there.  And if you've got the bandwidth to send an infinitely large amount of random data, how much are you paying per month?  (Assuming you've got a good enough entropy source to refill the pool, or that you mean "not actually random".)

So robots.txt is the retarded way to go.  You could switch content on user-agent, but then again, the crawlers that don't play nice are going to lie anyway.  Or, you could do what anyone with half a fucking brain does:  HTTPS and WWW-Authenticate header.  HTTPS is taxing to do in bulk, so crawlers don't tend to do it as much.  Requiring auth (and the headers not to be snooped due to HTTPS) keeps unwanted eyes out of it, with the usual caveats.
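A minimal sketch of the auth-gated approach described above, using Python's stdlib `http.server` and HTTP Basic auth (the username and password are hypothetical; a real deployment would sit behind TLS, since Basic credentials are only base64-encoded, not encrypted):

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, for illustration only.
USER, PASS = "alice", "hunter2"

class AuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        expected = "Basic " + base64.b64encode(
            f"{USER}:{PASS}".encode()).decode()
        if self.headers.get("Authorization") != expected:
            # No (or wrong) credentials: challenge with WWW-Authenticate.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="private"')
            self.end_headers()
            return
        # Credentials match: serve the protected page.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"secret page\n")
```

A well-behaved crawler hitting this gets a 401 and moves on; unlike robots.txt, the page's existence isn't advertised and its content is never served without credentials.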

Name: Anonymous 2013-06-03 16:32

>an infinitely large amount of random data
Oy vey!

Name: Anonymous 2013-06-03 16:46

>>6
ln -s /dev/urandom /var/www/index.html

Name: Anonymous 2013-06-03 17:02

User-Agent: *
Disallow: /DO%20NOT%20READ/DO%20NOT%20READ/NOT%20CP!!!

Name: Anonymous 2013-06-04 4:41

>>6
Ok, pseudo-infinitely, you fucking kike
