Parsing with bash

Name: Egi 2010-11-06 12:01

Any ideas on how to parse a website with bash?

Name: Anonymous 2010-11-06 12:06

Read SICP

Name: Anonymous 2010-11-06 12:07

Well, Egi, I've written many a scraper using curl, grep, and sed. I'm sure you could do that, too.

Name: Anonymous 2010-11-06 12:07

Any ideas on how to sage?

Name: Epi 2010-11-06 12:24

>>3 Thanks! Could you post an example of using grep/sed on curl output?

Name: Anonymous 2010-11-06 12:26

>>5
1/10.

Name: Anonymous 2010-11-06 12:31

any idea on how to make an app with C?

Name: Anonymous 2010-11-06 12:31

any idea on how to make a website with PHP?

Name: Anonymous 2010-11-06 12:31

any idea on how to make an applet with Java?

Name: Anonymous 2010-11-06 12:32

any idea on how to write a script with Python?

Name: Anonymous 2010-11-06 12:33

>>5
Something like this:

wget -qO- "http://dis.4chan.org/read/prog/1215479711" | grep -Pom1 ".{3}.{2}.O."
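
For something closer to an actual scraper, here is a minimal sketch of the grep/sed approach for pulling link targets out of a page. The sample HTML and the extract_links helper are made up for illustration; in practice the input would come from curl -s or wget -qO- instead of a variable.

```shell
#!/bin/bash
# Made-up sample page so the sketch runs offline.
# In practice: html="$(curl -s "$url")"
html='<a href="/one.html">One</a> <a href="/two.html">Two</a>
<p>no link here</p>
<a href="http://example.com/three">Three</a>'

# extract_links (hypothetical helper): print every double-quoted href
# value found on stdin, one per line.
extract_links() {
    # grep -o prints each match on its own line, even several per input line;
    # sed then strips the leading href=" and the trailing quote.
    grep -o 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

printf '%s\n' "$html" | extract_links
```

This only catches double-quoted href attributes and knows nothing about HTML structure, which is usually good enough for a one-off scraper and not much else.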

Name: Anonymous 2010-11-06 12:36

any idea on how to district a hotdog with mineral water?

Name: Anonymous 2010-11-06 13:32

>>3
I did it too. Now I've moved to python+BeautifulSoup.

As an example of grep+sed, here's my shitty script to download manga from stoptazmo:


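#!/bin/bash
# Download every chapter of a manga series from stoptazmo.
# Usage: pass the series name as the first argument.

if [[ -z $1 ]]; then
    echo "usage: $0 series-name" >&2
    exit 1
fi

# mktemp instead of tempfile, which is Debian-only and deprecated
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
manga_home="http://stoptazmo.com/manga-series/$1/"
chapter_list=".${1}_chapterlist"

echo "reading chapter list"
wget -q -O "$tmp" "$manga_home"

# keep the lines that mention "mirror", then cut each one down to the
# text between its first pair of single quotes (the chapter URL)
grep -e mirror "$tmp" | sed -e "s/^[^']*'//;s/'.*//" >"$chapter_list"
total_chapters=$(wc -l "$chapter_list" | awk '{print $1}')

if [[ $total_chapters -eq 0 ]]; then
    echo "can not read chapters of $1. aborting"
    exit 1
fi

# fetch each chapter; -c resumes partial downloads
i=1
while read -r url; do
    echo "getting $i of $total_chapters"
    wget -c "$url"
    i=$((i+1))
done <"$chapter_list"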
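
To see what the grep and sed step in the script above does, here is a small sketch run against made-up lines. The sample markup and the chapter_urls helper are assumptions for illustration; the real stoptazmo page source is not shown here.

```shell
#!/bin/bash
# Made-up sample lines; only the ones containing "mirror" carry a download URL.
page="<h1>Some series</h1>
<a href='http://example.com/ch1.zip'>mirror 1</a>
<a href='http://example.com/ch2.zip'>mirror 1</a>"

# chapter_urls (hypothetical helper): keep the "mirror" lines, then cut each
# one down to the text between its first pair of single quotes.
chapter_urls() {
    # first s///: delete everything up to and including the first quote
    # second s///: delete from the next quote to the end of the line
    grep -e mirror | sed -e "s/^[^']*'//;s/'.*//"
}

printf '%s\n' "$page" | chapter_urls
# prints:
# http://example.com/ch1.zip
# http://example.com/ch2.zip
```

Note that a "mirror" line with no single quote in it passes through sed unchanged, so this depends entirely on the page keeping the same markup.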

Name: Anonymous 2010-11-06 13:36

>>13
BeautifulSoup is kind of shitty. It's slow, bug-prone, and doesn't work nicely from one Python version to the next. Try lxml.html instead; it has a fucking awesome .cssselect() method and also offers everything BS can do.
