
Parsing with bash

Name: Egi 2010-11-06 12:01

Any ideas on how to parse a website with bash?

Name: Anonymous 2010-11-06 13:32

>>3
I did it too. Now I've moved to Python+BeautifulSoup.

As an example of grep+sed, here's my shitty script to download manga from stoptazmo:


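#!/bin/bash
# usage: pass the series name as the first argument

# mktemp replaces the deprecated tempfile; clean up the temp file on exit
tmp="$(mktemp)" || exit 1
trap 'rm -f "$tmp"' EXIT

manga_home="http://stoptazmo.com/manga-series/$1/"
chapter_list=".${1}_chapterlist"

echo "reading chapter list"
wget -q -O "$tmp" "$manga_home"

# keep the first single-quoted string on every line that mentions "mirror"
grep -e mirror "$tmp" | sed -e "s/^[^']*'//;s/'.*//" >"$chapter_list"
total_chapters=$(wc -l "$chapter_list" | awk '{print $1}')

if [[ $total_chapters -eq 0 ]]; then
    echo "cannot read chapters of $1. aborting"
    exit 1
fi

# fetch every chapter URL, resuming partial downloads (-c)
i=1
while read -r url; do
    echo "getting $i of $total_chapters"
    wget -c "$url"
    i=$((i+1))
done <"$chapter_list"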
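
To see what the grep+sed line actually does, here's a small sketch you can run without touching the network. The function name extract_mirror_urls and the sample markup are made up for illustration; I'm only assuming the page puts each link in single quotes on a line that contains the word "mirror", which is what the script above relies on.

```shell
#!/bin/bash

# Hypothetical helper: read HTML on stdin and print the first
# single-quoted string of every line that mentions "mirror".
extract_mirror_urls() {
    grep -e mirror | sed -e "s/^[^']*'//;s/'.*//"
}

# Made-up sample markup; the real page layout may differ.
sample="<a href='http://example.com/ch1.zip'>mirror 1</a>
<p>no link here</p>
<a href='http://example.com/ch2.zip'>mirror 2</a>"

printf '%s\n' "$sample" | extract_mirror_urls
```

The first sed expression deletes everything up to and including the first single quote, the second deletes from the next single quote to the end of the line. It breaks as soon as the site switches to double quotes or puts two links on one line, which is a good reason to move to a real HTML parser.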
