
rss reader in perl

Name: Anonymous 2011-01-19 23:04

Not sure if this board is helpful with this kind of stuff, but I'm trying to teach myself Perl by writing simple scripts. This is an RSS reader that prints items containing the text I want to the command prompt. It sometimes prints items that don't contain the text, and it prints items that do contain the text at least twice.
Anything obvious?

#!/usr/bin/perl

$feed_link="http://boards.4chan.org/n/index.rss";
$feed_file="/scripts/rss/feed.txt";
@linkarray = (" ");
while(1) {
system("wget -q $feed_link -O /scripts/rss/feed.txt");

open(RSSFILE, "<", $feed_file);

while(<RSSFILE>)
{
    if(/<item>/../<\/item>/)
    {
      if(/<title>/../<\/title>/)
      {
        $whitespace = index $_, "<";
        $title_string = substr $_, $whitespace;
        $title_string =~ s/<title>//g;
        $title_string =~ s/<\/title>//g;
        chomp($title_string);
      }
      elsif(/<link>/../<\/link>/)
      {
        $whitespace = index $_, "<";
        $link_string = substr $_, $whitespace;
        $link_string =~ s/<link>//g;
        $link_string =~ s/<\/link>//g;
        chomp($link_string);
      }
      elsif(/<description>/../<\/description>/)
      {
        $whitespace = index $_, "<";
        $description_string = substr $_, $whitespace;
        $description_string =~ s/<description>//g;
        $description_string =~ s/<\/description>//g;
        chomp($description_string);
      } 
    }
  if($link_string ~~ @linkarray) { }
  else
  {

    if(($title_string =~ m/(deen|studio|a)/i) or ($description_string =~ m/(deen|studio|a)/i))
    {
      print "\n********************\n$title_string\n$link_string\n********************\n";
      push(@linkarray, $link_string);
    }
  }
}
sleep(60);
}
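
Two things in the script above look like likely causes of both symptoms. The alternation (deen|studio|a) matches any text that contains the letter "a", so almost every item passes the filter. And the print check runs once per input line rather than once per item, so a new title can be tested and printed together with the previous item's link and description. Below is a minimal sketch of one way around both, assuming each <title>, <link> and <description> sits on a single line of the feed. The sample lines, the %item and %seen hashes, the @printed list and the \b word boundaries are assumptions made for illustration, not part of the original script.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;

# Hypothetical sample feed, one element per line, standing in for feed.txt.
my @feed_lines = (
    '<item>',
    '<title>Studio Deen news</title>',
    '<link>http://example.com/1</link>',
    '<description>first post</description>',
    '</item>',
    '<item>',
    '<title>Trucks</title>',
    '<link>http://example.com/2</link>',
    '<description>nothing relevant here</description>',
    '</item>',
);

# \b keeps the bare "a" from matching inside other words such as "relevant".
my $wanted = qr/\b(?:deen|studio|a)\b/i;

my %seen;       # links that have already been printed
my %item;       # fields of the item currently being read
my @printed;    # links printed on this pass, kept so the result can be inspected

for my $line (@feed_lines) {
    if ($line =~ /<item>/) {
        %item = ();    # forget the previous item's fields
    }
    elsif ($line =~ m{<(title|link|description)>(.*?)</\1>}) {
        $item{$1} = $2;
    }
    elsif ($line =~ m{</item>}) {
        # Check once per item, after all of its fields have been read.
        my $title = $item{title}       // '';
        my $desc  = $item{description} // '';
        my $link  = $item{link}        // '';
        next unless "$title $desc" =~ $wanted;
        next if $seen{$link}++;
        print "\n********************\n$title\n$link\n********************\n";
        push @printed, $link;
    }
}
```

A hash lookup also replaces the ~~ test against @linkarray, which keeps the check cheap as the list of printed links grows while the script loops.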

Name: Anonymous 2011-01-19 23:56

Parsing XML with regexen is never pretty.  Try something like this:

#!/usr/bin/perl

use 5.008001;    # 5.010 if you really care about ~~
use strict;
use LWP::UserAgent;
use XML::Twig;

our $feed_link="http://boards.4chan.org/n/index.rss";
our @links = ...;

my $parser = XML::Twig->new(
    twig_handlers => {
        # called once for every complete <item> element
        item => sub {
            my ($twig, $elem) = @_;
            my $title = $elem->field('title');
            my $link = $elem->field('link');
            my $description = $elem->field('description');
            ... # do your printing here!
            $twig->purge;    # free the memory used by items already handled
        },
    },
);

my $ua = LWP::UserAgent->new;
my $response = $ua->get($feed_link);
$parser->parse($response->content) if $response->is_success;
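
One way the handler body might be filled in, reusing the keyword filter and the "don't print the same link twice" idea from the first post. This is only a sketch under stated assumptions: the %seen hash, the @printed list, the $wanted pattern (with \b word boundaries added) and the inline $sample_rss string are made up for illustration. Against the real feed, the fetched $response->content would be passed to parse instead of the sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample feed standing in for the downloaded RSS.
my $sample_rss = <<'XML';
<rss version="2.0">
<channel>
  <title>sample channel</title>
  <item>
    <title>Studio Deen news</title>
    <link>http://example.com/1</link>
    <description>first post</description>
  </item>
  <item>
    <title>Trucks</title>
    <link>http://example.com/2</link>
    <description>nothing relevant here</description>
  </item>
  <item>
    <title>Studio Deen news</title>
    <link>http://example.com/1</link>
    <description>same link again</description>
  </item>
</channel>
</rss>
XML

my $wanted = qr/\b(?:deen|studio|a)\b/i;   # assumed keyword filter
my %seen;                                  # links that have already been printed
my @printed;                               # kept so the result can be inspected

my $parser = XML::Twig->new(
    twig_handlers => {
        # Runs once per complete <item>, so title, link and
        # description always belong to the same item.
        item => sub {
            my ($twig, $elem) = @_;
            my $title       = $elem->field('title');
            my $link        = $elem->field('link');
            my $description = $elem->field('description');
            if ("$title $description" =~ $wanted and not $seen{$link}++) {
                print "\n********************\n$title\n$link\n********************\n";
                push @printed, $link;
            }
            $twig->purge;
        },
    },
);

$parser->parse($sample_rss);
```

Because the handler only fires on whole <item> elements, the channel's own <title> and <link> never reach the filter, and a stale field from an earlier item cannot leak into a later one.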
