
Text file parsing

Name: Anonymous 2005-10-11 17:08

Hi, I need help.

Though I'm looking for a specific program rather than a programming solution, I figure this is the most proper board to ask.

The situation is, I have a huge text file that goes generally like this:

----- cut -----
1 1234 - Lorem ipsum dolor sit amet
2 1235 - consectetuer adipiscing elit
3 1236 - Pellentesque pellentesque vehicula velit
4 1237 - Nunc sit amet sapien at libero euismod auctor
5 1238 - Aenean turpis
6 1239 - Ut nec ipsum
7 1240 - Lorem ipsum dolor sit amet
8 1241 - Aenean turpis
9 1242 - consectetuer adipiscing elit
10 1243 - Aenean turpis
11 1244 - Nunc sit amet sapien at libero euismod auctor
----- cut -----

Many lines are randomly repeated throughout the file. What I need to do is remove all duplicates, leaving only one occurrence of each line. The numbers are there, but they are to be ignored (and they're not exactly sequential either); only the text strings are to be compared.

I suppose this task could be achieved by means of a Killer PERL One-liner Of Doom, which would be fine, as I have a Linux distro handy. But I'd rather have a Windows-based solution, and one that is usable by a generally programming-ignorant user. I don't mind configuring or writing a *simple* script for an all-purpose text parser, but I'd rather it didn't require me to dive into the deepest depths of regexp syntax.

I hope you get the idea: I'm looking for an app capable of the task that's rather easy to handle on the user side. Any help will be greatly appreciated.

Name: PHPAdvocate !MARtiNys66 2009-11-01 18:16


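function parse_file($huge_text_file)
{
    // Capture "<number> <number>" and the text after " - " on every line.
    preg_match_all('/([0-9]+ [0-9]+) - ([^\r\n]+)/', $huge_text_file, $matches);
    // array_unique() keeps the first occurrence of each string and preserves its key.
    $unique_strings = array_unique($matches[2]);
    $parsed_text = '';
    foreach($unique_strings as $key => $string)
    {
        // Use the preserved key to look up the numbers that belong to this string.
        $parsed_text .= $matches[1][$key] . ' - ' . $string . "\n";
    }
    return $parsed_text;
}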
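
Since the file is described as huge, a line-by-line variant might be easier on memory than matching the whole text at once. The following is only a rough sketch of that idea: dedupe_lines() is a hypothetical helper, not something from the post above, and it assumes every line looks like "<number> <number> - <text>". As with the function above, lines that don't fit that format are dropped.

```php
<?php
// Hypothetical helper, not from the thread: keeps the first line seen for
// each distinct text part. Assumes lines look like "<number> <number> - <text>".
function dedupe_lines(iterable $lines): array
{
    $seen = [];   // text part => true, used as a set of strings already kept
    $kept = [];   // surviving lines, in their original order
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        // Group 1 is the text after " - ", without trailing whitespace.
        if (preg_match('/^\s*\d+\s+\d+\s+-\s+(.*\S)\s*$/', $line, $m) !== 1) {
            continue; // assumption: lines that don't fit the format are dropped
        }
        if (!isset($seen[$m[1]])) {
            $seen[$m[1]] = true;
            $kept[] = $line;
        }
    }
    return $kept;
}
```

One possible way to call it, with 'input.txt' standing in for the real file name: echo implode("\n", dedupe_lines(file('input.txt')));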
