cumbersome regexp operations

Name: Anonymous 2008-10-30 5:06

i believe all programmers doubt their skill from time to time and so have i now

mostly posting because i'm bored at work, doing some regexp at home to parse some machine generated html, we all know how that goes

basically to avoid spaghetti code i'm doing it in several stages, this produces more loops but it keeps the code clean because it might be used and edited by others

so basically i get the html code and locate the specific line my required info is in; because it's machine generated it all ends up on one line lol

then i split that one line up: since it's an html table i replace <\/?tr> with \n, so each row of the table with my data ends up on its own line

then i'm confused as to what the best approach is; at this point i split at <\/?td> and put each line into an array, loop through it, and gather the info i need into a more structured hash from which i can later fetch info with predefined values used as keys
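
a rough sketch of those stages in python, just to illustrate the idea (the sample html and the key/value names are made up, not from the real page):

```python
import re

# hypothetical machine-generated html: the whole table sits on one line
html = ('<html><body><table><tr><td>name</td><td>alice</td></tr>'
        '<tr><td>age</td><td>30</td></tr></table></body></html>')

# stage 1: locate the one line the table lives on
line = next(l for l in html.splitlines() if '<table>' in l)

# stage 2: replace the <tr> delimiters with \n so each row sits on its own line
rows = re.sub(r'</?tr>', '\n', line).splitlines()

# stage 3: split each row at the <td> delimiters and collect the cells
# into a dict keyed by the first cell of each row
data = {}
for row in rows:
    cells = [c for c in re.split(r'</?td>', row) if c.strip() and '<' not in c]
    if len(cells) == 2:
        data[cells[0]] = cells[1]

print(data)  # → {'name': 'alice', 'age': '30'}
```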

Name: Anonymous 2008-10-30 5:08

Your skill at the shift key sucks.
Also, regexbuddy. Now you have three problems.

Name: Anonymous 2008-10-30 5:40

Don't use regexes. Write a real parser.
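
For example, Python ships an event-driven HTML parser in the stdlib; a minimal sketch of pulling table cells with it (the sample table below is invented) might look like:

```python
from html.parser import HTMLParser

# Collect the text of each <td>, grouped by <tr>, without any regexes.
class TableParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])   # start a new row
        elif tag == 'td':
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        if self.in_td and data.strip():
            self.rows[-1].append(data.strip())

p = TableParser()
p.feed('<table><tr><td>name</td><td>alice</td></tr>'
       '<tr><td>age</td><td>30</td></tr></table>')
print(p.rows)  # → [['name', 'alice'], ['age', '30']]
```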

Name: Anonymous 2008-10-30 5:43

don't parse machine-generated html, use a better format since it doesn't need to be human-readable.

Name: Anonymous 2008-10-30 6:57

there is no better format, i'm trying to create rss feeds from dynamically updated data on a certain webpage

Name: Anonymous 2008-10-30 7:10

page2rss.com, problem solved

Name: Anonymous 2008-10-30 7:32

thanks for that but the project is cancelled now anyways, i'll keep page2rss in my bookmarks

turns out the info on the webpage i was parsing was not up to date, so my whole idea fell apart anyway

Name: Anonymous 2008-10-30 13:50

Serious answer, I did this for a screenscraping project before:

Shove the HTML through Lynx to render it to a normal text file, then parse that.

Name: Anonymous 2008-10-30 14:05

Fuck you guys are awful programmers.

Unless you are writing a web browser from scratch, you should not be writing an HTML parser yourself. Google didn't even write an HTML parser from scratch for Chrome, they used WebKit. So maybe you should too.

HTML is described by a DOM. So load your HTML into some DOM object (WebKit, IE, or write a Firefox plugin to get the DOM out) and find your table rows by traversing it.

If the page is valid XHTML and you want to make it into an RSS feed, then you just apply an XSLT to it that does that, and forget the fucking rest.

Idiots, all of you.
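
If the page really is well-formed XHTML, the traversal is a few lines in any language with an XML library. A Python sketch under that assumption (the sample markup is invented):

```python
import xml.etree.ElementTree as ET

# Only works on well-formed XML/XHTML; real-world tag soup needs a
# forgiving parser instead. Sample markup is made up.
doc = ET.fromstring(
    '<html><body><table>'
    '<tr><td>name</td><td>alice</td></tr>'
    '<tr><td>age</td><td>30</td></tr>'
    '</table></body></html>'
)

# Walk the tree: every <tr>, then the text of each of its <td> children.
rows = [[td.text for td in tr.findall('td')] for tr in doc.iter('tr')]
print(rows)  # → [['name', 'alice'], ['age', '30']]
```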

Name: Anonymous 2008-10-30 14:06

>>8
hey that's a really nice idea, i'm gonna remember that for later

Name: Anonymous 2009-03-06 8:27


problem is with the furious stroking that he on his desktop C Despite numerous reformats his computer has been through the right way to manage and protect new elements without the user mode bullshit!

Name: Anonymous 2009-03-06 13:35


The other day and everything you write a program in the mind then let it all but tap the pad and then I beat Metal Slug 3 on.

Name: Trollbot9000 2009-07-01 7:54

Problem is with the.

Name: Anonymous 2010-12-21 8:15

Name: Anonymous 2013-01-18 22:24

/prog/ will be spammed continuously until further notice. we apologize for any inconvenience this may cause.

Name: Anonymous 2013-01-19 23:07

restoring...
