
html/text editor help

Name: Anonymous 2008-04-03 19:06

I have some html files made in MS Word (for example http://www.cyf-kr.edu.pl/~eomazur/ek1zaj02.html ) and to show Greek letters Word apparently uses crap like this:

<p class=MsoNormal style='text-align:justify'><b><span style='font-family:Symbol;
mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-char-type:symbol;mso-symbol-font-family:Symbol'><span style='mso-char-type:
symbol;mso-symbol-font-family:Symbol'>e</span></span></b>


Which looks terrible and works only in Internet Explorer. I want to change it into this:

&epsilon;


The problem is the line breaks, which occur at random places in the html. I tried several editors but none had an option to search & replace while ignoring line breaks. I tried regex (a dot after every character) but it doesn't match the line breaks either. Any ideas plox?

Name: Anonymous 2008-04-04 2:54

html files made in MS Word
HAHAHA, OH WOW.

KILL YOURSELF

Name: Anonymous 2008-04-04 3:36

http://en.wikipedia.org/wiki/Comparison_of_WYSIWYG_HTML_editors

UTF-8 would be better than entities, too.
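
A rough sketch of what going from entities to UTF-8 could look like, using Python's standard `html` module. The sample string is made up, and the page would also have to declare its charset as UTF-8 for browsers to read it correctly.

```python
import html

# Made-up sample text containing a named entity.
entity_text = "&epsilon; = 0.5"

# html.unescape turns named entities into the actual characters.
utf8_text = html.unescape(entity_text)

# Encoding to UTF-8 gives the bytes that would be written to the file.
encoded = utf8_text.encode("utf-8")

print(utf8_text)
print(encoded)
```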

Name: Anonymous 2008-04-04 3:37

.edu.pl

Name: Anonymous 2008-04-04 4:17

>>1
Linebreaks are \r\n on Windows systems, not just \n.
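
A minimal Python sketch of what that means for a pattern, assuming the break falls in the middle of an attribute as in >>1. The two sample strings are made up; `\r?\n` accepts either kind of line ending.

```python
import re

# Made-up samples: the same text broken by a Windows and a Unix line ending.
windows_text = "mso-char-type:\r\nsymbol"
unix_text = "mso-char-type:\nsymbol"

# \r?\n allows an optional carriage return before the line feed.
pattern = re.compile(r"mso-char-type:\r?\nsymbol")

# A pattern that only allows \n misses the Windows variant.
unix_only = re.compile(r"mso-char-type:\nsymbol")

def matches_both():
    """True if the tolerant pattern finds both line-ending styles."""
    return bool(pattern.search(windows_text)) and bool(pattern.search(unix_text))

print(matches_both())
print(bool(unix_only.search(windows_text)))
```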

Name: Anonymous 2008-04-04 7:09

>>3
I didn't make them, I just want to batch convert them instead of manually converting each file.

>>5
How do I use it?

Name: Anonymous 2008-04-04 15:58

The dot (.) does not match the newline character by default. Use the /s modifier to make the dot include newline characters.

/lol.*wut/s

Will match with "lol dsf osgg wut" and also with "lol dlsd
fdsf
bh sd fh
shg
hh hfh wut".

Might depend on the regexp implementation you're using though.
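
For example, in Python the /s modifier is spelled `re.DOTALL`. The sketch below first shows the dot with and without it, then tries the replacement >>1 asked for. The pattern, the `SYMBOL_TO_ENTITY` table and the function name are assumptions made up for illustration; they are only checked against the one snippet quoted in >>1, so other Word output may need a looser pattern.

```python
import re

multi = "lol dlsd\nfdsf\nbh sd fh\nshg\nhh hfh wut"
print(re.search(r"lol.*wut", multi))                    # dot stops at the first \n
print(bool(re.search(r"lol.*wut", multi, re.DOTALL)))   # dot now crosses line breaks

# The snippet from >>1 (without the outer <p>), line breaks kept as Word wrote them.
word_html = (
    "<b><span style='font-family:Symbol;\n"
    'mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";\n'
    "mso-char-type:symbol;mso-symbol-font-family:Symbol'><span style='mso-char-type:\n"
    "symbol;mso-symbol-font-family:Symbol'>e</span></span></b>"
)

# Assumed mapping from Symbol-font letters to entities; extend as needed.
SYMBOL_TO_ENTITY = {"e": "&epsilon;", "a": "&alpha;", "b": "&beta;"}

# [^>]* and \s already match line breaks; re.DOTALL only affects the dot.
SYMBOL_SPAN = re.compile(
    r"<span\s+style='font-family:Symbol;[^>]*>\s*"
    r"<span\s+style='mso-char-type:\s*symbol;[^>]*>(.)</span>\s*</span>",
    re.DOTALL,
)

def replace_symbol_spans(html_text):
    """Swap each nested Symbol span for an entity; unknown letters are left alone."""
    def to_entity(match):
        return SYMBOL_TO_ENTITY.get(match.group(1), match.group(0))
    return SYMBOL_SPAN.sub(to_entity, html_text)

print(replace_symbol_spans(word_html))
```

To batch convert, the same function could be run over each file's contents after reading the whole file into one string, so the line breaks are part of the text being searched.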

Name: Anonymous 2008-04-06 9:08

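sed -i -e ':a;N;$!ba;s/\r\?\n/ /g' filename

(GNU sed. sed strips the newline before matching, so \n$ never matches when going one line at a time; this reads the whole file into the pattern space first, then joins the lines with spaces.)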

what OS?
