Multiple string.replace()

Name: Anonymous 2010-03-18 21:06

I'm trying to search and replace words in a long string in FIOC.


for old, new in zip(words, replacements):
    # each call builds a brand-new copy of the whole string
    text = text.replace(old, new)


However, since strings are immutable, the string is copied on every call, which is slow. Since each word is exactly the same length as its replacement, I would like to edit the original string directly. Is that possible somehow?

Name: Anonymous 2010-03-18 21:22

Use list comprehensions:
words = [word for word in text.split()]

Then just search for each word you want to remove (say 'foo') and remove it from the list:

while 'foo' in words:
    words.remove('foo')


When you're done, just join the list:
text = ' '.join(words)

or don't use FIOC

Name: Anonymous 2010-03-18 21:22

>>1
Turn it into a char array, loop the array to replace words manually, then load it back into the original String object.
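In Python the closest thing to a mutable char array is `bytearray`. Below is a minimal sketch of the char-array idea. It assumes the text is ASCII, or is encoded so that each search word and its replacement have the same byte length, as the OP says. The helper name `replace_in_place` is hypothetical.

```python
def replace_in_place(buf, pairs):
    """Overwrite every occurrence of each old value with its new value, in place.

    buf   -- a bytearray, modified directly (no new string is built)
    pairs -- iterable of (old, new) bytes objects with len(old) == len(new)
    """
    for old, new in pairs:
        if not old:
            raise ValueError("search word must be non-empty")
        if len(old) != len(new):
            raise ValueError("replacement must be the same length as the search word")
        start = buf.find(old)
        while start != -1:
            # Equal-length slice assignment overwrites bytes without resizing.
            buf[start:start + len(old)] = new
            start = buf.find(old, start + len(old))
    return buf


data = bytearray(b"aaa123aaa foo bar")
replace_in_place(data, [(b"aaa", b"bbb"), (b"foo", b"qux")])
print(data.decode("ascii"))  # bbb123bbb qux bar
```

As with the OP's loop, the pairs are applied one after another, so a later search word can match bytes written by an earlier replacement.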

Name: Anonymous 2010-03-18 21:27

>>2
Oops. Since you want to replace the words, change the second part to:
for old_word, new_word in zip(words, replacements):
    i = str.index(old_word)
    str[i] = new_word

Also, str here is the list of words, not the original string.

Name: Anonymous 2010-03-18 21:32

>>4
put a while loop in that to search for multiple occurrences of old_word too.
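A minimal sketch of >>4's token-list approach with the while loop from >>5, so every occurrence is replaced. It assumes the text is split on whitespace, so it only matches whole tokens (not the "aaa" inside "aaa123aaa") and collapses runs of whitespace when joining. The name `replace_tokens` is hypothetical.

```python
def replace_tokens(text, words, replacements):
    tokens = text.split()  # whitespace-separated tokens
    for old_word, new_word in zip(words, replacements):
        i = 0
        # Keep searching after the last hit until list.index raises ValueError.
        while True:
            try:
                i = tokens.index(old_word, i)
            except ValueError:
                break
            tokens[i] = new_word
            i += 1
    return " ".join(tokens)


print(replace_tokens("foo bar foo baz", ["foo", "baz"], ["qux", "zap"]))
# qux bar qux zap
```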

Name: Anonymous 2010-03-18 21:34

interface with C code and use psyco

Name: Anonymous 2010-03-18 21:36

>>2
>>3
>>4
Thanks. Will benchmark these approaches later.

Name: Anonymous 2010-03-18 22:02

>>4
One thing I should mention: when I want to replace "aaa" with "bbb", I need it to work on "aaa123aaa" -> "bbb123bbb" as well. I don't think this code does that at the moment.

Name: Anonymous 2010-03-18 22:05

I wrote a script like this a few months ago. It had some 50 search/replace strings and processed a few MB/second.

I used CL-PPCRE: first I built the search queries with cl-ppcre:create-scanner, which returns a compiled closure that scans for my regexp, and then mass-replaced with cl-ppcre:regex-replace-all. Performance was rather good, and it solved my problem well.

Name: Anonymous 2010-03-18 22:12

>>8
In that case I suggest using the re module.
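A minimal sketch of >>9's CL-PPCRE approach translated to Python's re module: compile one pattern per search string up front (roughly the role of cl-ppcre:create-scanner), then apply each with `Pattern.sub` (roughly the role of regex-replace-all). `re.escape` makes the search strings literal. The names `compile_rules` and `apply_rules` are hypothetical. Unlike the whitespace-token approach, this matches substrings, so "aaa123aaa" works.

```python
import re


def compile_rules(pairs):
    """Precompile each (search, replacement) pair once."""
    return [(re.compile(re.escape(old)), new) for old, new in pairs]


def apply_rules(text, rules):
    """Apply every compiled rule in order, replacing all matches of each."""
    for pattern, new in rules:
        # A function replacement keeps backslashes in `new` literal.
        text = pattern.sub(lambda _m: new, text)
    return text


rules = compile_rules([("aaa", "bbb"), ("foo", "qux")])
print(apply_rules("aaa123aaa foo", rules))  # bbb123bbb qux
```

Each rule is still a separate pass over the text, so this keeps the one-copy-per-replacement cost of the OP's loop; it only avoids recompiling patterns.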

Name: Anonymous 2010-03-19 1:02

>>9
ppcre:regex-replace-all
Performance was rather good

No. For a fixed, relatively small set of strings, a custom BM or similar is a much better approach, and will often be the _only_ feasible approach.

Name: Anonymous 2010-03-19 1:37

>>11
Of course I would expect better performance from a custom search and replace that avoids the complexity of regexps, but in my case:
1) I ran 50 search/replace requests over my data repeatedly and was pulling a few MB/sec. That's more than enough for me.
2) cl-ppcre's compiled queries are rather fast, and for simple literal-string queries I don't think they would be that much slower.

Of course, I could have written an optimized solution in C, which did ONLY what was needed and nothing more, but why would I invest the extra time to code something like that when CL-PPCRE already offered me more performance than I needed?

Name: Anonymous 2010-03-19 4:06

>>11
Isn't a custom Boltzmann machine kind of overkill for string replacement?

Name: Anonymous 2010-03-19 5:17

>>13
It's probably possible and may in fact be very efficient.[1]
I was, however, referring to Boyer-Moore integrated with some kind of suffix-trie to enable a single pass.

___________________________
1. http://www.doc.gold.ac.uk/~mas02mb/sdp/download/ssn.pdf retrieved 19 March 2010.
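>>14 mentions Boyer-Moore combined with a suffix trie for a single pass. The sketch below is a simpler stand-in, not Boyer-Moore or Aho-Corasick: it builds a trie of the search strings and scans the text once from left to right, taking the longest match at each position. That costs O(n * m) for text length n and longest search string m, which is tolerable for a small fixed set of strings. The names `build_trie` and `trie_replace` are hypothetical.

```python
def build_trie(mapping):
    """Build a nested-dict trie; the key None marks the end of a search string."""
    root = {}
    for old, new in mapping.items():
        if not old:
            raise ValueError("search strings must be non-empty")
        node = root
        for ch in old:
            node = node.setdefault(ch, {})
        node[None] = new  # replacement stored at the terminal node
    return root


def trie_replace(text, mapping):
    """Replace non-overlapping matches in one left-to-right pass (longest match wins)."""
    root = build_trie(mapping)
    out = []
    i, n = 0, len(text)
    while i < n:
        node, j = root, i
        match_end, replacement = -1, None
        # Walk the trie as far as the text allows, remembering the longest match.
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                match_end, replacement = j, node[None]
        if match_end == -1:
            out.append(text[i])  # no search string starts here
            i += 1
        else:
            out.append(replacement)
            i = match_end  # resume after the match; replacements are not rescanned
    return "".join(out)


print(trie_replace("aaa123aaa foo", {"aaa": "bbb", "foo": "qux"}))  # bbb123bbb qux
```

Because the output is never rescanned, a replacement can't be matched again by another rule, which differs from chaining str.replace calls.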

Name: Anonymous 2010-03-19 6:25

Write a C module.

Name: Anonymous 2010-03-24 20:46


import re

def multiple_replace(text, adict):
    # one alternation of all escaped keys, so the text is scanned once
    rx = re.compile('|'.join(map(re.escape, adict)))
    def one_xlat(match):
        # look up the replacement for whichever key matched
        return adict[match.group(0)]
    return rx.sub(one_xlat, text)

text = "ABC abcAbc"
adict = {"abc": "def"}

print(multiple_replace(text, adict))

...gives me "ABC defAbc"

I need to make it case insensitive. Where do I put re.I?

Name: Anonymous 2010-03-24 21:38

If only documentation existed. Oh wait.

Name: Anonymous 2010-03-25 20:16

Why not try a [em]real[/em] language?
$arr = array("[i]", "[/i]", "[b]", "[/b]", "[blink]", "[/blink]");
$str = str_replace($arr, array("<i>", "</i>", "<b>", "</b>", "<blink>", "</blink>"), $str);

Name: Anonymous 2010-03-26 3:31

>>16
re.compile(pattern, flags)
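Applied to >>16's code, the flag goes in the re.compile call. Passing re.I alone is not quite enough, though: the matched text (e.g. "ABC") is then no longer a key of adict, so one_xlat would raise KeyError. A minimal sketch that also lowercases the lookup, assuming ASCII-style casing where str.lower() agrees with re.I; the name `multiple_replace_ci` is hypothetical.

```python
import re


def multiple_replace_ci(text, adict):
    # Index replacements by lowercased key so any casing of a match can be looked up.
    lookup = {key.lower(): value for key, value in adict.items()}
    rx = re.compile("|".join(map(re.escape, adict)), re.I)

    def one_xlat(match):
        return lookup[match.group(0).lower()]

    return rx.sub(one_xlat, text)


print(multiple_replace_ci("ABC abcAbc", {"abc": "def"}))  # def defdef
```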
