I'm trying to search and replace words in a long string in FIOC.
(pseudocode)
for i in range(10):
    str = str.replace(words[i], replacements[i])
However, since strings are immutable, the string is copied on every iteration, which is slow. Since each word is exactly the same length as its replacement, I would like to edit the original string directly. Is that possible somehow?
Name: Anonymous 2010-03-18 21:22
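Use a list comprehension over the split string (iterating over str directly would give single characters, not words): words = [word for word in str.split()]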
Then just search for each word you want to remove (say `foo') and remove it from the list:
while 'foo' in words:
    words.remove('foo')
When you're done, just join the list: str = ' '.join(words)
or don't use FIOC
Name: Anonymous 2010-03-18 21:22
>>1
Turn it into a char array, loop the array to replace words manually, then load it back into the original String object.
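A minimal sketch of the char-array idea in Python, assuming the text is ASCII (or another single-byte encoding) so that equal-length words stay equal-length as bytes. A bytearray is mutable, so each replacement overwrites bytes in place instead of copying the whole string. The function name replace_in_place is made up for this example. Python cannot write the result back into the original str object, so the final decode still makes one copy.

```python
def replace_in_place(buf, words, replacements):
    """Overwrite each occurrence of every word in buf with its same-length replacement."""
    for old, new in zip(words, replacements):
        if not old or len(old) != len(new):
            raise ValueError("words must be non-empty and match replacement length")
        start = buf.find(old)
        while start != -1:
            # Same-length slice assignment mutates buf without resizing it.
            buf[start:start + len(old)] = new
            start = buf.find(old, start + len(old))
    return buf


text = "aaa123aaa and ccc"
buf = bytearray(text, "ascii")  # one copy up front
replace_in_place(buf, [b"aaa", b"ccc"], [b"bbb", b"ddd"])
result = buf.decode("ascii")    # one copy at the end
```

Because it searches the raw bytes rather than a list of words, this also handles words embedded in longer tokens.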
Name: Anonymous 2010-03-18 21:27
>>2
Oops. Since you want to replace the words, change the second part to:
for old_word, new_word in zip(words, replacements):
    i = str.index(old_word)
    str[i] = new_word
Also, str in this case is the list of words.
Name: Anonymous 2010-03-18 21:32
>>4
Put a while loop in that to search for multiple occurrences of old_word too.
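Roughly what that would look like, as a sketch. The name replace_words and the sample data are invented for the example; tokens is the list of words produced by splitting the string.

```python
def replace_words(tokens, words, replacements):
    """Replace every whole-token occurrence of each word in the list, in place."""
    for old_word, new_word in zip(words, replacements):
        if old_word == new_word:
            continue  # otherwise the while loop below would never terminate
        while old_word in tokens:
            i = tokens.index(old_word)
            tokens[i] = new_word
    return tokens


tokens = "foo bar foo baz".split()
replace_words(tokens, ["foo", "baz"], ["qux", "zap"])
joined = " ".join(tokens)
```

Each `in` test and each index() call rescans the list from the start, so this gets slow on long inputs with many hits.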
Name: Anonymous 2010-03-18 21:34
interface with C code and use psyco
Name: Anonymous 2010-03-18 21:36
>>2 >>3 >>4
Thanks. Will benchmark these approaches later.
Name: Anonymous 2010-03-18 22:02
>>4
One thing I should say is that when I want to replace "aaa" with "bbb", I need it to work with "aaa123aaa" -> "bbb123bbb" as well. I don't think this code will do this at the moment.
I wrote a script like this a few months ago. It did a few MB/second with some 50 search/replace strings.
I used CL-PPCRE: first I built the search queries with cl-ppcre:create-scanner, which builds a compiled closure that searches for my regexp, and then mass-replaced with cl-ppcre:regex-replace-all. Performance was rather good and it solved my problem rather well.
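For the FIOC users, a rough Python analogue of the compile-once, replace-all idea, using the standard re module rather than CL-PPCRE. This is a sketch, not a port: make_replacer is a made-up helper, and it assumes the word list is non-empty. All words are joined into one alternation so the text is scanned in a single pass.

```python
import re


def make_replacer(words, replacements):
    """Compile one pattern for all words and return a function that replaces them."""
    table = dict(zip(words, replacements))
    # Longest words first, so a longer word wins over a shorter one that is its prefix.
    ordered = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))

    def replace_all(text):
        # The callback looks up the replacement for whichever word matched.
        return pattern.sub(lambda m: table[m.group(0)], text)

    return replace_all


replace_all = make_replacer(["aaa", "ccc"], ["bbb", "ddd"])
out = replace_all("aaa123aaa ccc")
```

Since already-replaced text is never rescanned, one replacement cannot accidentally feed another.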
>>9
> ppcre:regex-replace-all
> Performance was rather good
No. For a fixed, relatively small set of strings, a custom BM or similar is a much better approach, and will often be the _only_ feasible approach.
>>11
Of course I would expect better performance from a customized search and replace that avoids the complexities of regexps, but in my case:
1) I did 50 search/replace requests over my data repeatedly and was pulling a few MB/sec. That's more than enough for me.
2) cl-ppcre's compiled queries are rather fast, and I don't think it would be that much slower for simple string queries.
Of course, I could have written an optimized solution in C, which did ONLY what was needed and nothing more, but why would I invest the extra time to code something like that when CL-PPCRE already offered me more performance than I needed?
>>11
Isn't a custom Boltzmann machine kind of overkill for string replacement?
Name: Anonymous 2010-03-19 5:17
>>13
It's probably possible and may in fact be very efficient.
I was, however, referring to Boyer-Moore integrated with some kind of suffix-trie to enable a single pass.
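To give an idea of the skipping trick, here is a sketch of the simpler Boyer-Moore-Horspool variant for one pattern at a time. It is not the single-pass suffix-trie combination described above, and horspool_find_all is a name invented for this example. It works on bytes or a bytearray, so equal-length replacements can be written in place as matches are found.

```python
def horspool_find_all(haystack, needle):
    """Yield start offsets of non-overlapping occurrences of needle in haystack."""
    m = len(needle)
    if m == 0 or m > len(haystack):
        return
    # For each byte of the needle except the last, how far the window may
    # shift when that byte is aligned with the window's final position.
    shift = {b: m - 1 - i for i, b in enumerate(needle[:-1])}
    pos = 0
    while pos <= len(haystack) - m:
        if haystack[pos:pos + m] == needle:
            yield pos
            pos += m  # skip past the match (non-overlapping)
        else:
            # Bytes that do not occur in the needle allow a full-length skip.
            pos += shift.get(haystack[pos + m - 1], m)


buf = bytearray(b"aaa123aaa xaaay")
for old, new in [(b"aaa", b"bbb")]:
    for i in horspool_find_all(buf, old):
        buf[i:i + len(old)] = new  # same length, so no resizing
```

This still makes one pass per search word; the point of adding a trie would be to handle all words in a single pass.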