Hey /prog/,
Lets say I have a file with many lines,
each line consists of a very very long random number (string)
Is it possible to write a regular expression to find the longest series of repeated numbers shared by all lines
aka
120123456789948761238746102347106416239847113958719385
217490812741234567890182472103571938547019112413956138756183756183756817356817356
192346112345678978932671658917612746183724681734682173649187245
(I bolded it for readability)
tldr;
how does I finded biggest repeaded series of numbers contained in all lines? (using regex or something one-line'ish)
Name:
Anonymous2008-08-04 4:07
My NP-Complete sense is tingling.
Name:
Anonymous2008-08-04 4:14
The most straight-forward way would be to find the shortest line and then to check every possible substring of it to see if it occurs in every other line.
It's not possible to do in a single regex.
>>5
I was thinking that if an algorithm's complexity is greater than O(n) that you cannot do it with regular expressions. The caveat being that Perl style regular expressions are actually more powerful.
I could be wrong, I only doubt cause I never seen it explicitly said, but it seems obvious from the nature of FSA.
>>8
The proof is trivial: regular expression matching is known to be possible in O(n) with respect to the input length ( http://swtch.com/~rsc/regexp/regexp1.html ). Because you can't run algorithms with complexity greater than O(n) in O(n), you can't use a regular expression to perform such an algorithm.
Name:
Anonymous2008-08-04 9:34
>>1
this can't be done with a single regex unless you knew the longest repeated number in advance.
once you know that then regex isn't even required.
if you think of the regex as a DFA, NFA, or e-NFA, then its trivially seen to be O(n). of course regexs are much more powerful than those these days with all the lookahead stuff involved, so its obvious that tits gonna take a bit longer than O(n) if you are using the more powerful features of the regex, i.e. you are a regex-jedi.
Name:
Anonymous2008-08-04 15:22
(I bolded it for readability)
Name:
Anonymous2008-08-04 15:46
Here's an OFIOC slow as fuck solution to your problem:
import Control.Monad.Instances
import Data.Ord
import Monad
import List
so what happens if you let it have error,
I mean like finding a sequence that is almost the same,
but can be one character off,
How would one go about doing that?
aka
123456789
153456789
123456729
123446789
would all be acceptable
Name:
Anonymous2008-08-05 19:58
>>26
It would be exactly the same, with a slightly different match function. What the fuck, this isn't hard.
Name:
Anonymous2008-08-05 20:10
>>26
If each line's error is in a distinct position of the sequence, it is doable with some changes to >>15.