Having played with Markov-chain text generators, I came up with the idea of gathering statistics by parsing randomly selected threads from a certain board at periodic intervals, then using those accumulated word frequencies to generate text, which would then have the flavor of the board.
Your thoughts, /prog/?
Name:
Anonymous 2008-02-20 14:59
FIOC version:
import re, random

# Note the leading space: without it, split() below would see one big blob.
_acceptable_chars = " '-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
_sentence_end = re.compile(r'[.!?]+')

BOL_MARKER_ID = 0
EOL_MARKER_ID = 1

class Markov(object):
    def __init__(self):
        # Slots 0 and 1 are reserved for the begin/end-of-line markers.
        self.words = [None, None]
        self.chain = [[], []]   # word_id -> ids that may follow it
        self.bchain = [[], []]  # word_id -> ids that may precede it

    def _learn(self, sentence):
        if not len(sentence):
            return
        last_id = BOL_MARKER_ID
        for word in sentence:
            word = word.lower()
            if word not in self.words:
                self.words.append(word)
                self.chain.append([])
                self.bchain.append([])
            word_id = self.words.index(word)
            self.chain[last_id].append(word_id)
            self.bchain[word_id].append(last_id)
            last_id = word_id
        self.chain[last_id].append(EOL_MARKER_ID)

    def _parse(self, sentence):
        return filter(lambda c: c in _acceptable_chars, sentence).split()

    def generate(self, base_word=None):
        if not len(self.chain[0]):
            return None
        try:
            base_id = self.words.index(base_word.lower())
        except (ValueError, AttributeError):
            base_id = BOL_MARKER_ID
        # Walk backwards from the base word to the beginning of a line,
        # then forwards from it to the end of one.
        left = []
        right = []
        word_id = base_id
        while word_id != BOL_MARKER_ID:
            left.insert(0, word_id)
            word_id = random.choice(self.bchain[word_id])
        word_id = base_id
        while word_id != EOL_MARKER_ID:
            right.append(word_id)
            word_id = random.choice(self.chain[word_id])
        sentence = left + right[1:]  # right[0] duplicates left[-1] (or is BOL)
        return ' '.join(self.words[word_id] for word_id in sentence).capitalize() + '.'

    def reply(self, line):
        sentences = []
        words = set()
        for sentence in _sentence_end.split(line):
            sentence = self._parse(sentence)
            sentences.append(sentence)
            words.update(sentence)
        words = words.intersection(self.words)
        s = self.generate(words and random.choice(list(words)) or None)
        for sentence in sentences:
            self._learn(sentence)
        return s

def main():
    markov = Markov()
    while True:
        try:
            line = raw_input('> ')
        except (EOFError, KeyboardInterrupt):
            print
            break
        line = line.strip()
        if line.startswith('?'):
            line = markov.generate(line[1:])
        elif line:
            line = markov.reply(line)
        print line or markov.generate() or '...'

if __name__ == '__main__':
    main()
>>22 >>20 could trivially be rewritten with just one underscore, in raw_input().
Just move __init__() to the class definition, move main() to the toplevel, get rid of the if __name__, s/_//g, and then s/rawinput/raw_input/g.
(And I suppose you could replace the raw_input with a sys.stdin.readline, too...)
Nothing never contributes anything to a TMS (see Section 7.7) may contain
a contradiction -- this is the procedure make-cell, which creates a
propagator that identifies the given output with the cell must deliver
a complete summary of the objects in the current worldview. Given that
desideratum, tms-query tries to minimize the premises that information
is contingent on another. amb also tries to minimize the premises of
that function as many or as few times as necessary, and is exactly
(by eqv?) that object. Note: floating point numbers are compared by
approximate numerical equality; this is written in diagram style or
expression style, like a binary p:deposit.
Name:
Anonymous 2011-11-22 13:41
>>36
You are not very familiar with Python, are you?
>>40
You can make it even smaller with well-known standard library features such as collections.defaultdict.
Also you apparently don't know what a regular expression is.
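For what it's worth, here is a minimal sketch (not >>20's code, and the names are made up) of what the defaultdict version buys you: missing keys spring into existence on first access, so the whole "if word not in self.words: append to three parallel lists" dance disappears.

```python
from collections import defaultdict
import random

BOL, EOL = object(), object()  # sentinel markers instead of reserved ids

def learn(chain, sentence):
    # chain is a defaultdict(list); chain[last] auto-creates an empty
    # list for unseen words, so no membership test is needed.
    last = BOL
    for word in sentence:
        word = word.lower()
        chain[last].append(word)
        last = word
    chain[last].append(EOL)

def generate(chain):
    # Forward walk only, for brevity; >>20 also walks backwards.
    out, word = [], random.choice(chain[BOL])
    while word is not EOL:
        out.append(word)
        word = random.choice(chain[word])
    return ' '.join(out).capitalize() + '.'
```

Successor lists keep duplicates on purpose: a word seen twice after "the" is twice as likely to be picked, which is the frequency weighting the chain relies on.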
Name:
Anonymous2011-11-22 14:21
>>41
I'll admit, I didn't know about that one. I can't see how it'll make a significant difference, though.
I do know about regular expressions, but I fail to see how they are relevant for this program.
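One guess at what >>41 meant: _parse filters the string character by character and then splits, whereas a single regex can pull the words out directly. Not exactly equivalent (the filter glues words together across stripped punctuation, e.g. "foo,bar" becomes "foobar", while findall splits them), but it is shorter:

```python
import re

# Hypothetical regex-based _parse: one token per run of word characters,
# keeping the same character set as _acceptable_chars minus the space.
_word_re = re.compile(r"[A-Za-z0-9'-]+")

def parse(sentence):
    return _word_re.findall(sentence)
```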
Name:
F r o z e n V o i d !!mJCwdV5J0Xy2A21 2011-11-22 16:24
As I wasn't familiar with Markov chains at the time of writing it, it generated a sentence chain from each word in the sentence.
word->matches_for_word[array(rnd)] + lastword->matches_for_word[array(rnd)]... something like that, a bit more advanced.
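A guessed reconstruction of that scheme, reading the pseudocode literally (all names here are hypothetical): keep a table of words seen immediately after each word, then walk the input sentence once, emitting a random successor after each input word rather than following a chain from a single seed.

```python
import random

def build_matches(corpus_sentences):
    # matches_for_word: word -> list of words seen right after it
    matches = {}
    for sentence in corpus_sentences:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            matches.setdefault(a, []).append(b)
    return matches

def babble(matches, sentence):
    out = []
    for word in sentence.lower().split():
        out.append(word)
        # word->matches_for_word[array(rnd)]: one random successor per word
        if word in matches:
            out.append(random.choice(matches[word]))
    return ' '.join(out)
```

Unlike >>20's generator, output length here is tied to the input sentence, which matches the "chain from each word in the sentence" description.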