Having played with Markov-chain text generators, I came up with the idea of gathering statistics by parsing randomly selected threads from a certain board at periodic intervals, then using those accumulated word frequencies to generate text, which would then have the flavor of the board.
Your thoughts, /prog/?
Name:
Anonymous 2008-02-20 14:59
FIOC version:
import re, random

# Note the leading space: without it, split() below would see one big blob.
_acceptable_chars = " '-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
_sentence_end = re.compile(r'[.!?]+')

BOL_MARKER_ID = 0
EOL_MARKER_ID = 1

class Markov(object):
    def __init__(self):
        # Slots 0 and 1 are reserved for the begin/end-of-line markers.
        self.words = [None, None]
        self.chain = [[], []]   # word_id -> ids that may follow it
        self.bchain = [[], []]  # word_id -> ids that may precede it

    def _learn(self, sentence):
        if not len(sentence):
            return
        last_id = BOL_MARKER_ID
        for word in sentence:
            word = word.lower()
            if word not in self.words:
                self.words.append(word)
                self.chain.append([])
                self.bchain.append([])
            word_id = self.words.index(word)
            self.chain[last_id].append(word_id)
            self.bchain[word_id].append(last_id)
            last_id = word_id
        self.chain[last_id].append(EOL_MARKER_ID)

    def _parse(self, sentence):
        return filter(lambda c: c in _acceptable_chars, sentence).split()

    def generate(self, base_word=None):
        if not len(self.chain[0]):
            return None
        try:
            base_id = self.words.index(base_word.lower())
        except (ValueError, AttributeError):
            base_id = BOL_MARKER_ID
        # Walk backwards from the base word to the beginning of a line,
        # then forwards from it to the end of one.
        left = []
        right = []
        word_id = base_id
        while word_id != BOL_MARKER_ID:
            left.insert(0, word_id)
            word_id = random.choice(self.bchain[word_id])
        word_id = base_id
        while word_id != EOL_MARKER_ID:
            right.append(word_id)
            word_id = random.choice(self.chain[word_id])
        sentence = left + right[1:]  # right[0] duplicates left[-1] (or is BOL)
        return ' '.join(self.words[word_id] for word_id in sentence).capitalize() + '.'

    def reply(self, line):
        sentences = []
        words = set()
        for sentence in _sentence_end.split(line):
            sentence = self._parse(sentence)
            sentences.append(sentence)
            words.update(sentence)
        words = words.intersection(self.words)
        s = self.generate(words and random.choice(list(words)) or None)
        for sentence in sentences:
            self._learn(sentence)
        return s

def main():
    markov = Markov()
    while True:
        try:
            line = raw_input('> ')
        except (EOFError, KeyboardInterrupt):
            print
            break
        line = line.strip()
        if line.startswith('?'):
            line = markov.generate(line[1:])
        elif line:
            line = markov.reply(line)
        print line or markov.generate() or '...'

if __name__ == '__main__':
    main()
>>22 >>20 could trivially be rewritten with just one underscore, in raw_input().
Just move __init__() to the class definition, move main() to the toplevel, get rid of the if __name__, s/_//g, and then s/rawinput/raw_input/g.
(And I suppose you could replace the raw_input with a sys.stdin.readline, too...)
Nothing never contributes anything to a TMS (see Section 7.7) may contain
a contradiction -- this is the procedure make-cell, which creates a
propagator that identifies the given output with the cell must deliver
a complete summary of the objects in the current worldview. Given that
desideratum, tms-query tries to minimize the premises that information
is contingent on another. amb also tries to minimize the premises of
that function as many or as few times as necessary, and is exactly
(by eqv?) that object. Note: floating point numbers are compared by
approximate numerical equality; this is written in diagram style or
expression style, like a binary p:deposit.
Name:
Anonymous 2011-11-22 13:41
>>36
You are not very familiar with Python, are you?
>>40
You can make it even smaller with well-known standard library features such as collections.defaultdict.
Also you apparently don't know what a regular expression is.
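For what it's worth, here is a minimal sketch (not >>20's code, and the names are made up) of what the defaultdict version buys you: missing keys spring into existence on first access, so the whole "if word not in self.words: append to three parallel lists" dance disappears.

```python
from collections import defaultdict
import random

BOL, EOL = object(), object()  # sentinel markers instead of reserved ids

def learn(chain, sentence):
    # chain is a defaultdict(list); chain[last] auto-creates an empty
    # list for unseen words, so no membership test is needed.
    last = BOL
    for word in sentence:
        word = word.lower()
        chain[last].append(word)
        last = word
    chain[last].append(EOL)

def generate(chain):
    # Forward walk only, for brevity; >>20 also walks backwards.
    out, word = [], random.choice(chain[BOL])
    while word is not EOL:
        out.append(word)
        word = random.choice(chain[word])
    return ' '.join(out).capitalize() + '.'
```

Successor lists keep duplicates on purpose: a word seen twice after "the" is twice as likely to be picked, which is the frequency weighting the chain relies on.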
Name:
Anonymous2011-11-22 14:21
>>41
I'll admit, I didn't know about that one. I can't see how it'll make a significant difference, though.
I do know about regular expressions, but I fail to see how they are relevant for this program.
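One guess at what >>41 meant: _parse filters the string character by character and then splits, whereas a single regex can pull the words out directly. Not exactly equivalent (the filter glues words together across stripped punctuation, e.g. "foo,bar" becomes "foobar", while findall splits them), but it is shorter:

```python
import re

# Hypothetical regex-based _parse: one token per run of word characters,
# keeping the same character set as _acceptable_chars minus the space.
_word_re = re.compile(r"[A-Za-z0-9'-]+")

def parse(sentence):
    return _word_re.findall(sentence)
```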
Name:
F r o z e n V o i d !!mJCwdV5J0Xy2A21 2011-11-22 16:24
As I wasn't familiar with Markov chains at the time of writing it, it generated a sentence chain from each word in the sentence.
word->matches_for_word[array(rnd)] + lastword->matches_for_word[array(rnd)]... something like that, a bit more advanced.
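A guessed reconstruction of that scheme, reading the pseudocode literally (all names here are hypothetical): keep a table of words seen immediately after each word, then walk the input sentence once, emitting a random successor after each input word rather than following a chain from a single seed.

```python
import random

def build_matches(corpus_sentences):
    # matches_for_word: word -> list of words seen right after it
    matches = {}
    for sentence in corpus_sentences:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            matches.setdefault(a, []).append(b)
    return matches

def babble(matches, sentence):
    out = []
    for word in sentence.lower().split():
        out.append(word)
        # word->matches_for_word[array(rnd)]: one random successor per word
        if word in matches:
            out.append(random.choice(matches[word]))
    return ' '.join(out)
```

Unlike >>20's generator, output length here is tied to the input sentence, which matches the "chain from each word in the sentence" description.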