Create a program that, given a collection of threads containing anonymous posts, computes a number k, the predicted number of unique posters, and a function f that maps each post to a poster identification number. Candidates will be ranked by accuracy. Unfortunately, we have no way to test accuracy.
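To make the shape of k and f concrete, here is a minimal sketch, not a serious solution. It assumes character trigram counts are a usable style fingerprint and that posts whose cosine similarity reaches an arbitrary threshold share a poster. The names trigram_profile, cosine, cluster_posts and the 0.5 default are all made up for illustration.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text):
    """Count character trigrams; case and punctuation are kept as style cues."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram Counters (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_posts(posts, threshold=0.5):
    """Return (k, f), where f[i] is the predicted poster id of posts[i].

    Single-link clustering via union-find: any two posts at or above the
    (assumed, untuned) similarity threshold end up with the same id.
    """
    profiles = [trigram_profile(p) for p in posts]
    parent = list(range(len(posts)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if cosine(profiles[i], profiles[j]) >= threshold:
                parent[find(j)] = find(i)

    ids = {}  # cluster root -> compact poster id, in order of first appearance
    f = [ids.setdefault(find(i), len(ids)) for i in range(len(posts))]
    return len(ids), f
```

Calling cluster_posts(list_of_post_strings) gives back k and a list standing in for f. On real posts the threshold would need tuning, and short posts carry very little signal.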
To make results verifiable, pick a non-anonymous forum instead. Scrape a large number of posts and build a classifier that predicts which posts were made by the same poster. Candidates will be ranked by accuracy. If the algorithm learns from provided samples, the number of training samples must be small enough that good results are not just overfitting. Bonus points if the predictor also works on posts from other websites and communities.
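The post does not say how accuracy would be measured. One possible choice, assumed here, is the Rand index: the fraction of post pairs on which the prediction and the real usernames agree about "same poster" versus "different poster". It does not care how the predicted ids are numbered. The function name rand_index is made up for this sketch.

```python
from itertools import combinations


def rand_index(true_authors, predicted_ids):
    """Fraction of post pairs where prediction and truth agree on same/different poster."""
    if len(true_authors) != len(predicted_ids):
        raise ValueError("both lists need one entry per post")
    pairs = list(combinations(range(len(true_authors)), 2))
    if not pairs:
        return 1.0  # zero or one post: nothing to disagree about
    agree = sum(
        (true_authors[i] == true_authors[j]) == (predicted_ids[i] == predicted_ids[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Here true_authors would be the scraped usernames and predicted_ids the output of whatever predictor is being ranked. A score of 1.0 means every pair was judged correctly.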
Bonus bonus points if a follow-up program takes a sequence of posts made by a single poster and interprets the factual statements in them, so that they add up to a long list of information about the poster, possibly even revealing the poster's identity.
Bonus bonus bonus points if, instead of solving this challenge, you create a writing style normalizer-randomizer that makes this type of analysis useless.
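A very rough sketch of the normalizer half only, assuming ASCII English text: it flattens case, punctuation and spacing, which are some of the cheapest style cues. Word choice and sentence structure pass through untouched, so on its own this probably would not defeat a serious analysis, and the randomizer half is not attempted. The name normalize_style is made up.

```python
import re


def normalize_style(text):
    """Lowercase, drop everything except ASCII letters, digits and whitespace, collapse spacing."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)  # assumption: non-ASCII characters are simply dropped
    return " ".join(text.split())
```

Running every post through normalize_style before submitting it would remove habits like double spaces after periods or a fondness for exclamation marks.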
Name: Anonymous 2012-08-31 2:41
We've already had this challenge and nobody could produce a reasonably accurate version.