I am trying to make a filter for bad words using Java/AspectJ. Whenever a method calls System.out.println with a string that includes a bad word, I want the bad word to be replaced with an equal number of #s. For example, System.out.println("You are a damn shiteater") should print out "You are a #### ####eater". All bad words are stored as strings in a list which I can access with getBadWords(). It shouldn't be case-sensitive either. I am really stuck with this one so I hope you can help.
You can group the bad words into arrays by length (all 3-, 4-, 5-, 6-letter words)
and pick the number of #s based on the current array.
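A minimal sketch of the grouping idea, assuming plain Java and a word list like the one getBadWords() returns. The class name LengthGroupedCensor and its methods are made up for illustration. Each length group gets one precomputed mask, so the #s are built once per length instead of once per match. Note that String.length() is a constant-time call in Java, so any gain from the grouping is likely small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: bad words grouped by length, one '#' mask per group. */
final class LengthGroupedCensor {
    private final Map<Integer, List<String>> wordsByLength = new TreeMap<>();
    private final Map<Integer, String> maskByLength = new HashMap<>();

    LengthGroupedCensor(List<String> badWords) {
        for (String word : badWords) {
            if (word.isEmpty()) continue;
            wordsByLength.computeIfAbsent(word.length(), k -> new ArrayList<>()).add(word);
        }
        // Build each mask once, e.g. 4 -> "####".
        for (int len : wordsByLength.keySet()) {
            char[] mask = new char[len];
            Arrays.fill(mask, '#');
            maskByLength.put(len, new String(mask));
        }
    }

    /** Returns the precomputed mask for a length, or null if no word has that length. */
    String maskFor(int len) {
        return maskByLength.get(len);
    }

    String censor(String text) {
        StringBuilder out = new StringBuilder(text);
        for (Map.Entry<Integer, List<String>> group : wordsByLength.entrySet()) {
            int len = group.getKey();
            String mask = maskByLength.get(len);
            for (String word : group.getValue()) {
                for (int i = 0; i + len <= text.length(); i++) {
                    // regionMatches with ignoreCase=true gives the case-insensitive compare.
                    if (text.regionMatches(true, i, word, 0, len)) {
                        out.replace(i, i + len, mask); // same length, so indices stay aligned
                    }
                }
            }
        }
        return out.toString();
    }
}
```

With the words "damn" and "shit", censor("You are a damn shiteater") gives "You are a #### ####eater".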
_________________________
orbis terrarum delenda est
Name:
Anonymous 2009-01-20 5:59
I would make a new class that does the censoring before it goes to the output stream. Then use this class for all calls to the output stream in the program.
Then I would do:
For each Bad Word
    Search the String
    If Found
        Delete from the found index to the index + bad word length
        Insert bad word length * # at the index
        Continue searching the string for the bad word
    Else
        Stop searching for that word and move to the next
Probably the simplest way to do it, unless I am overlooking something.
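The steps above could be sketched in plain Java roughly as follows. This is an illustration under assumptions, not the AspectJ solution the OP asked for: WordCensor and CensoringPrintStream are made-up names, and the wrapper only intercepts println(String). It overwrites characters in place instead of deleting and inserting, which has the same effect because the mask is as long as the word.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.List;

/** Hypothetical helper: masks every occurrence of each bad word with '#'. */
final class WordCensor {
    static String censor(String text, List<String> badWords) {
        StringBuilder out = new StringBuilder(text);
        for (String word : badWords) {
            int n = word.length();
            if (n == 0) continue;
            int i = 0;
            while (i + n <= text.length()) {
                // Case-insensitive comparison at position i.
                if (text.regionMatches(true, i, word, 0, n)) {
                    for (int j = i; j < i + n; j++) out.setCharAt(j, '#');
                    i += n; // keep searching after this hit
                } else {
                    i++;
                }
            }
        }
        return out.toString();
    }
}

/** Hypothetical wrapper: censors each line before it reaches the real stream. */
final class CensoringPrintStream extends PrintStream {
    private final List<String> badWords;

    CensoringPrintStream(OutputStream target, List<String> badWords) {
        super(target, true); // autoflush
        this.badWords = badWords;
    }

    @Override
    public void println(String line) {
        super.println(line == null ? null : WordCensor.censor(line, badWords));
    }
}
```

Installing it would look like System.setOut(new CensoringPrintStream(System.out, getBadWords())), where getBadWords() is the OP's own accessor. With the words "damn" and "shit", printing "You are a damn shiteater" comes out as "You are a #### ####eater".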
Name:
Anonymous 2009-01-20 6:01
>>3
Probably overdoing it, seeing as Java lets you call a length method or property (it's been a while) to see how long the string is.
>>5
Length calls for all words are slower than array grouping.
_________________________
orbis terrarum delenda est
Name:
Anonymous 2009-01-20 6:06
>>6
That's true, I was assuming that the scope of the project was small and time wasn't critical, but I shouldn't have assumed that.
Name:
Anonymous 2009-01-20 6:09
All word censors are bad. Period. There is no room for debate. I will show you why.
Without being overly restrictive, you can't stop the bad words from showing up. And even if you are overly restrictive and run punishing routines, most users will try to find a way to break it.
Let me provide a couple examples:
The shit on the window is ass. Okay, so you can replace those quite easily. But what about "thes hit on the window"? Remove spaces? Okay, that only reduces a fair subset of English.
What about "the_sh_it_on_the_window"? Remove all non-letter characters? Hoo boy, you're on a rocket ship to stardom now. We'll ignore that it affects a bunch of completely innocent text for a second.
You have a routine that's case-insensitive, removes all spaces, removes all non-letters, what could possibly go wrong? People will start using things like sh*t, sh#t, whatever. So you'll still have to have a dictionary the size of ... oh, I don't know, the Library of Congress? AND THEN even if you had that, they could still use newlines.
It's not worth it and you're an idiot for thinking it is. It doesn't provide a barrier by any means for those that wish to really communicate their feelings through vulgarities, but it WILL intercept innocent conversation.
That will piss your users off more than anything else, because they don't see it as being a difficult programming problem when they get owned by it. All they see is, "What I said was perfectly innocent, what the hell is wrong with this software?"
It's simple, Timmy. The idiot that coded it thought that he could turn something as fluid as language into something as mechanical as code.
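To make the trade-off in this post concrete, here is a small sketch of the "case-insensitive, remove spaces, remove non-letters" routine it describes. NormalizingDetector is a made-up name and the ASCII-only letter class is an assumption. It catches the spaced and underscored variants, flags an innocent phrase, and still misses a substituted character.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical detector: lowercases, strips everything but a-z, then does a substring check. */
final class NormalizingDetector {
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
    }

    static boolean flags(String text, List<String> badWords) {
        String normalized = normalize(text);
        for (String word : badWords) {
            if (!word.isEmpty() && normalized.contains(word.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

With the single bad word "shit": "thes hit on the window" and "the_sh_it_on_the_window" are both flagged, but so is "push it over there" (it normalizes to "pushitoverthere"), while "sh*t happens" passes untouched.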
Name:
Anonymous 2009-01-20 6:11
OP here, thanks for the quick response and good ideas. I am going to bed for now but I will get back on it tomorrow morning.
Name:
Anonymous 2009-01-20 6:13
>>8
Yah, this is more of an intro to AspectJ. It's not meant to be a heavy-duty word filter.
Name:
Anonymous 2009-01-20 6:13
>>8
This is why I assumed it was more of a proof of concept or mini project rather than something to be added to a larger codebase.
Also, with the way it was stated in the original post, things like "associate" would turn into "###ociate".
>>12
The more you catch for exceptions, though, the more you will limit real conversations. If you start catching white spaces, then you will get things like "As soon as I get there" turning into "## #oon as I get there", making conversations very difficult.
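The "As soon" example can be reproduced with a whitespace-tolerant pattern. This is only a sketch: WhitespaceTolerantCensor is a made-up name, and building one regex per word with optional whitespace between letters is just one possible way to "catch white spaces".

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical censor that lets whitespace appear between the letters of a bad word. */
final class WhitespaceTolerantCensor {
    /** "ass" becomes the pattern a\s*s\s*s (each letter quoted), case-insensitive. */
    static Pattern spaced(String word) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            if (i > 0) regex.append("\\s*");
            regex.append(Pattern.quote(String.valueOf(word.charAt(i))));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    static String censor(String text, List<String> badWords) {
        String result = text;
        for (String word : badWords) {
            if (word.isEmpty()) continue;
            Matcher m = spaced(word).matcher(result);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                // Mask the letters but keep the whitespace inside the match.
                String masked = m.group().replaceAll("\\S", "#");
                m.appendReplacement(sb, Matcher.quoteReplacement(masked));
            }
            m.appendTail(sb);
            result = sb.toString();
        }
        return result;
    }
}
```

With the bad word "ass", censor("As soon as I get there") returns "## #oon as I get there", which is exactly the collateral damage described above.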
>>20
Look at the OP. He wants to censor any instance, not just if they are bound by spaces/white space. Clearly he isn't doing this for a practical application so he is probably going to choose the simplest and most reasonable solution.
Regular expressions don't make the application better or worse, they just are a different approach to implementing it.
Name:
Anonymous 2009-01-20 8:31
>>20
For certain values of ``addressed'': it doesn't even filter out "shiteater" as >>1 asked. If there's a simple way to filter assbandit but not assailant, or shiteater but not shittimwood, do pray mention it.
>>23
/\bshit(eater|eat)\b|\bshit\b/
/\bass(bandit|pirate)\b|\bass\b/
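In Java the two patterns above could be folded into one java.util.regex.Pattern, roughly as sketched here. BoundaryCensor is a made-up name and the word list is hard-coded only for illustration. Note that this masks the whole matched word, so "shiteater" becomes nine #s and not the "####eater" the OP asked for.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical censor: whole words only, using \b word boundaries. */
final class BoundaryCensor {
    private static final Pattern BAD = Pattern.compile(
            "\\b(?:shit(?:eater|eat)?|ass(?:bandit|pirate)?)\\b",
            Pattern.CASE_INSENSITIVE);

    static String censor(String text) {
        Matcher m = BAD.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char[] mask = new char[m.end() - m.start()];
            Arrays.fill(mask, '#');
            // '#' is not special in a replacement string, so no quoting is needed.
            m.appendReplacement(sb, new String(mask));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

censor("The assailant is an ass") returns "The assailant is an ###", and "shittimwood" and "classic" pass through unchanged because no word boundary follows the bad prefix.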
_________________________
orbis terrarum delenda est
Name:
Anonymous 2009-01-20 8:39
>>23
Parse each individual word, e.g. "shiteater" becomes "shit-eat-er", "assbandit" becomes "ass-bandit", and "assailant" becomes "assail-ant". After that it should be easy.
Or just include things like "shiteater" and "assbandit" in your list of bad words.
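The second suggestion, listing the compounds explicitly, amounts to a whole-word lookup. A rough sketch, assuming the list is loaded into a Set of lowercase entries and that a "word" is a run of ASCII letters (WholeWordCensor is a made-up name):

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical censor: masks a token only if the whole token is in the bad-word set. */
final class WholeWordCensor {
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** badWords is assumed to hold lowercase entries, compounds included. */
    static String censor(String text, Set<String> badWords) {
        Matcher m = WORD.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String token = m.group();
            if (badWords.contains(token.toLowerCase(Locale.ROOT))) {
                char[] mask = new char[token.length()];
                Arrays.fill(mask, '#');
                m.appendReplacement(sb, new String(mask));
            }
            // Tokens that are not replaced are copied over by the next append call.
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

With the set {"ass", "assbandit", "shit", "shiteater"}, censor("The assbandit met an assailant in class") returns "The ######### met an assailant in class". Any compound missing from the set slips through, which is the price of this approach.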
Have you ever programmed anything that actually did something useful? No one is talking about how to actually implement it; we are saying that if you want to filter "ass" you're probably going to filter "classic" and make broken sentences.
Regex isn't solving this problem; it's the fact that as long as you let words like "classic" stay, then you are going to have to let censor bypassing stay (which basically makes the censor useless).
>>33
If you are writing a wordfilter, it's very useful.
However, if you are against writing wordfilters, your argument is just "censorship is bad" and not relevant at all to the discussion.
_________________________
orbis terrarum delenda est
>>36
I think the whole argument is that automatic/unmoderated wordfiltering is futile: it doesn't stop someone from saying a bad word, it merely creates confusion or annoys people who might have accidentally used a bad word in a casual sentence.
Name:
Anonymous 2009-01-20 9:11
Writing a profanity filter is like planning a genocide. Even if you're not going to put it into use, you're still a sick person.