Bad word censor in Java/AspectJ

Name: Anonymous 2009-01-20 5:41

Hey /prog/ can you help me out with this?

I am trying to make a filter for bad words using Java/AspectJ. Whenever a method has a System.out.println which includes a bad word, I want the bad word replaced with an equal number of #s. For example, System.out.println("You are a damn shiteater") should print out "You are a #### ####eater". All bad words are stored as strings in a list which I can access with getBadWords(). It shouldn't be case-sensitive either. I am really stuck with this one, so I hope you can help.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 5:48

replace word by #*word.length?
Or just use * for all bad words.
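A minimal Java sketch of the "#" times word.length idea, assuming the bad words arrive as a `List<String>` (standing in for the OP's `getBadWords()`); `HashMaskCensor` is a hypothetical name. It matches plain substrings case-insensitively, which is what the OP's "####eater" example expects.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: masks every case-insensitive occurrence of each bad
// word with the same number of '#' characters (substring match, no boundaries).
final class HashMaskCensor {
    static String censor(String text, List<String> badWords) {
        String result = text;
        for (String word : badWords) {
            if (word.isEmpty()) {
                continue; // an empty pattern would match between every character
            }
            // Pattern.quote treats characters such as '*' or '.' in a bad word literally;
            // CASE_INSENSITIVE uses ASCII case folding by default
            Pattern p = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
            // "#".repeat(n) is the "#*word.length" mask (String.repeat needs Java 11+)
            result = p.matcher(result).replaceAll("#".repeat(word.length()));
        }
        return result;
    }
}
```

For example, `HashMaskCensor.censor("You are a damn shiteater", List.of("damn", "shit"))` returns `"You are a #### ####eater"`. Using a single `*` for every bad word would just mean passing a fixed `"*"` to `replaceAll` instead of the repeated mask.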

_________________________
orbis terrarum delenda est

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 5:57

You can use arrays for all 3-, 4-, 5-, and 6-letter words
and pick the number of #'s based on the current array.
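One possible Java reading of the grouping idea, assuming "arrays" means bucketing the bad words by length so that every word in a bucket shares one precomputed mask; `LengthBucketCensor` is a hypothetical name. The buckets are ordered longest first, an assumption made here so that a longer entry such as "assbandit" is masked before a shorter one like "ass" can eat into it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical sketch: bad words bucketed by length, one '#' mask per bucket.
final class LengthBucketCensor {
    // Longest words first, so compounds are masked before their prefixes
    private final Map<Integer, List<Pattern>> buckets = new TreeMap<>(Comparator.reverseOrder());

    LengthBucketCensor(List<String> badWords) {
        for (String word : badWords) {
            if (word.isEmpty()) continue;
            buckets.computeIfAbsent(word.length(), k -> new ArrayList<>())
                   .add(Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE));
        }
    }

    String censor(String text) {
        String result = text;
        for (Map.Entry<Integer, List<Pattern>> bucket : buckets.entrySet()) {
            String mask = "#".repeat(bucket.getKey()); // built once per length
            for (Pattern p : bucket.getValue()) {
                result = p.matcher(result).replaceAll(mask);
            }
        }
        return result;
    }
}
```

The patterns are compiled once in the constructor, so repeated calls to `censor` only pay for the matching itself.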

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 5:59

I would make a new class that does the censoring before it goes to the output stream.  Then use this class for all calls to the output stream in the program.
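A rough Java sketch of that wrapper class, assuming the censoring itself is supplied as a function; `CensoringPrintStream` is a hypothetical name, not an existing library class. It could be installed once with `System.setOut(...)` so that existing `System.out.println` calls go through it.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.function.UnaryOperator;

// Hypothetical wrapper: every String passed to println(String) is run through
// a censoring function before it reaches the underlying stream.
final class CensoringPrintStream extends PrintStream {
    private final UnaryOperator<String> censor;

    CensoringPrintStream(OutputStream out, UnaryOperator<String> censor) {
        super(out, true); // autoflush so each println is written immediately
        this.censor = censor;
    }

    @Override
    public void println(String line) {
        // null is passed through unchanged, which PrintStream prints as "null"
        super.println(line == null ? null : censor.apply(line));
    }
}
```

Only `println(String)` is overridden here, so `print`, `printf`, and `println(Object)` would still reach the stream uncensored; a fuller version would need to override those too.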

Then I would do:
For each Bad Word
  Search the String
  If Found
    Delete from the found index to the index + bad word length
    Insert bad word length * # at the index
    Continue searching the string for the bad word
  Else
    Stop searching for that word and move to the next

Probably the simplest way to do it, unless I am overlooking something.
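A direct Java translation of the steps above, assuming `badWords` is the list the OP gets from `getBadWords()`; `SearchAndReplaceCensor` is a hypothetical name. `StringBuilder.replace` performs the "delete, then insert #s" step in a single call.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical translation of the pseudocode: search, overwrite, keep searching.
final class SearchAndReplaceCensor {
    static String censor(String text, List<String> badWords) {
        StringBuilder out = new StringBuilder(text);
        for (String word : badWords) {                  // For each bad word
            if (word.isEmpty()) continue;               // "" would never advance
            String needle = word.toLowerCase(Locale.ROOT);
            String mask = "#".repeat(word.length());
            // Lower-cased copy used only for searching; assumes lower-casing keeps
            // the length unchanged (true for ASCII text)
            String haystack = out.toString().toLowerCase(Locale.ROOT);
            int index = haystack.indexOf(needle);       // Search the string
            while (index >= 0) {                        // If found
                // Delete index..index+length and insert the same number of '#'
                out.replace(index, index + word.length(), mask);
                // Continue searching after the replaced span
                index = haystack.indexOf(needle, index + word.length());
            }                                           // Else: move to the next word
        }
        return out.toString();
    }
}
```

Because the mask has the same length as the word, indices found in the lower-cased copy stay valid after each replacement.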

Name: Anonymous 2009-01-20 6:01

>>3
Probably overdoing it, seeing as Java lets you call a length() method (it's been a while) to see how long the string is.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 6:02

>>5
Calling length on every word is slower than grouping the words into arrays by length.

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 6:06

>>6
That's true, I was assuming that the scope of the project was small and time wasn't critical, but I shouldn't have assumed that.

Name: Anonymous 2009-01-20 6:09

All word censors are bad. Period. There is no room for debate. I will show you why.

Without being overly restrictive, you can't stop the bad words from showing up. And even if you are overly restrictive and run punishing routines, most users will try to find a way to break it.

Let me provide a couple of examples:

The shit on the window is ass. Okay, so you can replace those quite easily. But what about "thes hit on the window"? Remove spaces? Okay, that only reduces a fair subset of English.

What about "the_sh_it_on_the_window"? Remove all non-letter characters? Hoo boy, you're on a rocket ship to stardom now. We'll ignore that it affects a bunch of completely innocent text for a second.

You have a routine that's case-insensitive, removes all spaces, removes all non-letters, what could possibly go wrong? People will start using things like sh*t, sh#t, whatever. So you'll still have to have a dictionary the size of ... oh, I don't know, the Library of Congress? AND THEN even if you had that, they could still use newlines.

It's not worth it and you're an idiot for thinking it is. It doesn't provide a barrier by any means for those that wish to really communicate their feelings through vulgarities, but it WILL intercept innocent conversation.

That will piss your users off more than anything else, because they don't see it as being a difficult programming problem when they get owned by it. All they see is, "What I said was perfectly innocent, what the hell is wrong with this software?"

It's simple, Timmy. The idiot that coded it thought that he could turn something as fluid as language into something as mechanical as code.

Name: Anonymous 2009-01-20 6:11

OP here, thanks for the quick response and good ideas. I am going to bed for now but I will get back on it tomorrow morning.

Name: Anonymous 2009-01-20 6:13

>>8
Yah, this is more of an intro into AspectJ. It's not meant to be a heavy duty word filter.

Name: Anonymous 2009-01-20 6:13

>>8
This is why I assumed it was more of a proof of concept or mini project rather than something to be added to a larger codebase.

Also, with the way it was stated in the original post, things like "associate" would turn into "###ociate".

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 6:13

>>8
Do you know about regular expressions? They can handle even these cases without any dictionaries.

_________________________
orbis terrarum delenda est

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 6:14

>>11
The proper replace is /\bass\b/ (\b=word boundary)
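A small Java version of the word-boundary approach; `java.util.regex` uses the same `\b` boundary matcher, written `\\b` inside a Java string literal. `WordBoundaryCensor` is a hypothetical name, and joining the whole list into one alternation is an illustrative choice.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: masks bad words only when they stand as whole words.
final class WordBoundaryCensor {
    static String censor(String text, List<String> badWords) {
        String alternatives = badWords.stream()
                .filter(w -> !w.isEmpty())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        if (alternatives.isEmpty()) {
            return text; // nothing to censor
        }
        // \b(?:word1|word2|...)\b, matched case-insensitively
        Pattern p = Pattern.compile("\\b(?:" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE);
        // Mask each match with as many '#' as it has characters (Java 9+ overload)
        return p.matcher(text).replaceAll(m -> "#".repeat(m.group().length()));
    }
}
```

With `List.of("ass")`, "What an ass, associate!" becomes "What an ###, associate!". With `List.of("damn", "shit")`, the OP's sentence becomes "You are a #### shiteater", since "shit" inside "shiteater" has no boundary after it.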

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 6:14

>>12
Fuuck you.

Name: Anonymous 2009-01-20 6:15

>>12
The more variants you try to catch, though, the more you will limit real conversations. If you start matching across whitespace, then you will get things like "As soon as I get there" turning into "## #oon as I get there", making conversations very difficult.
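To make that false positive concrete, here is a hypothetical Java filter that allows whitespace between the letters of a bad word (by joining the letters with `\s*`) and masks only the non-space characters it matched; `SpacedOutCensor` is an illustrative name.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: "a s s" style spacing is caught, and so is innocent text.
final class SpacedOutCensor {
    static String censor(String text, String badWord) {
        // "ass" becomes a\s*s\s*s (each letter quoted literally)
        String spaced = badWord.chars()
                .mapToObj(c -> Pattern.quote(String.valueOf((char) c)))
                .collect(Collectors.joining("\\s*"));
        Pattern p = Pattern.compile(spaced, Pattern.CASE_INSENSITIVE);
        // Replace each non-whitespace character of the match with '#'
        return p.matcher(text).replaceAll(m -> m.group().replaceAll("\\S", "#"));
    }
}
```

`SpacedOutCensor.censor("As soon as I get there", "ass")` returns `"## #oon as I get there"`, the exact mangling described above.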

Name: Anonymous 2009-01-20 6:17

>>13
You, sir, are an assass.

Name: Anonymous 2009-01-20 6:18

>>13
The proper replace is /you/someone who is funny or interesting, because you are neither and suck the spirit of life out of everything you touch/

Name: Anonymous 2009-01-20 6:31

clbuttic

Name: Anonymous 2009-01-20 7:52

President Abraham Lincoln was buttbuttinated by an armed buttailant after a life devoted to the reform of the US consbreastution.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 8:08

>>15,18,19
this was addressed in >>13 (using arrays of words wrapped in word-boundary regexes)

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 8:17

>>19
I fucking lol'd

Name: Anonymous 2009-01-20 8:23

>>20
Look at the OP.  He wants to censor any instance, not just if they are bound by spaces/white space.  Clearly he isn't doing this for a practical application so he is probably going to choose the simplest and most reasonable solution.

Regular expressions don't make the application better or worse; they are just a different approach to implementing it.

Name: Anonymous 2009-01-20 8:31

>>20
For certain values of ``addressed'': it doesn't even filter out "shiteater" as >>1 asked. If there's a simple way to filter assbandit but not assailant, or shiteater but not shittimwood, do pray mention it.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 8:39

>>23
/\bshit(eater|eat)\b|\bshit\b/
/\bass(bandit|pirate)\b|\bass\b/
_________________________
orbis terrarum delenda est
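The two patterns in the post above, transcribed into Java string literals (backslashes doubled) and applied case-insensitively; `CompoundCensor` is a hypothetical name. Unlike the OP's expected "####eater", these mask the whole compound.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical transcription of the two compound-word patterns.
final class CompoundCensor {
    private static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("\\bshit(eater|eat)\\b|\\bshit\\b", Pattern.CASE_INSENSITIVE),
            Pattern.compile("\\bass(bandit|pirate)\\b|\\bass\\b", Pattern.CASE_INSENSITIVE));

    static String censor(String text) {
        String result = text;
        for (Pattern p : PATTERNS) {
            // Mask each whole match, so "shiteater" becomes nine '#'s
            result = p.matcher(result).replaceAll(m -> "#".repeat(m.group().length()));
        }
        return result;
    }
}
```

"assbandit, not assailant" becomes "#########, not assailant": "assailant" survives because neither alternative can end with a word boundary inside it.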

Name: Anonymous 2009-01-20 8:39

>>23
parse each individual word, e.g. "shiteater" becomes "shit-eat-er", "assbandit" becomes "ass-bandit", and "assailant" becomes "assail-ant". after that it should be easy.
or just include things like "shiteater" and "assbandit" in your list of bad words.
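A hypothetical Java sketch of the second suggestion: treat each run of letters as one word and mask it only if that exact word (compounds included) is on the list. `WholeWordListCensor` is an illustrative name, and splitting on `\p{L}+` is an assumption about what counts as a word.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: whole-word lookup against the list, no substring matching.
final class WholeWordListCensor {
    private static final Pattern WORD = Pattern.compile("\\p{L}+"); // a run of letters

    static String censor(String text, List<String> badWords) {
        Set<String> bad = badWords.stream()
                .map(w -> w.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        return WORD.matcher(text).replaceAll(m -> {
            String word = m.group();
            // Letters only, so returning the word unchanged is a safe replacement string
            return bad.contains(word.toLowerCase(Locale.ROOT)) ? "#".repeat(word.length()) : word;
        });
    }
}
```

With the list `shit, ass, shiteater, assbandit`, "shiteater and assbandit, but not assailant" becomes "######### and #########, but not assailant".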

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 8:42

>>25
Regular expressions were created for exactly this purpose.

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 8:50

>>26

Have you ever programmed anything that actually did something useful?  No one is talking about how to actually implement it; we are saying that if you want to filter "asss" you're probably going to filter "classic" and make broken sentences.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 8:51

>>27
You will not filter classic if you use proper regular expressions.

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 8:54

>>28
Tell me how you will filter "&&asss#" and not "classic" with regex.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 8:57

>>29
/\bass(s|)\b/
_________________________
orbis terrarum delenda est

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:01

example for filtering more words:
/\bass(|s|pirate|bandit)\b/gi
 (g = global, i = ignore case)
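Java has no `/.../gi` literal; a rough equivalent of the same pattern uses `Pattern.CASE_INSENSITIVE` for `i` and `Matcher.replaceAll` for `g`. `FlagsCensor` is a hypothetical name. This also runs the "&&asss#" versus "classic" challenge from earlier in the thread.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of /\bass(|s|pirate|bandit)\b/gi in Java.
final class FlagsCensor {
    // "i" flag -> Pattern.CASE_INSENSITIVE
    private static final Pattern ASS_FAMILY =
            Pattern.compile("\\bass(|s|pirate|bandit)\\b", Pattern.CASE_INSENSITIVE);

    static String censor(String text) {
        // "g" flag -> replaceAll (replaceFirst would stop after one match)
        return ASS_FAMILY.matcher(text).replaceAll(m -> "#".repeat(m.group().length()));
    }
}
```

"&&asss#" becomes "&&#####" (the four matched letters plus the original trailing `#`), while "classic" is left alone because its "ass" is not preceded by a word boundary.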

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 9:03

>>30
1ass

asszhole

Regex isn't solving this problem; it's the fact that as long as you let words like "classic" stay, then you are going to have to let censor bypassing stay (which basically makes the censor useless).

Name: Anonymous 2009-01-20 9:04

>>31

This just shows how much of a novice you are.  It would be useless to hardcode every possible bypass of the word "ass".

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:06

>>32
filtered:
/\bass(|s|pirate|bandit|(z|)hole)\b|\b\dass\b/gi

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 9:08

>>34
you as5hole or should I say az5h0le

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:09

>>33
If you are writing a word filter, it's very useful.
However, if you are against writing word filters, your argument is just "censorship is bad" and is not relevant to the discussion at all.

_________________________
orbis terrarum delenda est

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:11

>>35
filtered by:
/\ba[s5z][zs5](|[s5]|pirate|bandit|(z|)h[o0]le)\b|\b\dass\b/gi
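The final pattern, transcribed into a Java string literal and checked against the spellings posted in this thread; `LeetCensor` is a hypothetical name. Each variant below is caught only because its characters were added to the pattern by hand.

```java
import java.util.regex.Pattern;

// Hypothetical transcription of the last pattern, with /gi mapped as before.
final class LeetCensor {
    private static final Pattern PATTERN = Pattern.compile(
            "\\ba[s5z][zs5](|[s5]|pirate|bandit|(z|)h[o0]le)\\b|\\b\\dass\\b",
            Pattern.CASE_INSENSITIVE);

    static String censor(String text) {
        // Mask each match with one '#' per character
        return PATTERN.matcher(text).replaceAll(m -> "#".repeat(m.group().length()));
    }
}
```

"you as5hole or should I say az5h0le" becomes "you ####### or should I say #######", "1ass" becomes "####", and "classic" is unchanged.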

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 9:11

>>36
I think the whole argument is that automatic/unmoderated word filtering is futile: it doesn't stop someone from saying a bad word, it merely creates confusion or annoys people who might have accidentally used a bad word in a casual sentence.

Name: Anonymous 2009-01-20 9:11

Writing a profanity filter is like planning a genocide. Even if you're not going to put it into use, you're still a sick person.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:14

>>38,39
The point is that the regex applies only to censored words.

_________________________
orbis terrarum delenda est
