Bad word censor in Java/AspectJ

Name: Anonymous 2009-01-20 5:41

Hey /prog/ can you help me out with this?

I am trying to make a filter for bad words using Java/AspectJ. Whenever a method calls System.out.println with text that includes a bad word, I want the word replaced with an equal number of #s. For example, System.out.println("You are a damn shiteater") should print "You are a #### ####eater". All bad words are stored as strings in a list, which I can access with getBadWords(). The matching shouldn't be case-sensitive either. I am really stuck on this one, so I hope you can help.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 5:48

Replace each word with '#' repeated word.length times?
Or just use * for all bad words.

_________________________
orbis terrarum delenda est
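In Java, the #*word.length idea is a one-liner. A minimal sketch, assuming Java 11+ for String.repeat; the Mask class and mask method are made-up names for illustration:

```java
// Hypothetical helper (not from the thread): one '#' per character of the word.
class Mask {
    static String mask(String word) {
        return "#".repeat(word.length()); // String.repeat was added in Java 11
    }
}
```

For example, Mask.mask("damn") returns "####".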

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 5:57

You can use arrays for all 3-, 4-, 5- and 6-letter words
and pick the number of #'s based on the current array.

_________________________
orbis terrarum delenda est
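A rough Java sketch of the grouping idea, assuming the bad words arrive as a List<String> (the OP's getBadWords() isn't shown, so the LengthBuckets name and the sample words are illustrative). Java's String.length() is a constant-time call, so whatever this saves comes from sharing one mask string per bucket rather than from avoiding length():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: bucket bad words by length, with one shared '#' mask per bucket.
class LengthBuckets {
    // Maps word length -> lowercased words of that length.
    static Map<Integer, List<String>> bucket(List<String> badWords) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String w : badWords) {
            buckets.computeIfAbsent(w.length(), k -> new ArrayList<>())
                   .add(w.toLowerCase(Locale.ROOT));
        }
        return buckets;
    }

    // Maps word length -> the mask used for every word in that bucket.
    static Map<Integer, String> masks(Map<Integer, List<String>> buckets) {
        Map<Integer, String> masks = new TreeMap<>();
        for (int len : buckets.keySet()) {
            masks.put(len, "#".repeat(len));
        }
        return masks;
    }
}
```

With ["damn", "shit", "ass"], the 4-letter bucket holds damn and shit and both share the mask "####".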

Name: Anonymous 2009-01-20 5:59

I would make a new class that does the censoring before it goes to the output stream.  Then use this class for all calls to the output stream in the program.

Then I would do:
For each Bad Word
  Search the String
  If Found
    Delete from the found index to the index + bad word length
    Insert bad word length * # at the index
    Continue searching the string for the bad word
  Else
    Stop searching for that word and move to the next

Probably the simplest way to do it, unless I am overlooking something.
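Here is one way the plan above could look in Java: a PrintStream subclass (the "new class" sitting in front of the output stream) whose censor method follows the pseudocode loop, overwriting each hit in place rather than deleting and reinserting. It's a sketch under a few assumptions: the word list is passed in instead of fetched via the OP's getBadWords(), only println(String) is intercepted, and lowercasing is assumed not to change the string's length, which holds for ASCII text.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.List;
import java.util.Locale;

// Sketch of a censoring class in front of the output stream.
// The word list is passed in; in the OP's setup it would come from getBadWords().
class CensoringPrintStream extends PrintStream {
    private final List<String> badWords;

    CensoringPrintStream(OutputStream out, List<String> badWords) {
        super(out, true); // autoflush, so println output appears immediately
        this.badWords = badWords;
    }

    // Follows the pseudocode: for each bad word, search, mask the hit, keep searching.
    // Assumes lowercasing keeps the string length unchanged (true for ASCII text).
    static String censor(String text, List<String> badWords) {
        StringBuilder result = new StringBuilder(text);
        String lower = text.toLowerCase(Locale.ROOT);
        for (String word : badWords) {
            String w = word.toLowerCase(Locale.ROOT);
            if (w.isEmpty()) {
                continue; // an empty word would match everywhere
            }
            int idx = lower.indexOf(w);
            while (idx != -1) {
                for (int i = idx; i < idx + w.length(); i++) {
                    result.setCharAt(i, '#'); // overwrite instead of delete + insert
                }
                idx = lower.indexOf(w, idx + w.length()); // continue after this hit
            }
        }
        return result.toString();
    }

    @Override
    public void println(String x) {
        super.println(x == null ? null : censor(x, badWords));
    }
}
```

Calling System.setOut(new CensoringPrintStream(System.out, words)) once at startup would send later System.out.println(String) calls through the filter without AspectJ. Being a plain substring search, it also turns "associate" into "###ociate", as pointed out later in the thread.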

Name: Anonymous 2009-01-20 6:01

>>3
Probably overdoing it, seeing as Java lets you call length() on a String (it's been a while) to see how long it is.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 6:02

>>5
Calling length for every word is slower than grouping the words into arrays.

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 6:06

>>6
That's true, I was assuming that the scope of the project was small and time wasn't critical, but I shouldn't have assumed that.

Name: Anonymous 2009-01-20 6:09

All word censors are bad. Period. There is no room for debate. I will show you why.

Without being overly restrictive, you can't stop the bad words from showing up. And even if you are overly restrictive and run punishing routines, most users will try to find a way to break it.

Let me provide a couple examples:

The shit on the window is ass. Okay, so you can replace those quite easily. But what about "thes hit on the window"? Remove spaces? Okay, that only reduces a fair subset of English.

What about "the_sh_it_on_the_window"? Remove all non-letter characters? Hoo boy, you're on a rocket ship to stardom now. We'll ignore that it affects a bunch of completely innocent text for a second.

You have a routine that's case-insensitive, removes all spaces, removes all non-letters, what could possibly go wrong? People will start using things like sh*t, sh#t, whatever. So you'll still have to have a dictionary the size of ... oh, I don't know, the Library of Congress? AND THEN even if you had that, they could still use newlines.

It's not worth it and you're an idiot for thinking it is. It doesn't provide a barrier by any means for those that wish to really communicate their feelings through vulgarities, but it WILL intercept innocent conversation.

That will piss your users off more than anything else, because they don't see it as being a difficult programming problem when they get owned by it. All they see is, "What I said was perfectly innocent, what the hell is wrong with this software?"

It's simple, Timmy. The idiot that coded it thought that he could turn something as fluid as language into something as mechanical as code.

Name: Anonymous 2009-01-20 6:11

OP here, thanks for the quick response and good ideas. I am going to bed for now but I will get back on it tomorrow morning.

Name: Anonymous 2009-01-20 6:13

>>8
Yah, this is more of an intro into AspectJ. It's not meant to be a heavy duty word filter.

Name: Anonymous 2009-01-20 6:13

>>8
This is why I assumed it was more of a proof of concept or mini project rather than something to be added to a larger codebase.

Also, with the way it was stated in the original post, things like "associate" would turn into "###ociate".

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 6:13

>>8
Do you know about regular expressions? They can handle even these cases without any dictionaries.

_________________________
orbis terrarum delenda est

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 6:14

>>11
The proper replace is /\bass\b/ (\b=word boundary)

_________________________
orbis terrarum delenda est
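In Java the same word-boundary pattern needs its backslashes doubled inside a string literal. A small sketch comparing a bare substring pattern with the \b version; the WordBoundary class is illustrative, and the lambda form of Matcher.replaceAll needs Java 9+:

```java
import java.util.regex.Pattern;

// Illustrative comparison: bare substring pattern vs. the \b-anchored one.
class WordBoundary {
    static final Pattern NAIVE = Pattern.compile("ass", Pattern.CASE_INSENSITIVE);
    static final Pattern BOUNDED = Pattern.compile("\\bass\\b", Pattern.CASE_INSENSITIVE);

    // Replaces each match with one '#' per matched character.
    static String mask(Pattern p, String text) {
        return p.matcher(text).replaceAll(m -> "#".repeat(m.group().length()));
    }
}
```

The naive pattern turns "associate" into "###ociate", while the bounded one leaves "associate" alone and still turns "kiss my ASS" into "kiss my ###".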

Name: Anonymous 2009-01-20 6:14

>>12
Fuuck you.

Name: Anonymous 2009-01-20 6:15

>>12
The more exceptions you try to catch, though, the more you will limit real conversations. If you start matching across white space, then things like "As soon as I get there" turn into "## #oon as I get there", making conversation very difficult.

Name: Anonymous 2009-01-20 6:17

>>13
You, sir, are an assass.

Name: Anonymous 2009-01-20 6:18

>>13
The proper replace is /you/someone who is funny or interesting, because you are neither and suck the spirit of life out of everything you touch/

Name: Anonymous 2009-01-20 6:31

clbuttic

Name: Anonymous 2009-01-20 7:52

President Abraham Lincoln was buttbuttinated by an armed buttailant after a life devoted to the reform of the US consbreastution.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 8:08

>>15,18,19
this was addressed in >>13 (using arrays of words wrapped in a regex)

_________________________
orbis terrarum delenda est
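One way to wrap a word list in a regex in Java is to join it into a single alternation. A sketch assuming the words come in as a List<String> standing in for the OP's getBadWords(); WordListPattern is an illustrative name, and Pattern.quote keeps any regex metacharacters in a word literal:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Builds one case-insensitive, word-bounded pattern from a word list (illustrative).
class WordListPattern {
    static Pattern compile(List<String> badWords) {
        String alternation = badWords.stream()
                .map(Pattern::quote) // treat each word literally
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    // Masks every whole-word hit with one '#' per character (Java 9+ replaceAll).
    static String censor(String text, List<String> badWords) {
        return compile(badWords).matcher(text)
                .replaceAll(m -> "#".repeat(m.group().length()));
    }
}
```

With the list [damn, shit], "You are a damn shiteater" becomes "You are a #### shiteater": the compound slips through, which is exactly the objection raised next.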

Name: Anonymous 2009-01-20 8:17

>>19
I fucking lol'd

Name: Anonymous 2009-01-20 8:23

>>20
Look at the OP.  He wants to censor any instance, not just if they are bound by spaces/white space.  Clearly he isn't doing this for a practical application so he is probably going to choose the simplest and most reasonable solution.

Regular expressions don't make the application better or worse; they are just a different approach to implementing it.

Name: Anonymous 2009-01-20 8:31

>>20
For certain values of ``addressed'': it doesn't even filter out "shiteater" as >>1 asked. If there's a simple way to filter assbandit but not assailant, or shiteater but not shittimwood, do pray mention it.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 8:39

>>23
/\bshit(eater|eat)\b|\bshit\b/
/\bass(bandit|pirate)\b|\bass\b/
_________________________
orbis terrarum delenda est
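The two patterns above translate directly to Java. In this sketch each list of compound alternatives plus the bare word is folded into one optional group, which matches the same strings as the |-separated originals; CompoundPattern and flagged are made-up names:

```java
import java.util.regex.Pattern;

// Java port of /\bshit(eater|eat)\b|\bshit\b/ and /\bass(bandit|pirate)\b|\bass\b/.
class CompoundPattern {
    static final Pattern SHIT = Pattern.compile("\\bshit(?:eater|eat)?\\b", Pattern.CASE_INSENSITIVE);
    static final Pattern ASS = Pattern.compile("\\bass(?:bandit|pirate)?\\b", Pattern.CASE_INSENSITIVE);

    static boolean flagged(String text) {
        return SHIT.matcher(text).find() || ASS.matcher(text).find();
    }
}
```

Traced by hand, "shiteater" and "assbandit" are flagged while "shittimwood" and "assailant" are not, which are the cases asked about. Every extra compound still has to be listed by hand, though.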

Name: Anonymous 2009-01-20 8:39

>>23
parse each individual word, e.g. "shiteater" becomes "shit-eat-er", "assbandit" becomes "ass-bandit", and "assailant" becomes "assail-ant". after that it should be easy.
or just include things like "shiteater" and "assbandit" in your list of bad words.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 8:42

>>25
Regular expressions were created for exactly this purpose.

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 8:50

>>26

Have you ever programmed anything that actually did something useful? No one is talking about how to actually implement it, we are saying that if you want to filter "asss" you're probably going to filter "classic" and make broken sentences.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 8:51

>>27
You will not filter classic if you use proper regular expressions.

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 8:54

>>28
Tell me how you will filter "&&asss#" and not "classic" with regex.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 8:57

>>29
/\bass(s|)\b/
_________________________
orbis terrarum delenda est

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:01

example for filtering more words:
/\bass(|s|pirate|bandit)\b/gi
 (g = global, i = ignore case)

_________________________
orbis terrarum delenda est
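For anyone porting this to Java, the OP's language: there is no g flag, since Matcher.find() in a loop (or replaceAll) already visits every match, and i maps to Pattern.CASE_INSENSITIVE or an inline (?i). A sketch with an illustrative class name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative port of /\bass(|s|pirate|bandit)\b/gi.
// "i" becomes CASE_INSENSITIVE; "g" has no Java flag because find() keeps scanning.
class GlobalIgnoreCase {
    static final Pattern ASS_FAMILY =
            Pattern.compile("\\bass(|s|pirate|bandit)\\b", Pattern.CASE_INSENSITIVE);

    // Counts every match, the way a /g regex would visit them.
    static int countMatches(String text) {
        Matcher m = ASS_FAMILY.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

On "Ass, asss and ASSPIRATE but not classic" this counts 3 matches.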

Name: Anonymous 2009-01-20 9:03

>>30
1ass

asszhole

Regex isn't solving this problem; it's the fact that as long as you let words like "classic" stay, then you are going to have to let censor bypassing stay (which basically makes the censor useless).

Name: Anonymous 2009-01-20 9:04

>>31

This just shows how much of a novice you are. It would be useless to hardcode every possible bypass of the word "ass".

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:06

>>32
filtered:
/\bass(|s|pirate|bandit|(z|)hole)\b|\b\dass\b/gi

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 9:08

>>34
you as5hole or should I say az5h0le

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:09

>>33
If you are writing a word filter, it's very useful.
However, if you are against writing word filters, your argument is just "censorship is bad", which is not relevant to this discussion.

_________________________
orbis terrarum delenda est

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:11

>>35
filtered by:
/\ba[s5z][zs5](|[s5]|pirate|bandit|(z|)h[o0]le)\b|\b\dass\b/gi

_________________________
orbis terrarum delenda est
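The same leetspeak-aware pattern as a Java sketch (backslashes doubled for the string literal; LeetPattern is an illustrative name). Character classes such as [s5z] are what let one pattern cover several spellings:

```java
import java.util.regex.Pattern;

// Illustrative Java port of the leetspeak-aware pattern from this post.
class LeetPattern {
    static final Pattern P = Pattern.compile(
            "\\ba[s5z][zs5](|[s5]|pirate|bandit|(z|)h[o0]le)\\b|\\b\\dass\\b",
            Pattern.CASE_INSENSITIVE);

    static boolean flagged(String text) {
        return P.matcher(text).find();
    }
}
```

It catches "as5hole", "az5h0le", "asszhole" and "1ass" and skips "classic", but it also flags an innocent string like "A55", the kind of collateral damage >>8 warned about.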

Name: Anonymous 2009-01-20 9:11

>>36
I think the whole argument is that automatic/unmoderated word filtering is futile and doesn't stop someone from saying a bad word; it merely creates confusion or annoys people who might have accidentally used a bad word in a casual sentence.

Name: Anonymous 2009-01-20 9:11

Writing a profanity filter is like planning a genocide. Even if you're not going to put it into use, you're still a sick person.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:14

>>38,39
The point is that the regex is applied only to the censored words.

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 9:14

>>36

Give me one application where it would be practical to hardcode every possible bad word and censor bypass.  Get over it; just because you can write "Hello World" in five different languages doesn't mean you know anything about REAL programming.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:15

>>41
Chat, forums, instant messaging.

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 9:18

>>42

Yes, hire someone to hardcode trillions of combinations. If you weren't a tripfag I'd think I was being trolled hard.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-20 9:20

>>43
That's the point of regular expressions: they let you define combinations that, written out in full, would take terabytes of space. A regex is a concise syntax for expressing every combination that fits its terms.

_________________________
orbis terrarum delenda est

Name: Anonymous 2009-01-20 9:20

>>41
Oh hai, would you be interested in a GreaseMonkey script to make all of FrozenVirgin's posts invisible? It would definitely make your stay at /prog/ more enjoyable.

http://dis.4chan.org/read/prog/1231209853/10

Name: Anonymous 2009-01-20 9:24

>>43
YABTH.

Name: Anonymous 2009-01-20 9:25

>>45
I can only promote this script. Without FaggotVoid's posts I don't have to rage every time I browse /prog/.

Name: Anonymous 2009-01-20 15:11

>>34
Fuck.
Fuuuuuck.
FFFFFFFuck.
Fck.
Fk.
Fuk.
Fuc.
F*ck.
Fu*k.
Fu_ck.
F_uck.
Phuck.
Phuk.
Phuc. (lol azn naem)
F|_|ck.
F|__|ck.
F|_|k.
F|_|c.
Fu©k.

And so on and so forth. Okay, now give us a regex that works for all of these. Also, all other bad words.

Name: Anonymous 2009-01-20 15:15

>>48
You better run that fsck.

Name: Anonymous 2009-01-20 15:34

>>49
Yes, thank you. I don't normally consider it because I think it's retarded, but that would be an additional form.

Also, "fuck" has a rare distinction of not being a part of normal conversation. Ever. Even with spaces. Most other bad words are not so lucky.

Name: Anonymous 2009-01-20 21:54

OP here.
Thanks for the help. I made a working word filter which takes in a string and prints it out with the bad words changed to #'s. Any idea how to use an aspect in AspectJ to intercept the System.out.println calls in the other classes so I can apply the filter to them?

Name: Anonymous 2009-01-20 22:31

Don't forget all the fun you can have with right-to-left override and zero width spaces if you allow Unicode.
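If input can be arbitrary Unicode, one mitigation is to normalize and strip invisible format characters before running any word filter. A hedged sketch using java.text.Normalizer (NFKC folds look-alikes such as fullwidth letters into plain ones) and the \p{Cf} category, which covers zero-width spaces and joiners as well as bidi overrides. UnicodeCleanup is an illustrative name, and stripping every Cf character can also break legitimate text, such as emoji ZWJ sequences or scripts that rely on ZWNJ:

```java
import java.text.Normalizer;

// Hypothetical pre-pass to run before any bad-word matching.
class UnicodeCleanup {
    static String clean(String s) {
        // NFKC folds compatibility characters, e.g. fullwidth letters, into plain forms.
        String folded = Normalizer.normalize(s, Normalizer.Form.NFKC);
        // Cf (format) covers zero-width space U+200B, ZWJ U+200D, RTL override U+202E.
        return folded.replaceAll("\\p{Cf}", "");
    }
}
```

This handles zero-width spaces and fullwidth letters, but it does not undo the reordering a right-to-left override performs on screen: the override is removed, yet the characters stay in their reversed storage order, so a filter still has to deal with that separately.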

Name: Anonymous 2009-01-20 23:25

OP again.
Well I got it all figured out and it works great (the word filter + aspect). Thanks again for all the help.

Name: Anonymous 2009-01-21 2:06

I do believe FrozenVoid has been backed into a corner by his own stupidity. I would say "score 1 for /prog/", but as long as FrozenVoid posts as a name/tripfag, /prog/ is a big 0.

Name: Anonymous 2009-01-21 3:58

>>54
ONLY GAYS SEE HIS POSTS!

Name: Anonymous 2009-01-21 4:29

HOW IS IT EVEN REMOTELY POSSIBLE,
IN THIS DAY AND AGE
THAT PEOPLE ARE STILL SEEING FROZEN'S POSTS?
HOW IS THIS POSSIBLE?

Name: Anonymous 2009-01-21 4:43

>>57
whose posts? all i see is "Name: Anonymous"...
well, except for that idiot >>56. his posts show up as "Name: Spammer" with no text in them.

Name: Anonymous 2009-01-21 4:45

>>58
fuck. wrong blee.

Name: Anonymous 2009-01-21 4:47

>>8
This reminds me of the time when /b/ invaded this heavily censored Barbie-chat that used whitelists. Even digits and single letters were censored.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 5:00

>>48
the regexp is:
/\bf{1,7}\|{0,1}_{0,2}\|{0,1}[u\*]{0,5}_{0,1}[kc\*©]{0,1}[kc]\b|\bphu[kc]{0,1}[kc]\b/gi



_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg

Name: Anonymous 2009-01-21 5:03

>>61
I LOL'ED SO HARD

Name: Anonymous 2009-01-21 5:18

>>45,47,54
Why?
He seems to know what he's talking about.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 5:19

>>62
If you want to filter text, you'll have to use regexps like this.
I admit it looks complex at first glance, but it's only using character{min,max} quantifiers, which are trivial to understand.

_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 5:29

>>48 same as >>61 but with groups(less readable but more logical):

/\b(f|ph){1,7}(\|_{1,2}\|{0,5}|[u\*_]{0,5})[kc\*©]{0,2}\b/gi


_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 5:38

>>65 That should be:
/\b(f{1,7}|ph{1,7})(\|_{1,2}\|{0,5}|[u\*_]{0,5})[kc\*©]{0,2}\b/gi

_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 5:41

>>66 Optimized:
 {0,5} can be omitted (the 1337 u only ever ends with a single \|)

/\b(f{1,7}|ph{1,7})(\|_{1,2}\||[u\*_]{0,5})[kc\*©]{0,2}\b/gi
_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 5:44

>>67
Further optimized(assuming final consonant is always present):
/\b(f{1,7}|ph{1,7})(\|_{1,2}\||[u\*_]{0,5})[kc\*©]{1,2}\b/gi

_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 6:05

Also, I find it ironic that a lot of you hate censorship but use the script which hides my posts.

_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg

Name: Anonymous 2009-01-21 6:13

No replies, huh?

Name: Anonymous 2009-01-21 6:46

I find it funny that your regex does not match any from the simple sample he provided you (which would be one billionth of what you would actually have to test for).

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 6:50

>>71
This is a JavaScript regexp; perhaps you are using another language?

_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 6:53

In case you are not using JavaScript, type this into the address bar:
javascript:alert((/\b(f{1,7}|ph{1,7})(\|_{1,2}\||[u\*_]{0,5})[kc\*©]{1,2}\b/gi).test('abc.f_uck.tst')) ;void 0
It should return true (the regexp matches).

_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg
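For a Java program like the OP's, the same final regex can be checked without a browser. A sketch porting it to java.util.regex (backslashes doubled, /i as CASE_INSENSITIVE, the copyright sign written as a Unicode escape); FinalRegex is an illustrative name:

```java
import java.util.regex.Pattern;

// Illustrative Java port of the final JavaScript regex from this thread.
class FinalRegex {
    static final Pattern P = Pattern.compile(
            "\\b(f{1,7}|ph{1,7})(\\|_{1,2}\\||[u*_]{0,5})[kc*\u00A9]{1,2}\\b",
            Pattern.CASE_INSENSITIVE);

    // find() looks for the pattern anywhere in the text, like RegExp.test in JavaScript.
    static boolean matches(String text) {
        return P.matcher(text).find();
    }
}
```

Traced by hand, this accepts every spelling listed in >>48, including "Phuc" (the name mentioned there), and rejects "fuuukcc" and "fsck" from >>74, since neither fits the character classes and counts the pattern allows.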

Name: Anonymous 2009-01-21 7:02

>>73
javascript:alert((/\b(f{1,7}|ph{1,7})(\|_{1,2}\||[u\*_]{0,5})[kc\*©]{1,2}\b/gi).test('fuuukcc')) ;void 0

javascript:alert((/\b(f{1,7}|ph{1,7})(\|_{1,2}\||[u\*_]{0,5})[kc\*©]{1,2}\b/gi).test('fsck')) ;void 0

You lose.

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 7:09

>>74
These words are not present at >>48
which is replied by >>61

_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg

Name: Anonymous 2009-01-21 7:11

>>75
Go back to Slashdot

Name: =+=*=F=R=O=Z=E=N==V=O=I=D=*=+= !FrOzEn2BUo 2009-01-21 7:14

>>74
I can invent any words which will not trigger regexp designed for >>48 like "ifhawifuheasiufh" but this doesn't make it invalid.
It works exactly as designed.

_________________________
orbis terrarum delenda est
 http://xs135.xs.to/xs135/09042/av922.jpg

Name: Anonymous 2009-01-21 8:42

>>77
Go back to Slashdot

Name: Anonymous 2009-01-21 12:05

☣ Please try to ignore troll posts! ☣

http://userscripts.org/scripts/show/40415

Name: Anonymous 2009-01-21 15:14

>>79
☣ Please try to troll ignore posts! ☣

Name: Anonymous 2009-01-21 17:25

>>80
☣ Please try to post ignore metatrolls! ☣

Name: Anonymous 2009-01-21 17:31

>>81
☣ Please don't try anything! ☣

Name: Anonymous 2009-01-21 17:34

>>82
☣ There is no try! ☣

Name: Anonymous 2009-01-21 18:02

☣ Please don't catch anything! ☣

Name: Anonymous 2009-01-21 21:14

>>83
Please don't quote Star Wars!

Name: Anonymous 2009-01-21 22:41

>>85
☯ Please don't star war quotes! ☯
