
String operations in Java

Name: Anonymous 2010-12-20 23:04

I'm learning about operations on strings, and I'm having trouble dealing with punctuation. Basically, I need to provide the count of each word in a given string. I have no problem with this, except that punctuation at the end of a word cannot be included.

For example, the string
"This test string is a test."
should yield
this - 1
test - 2
string - 1
is - 1
a - 1

but instead I'm getting

test - 1
test. - 1

To further complicate matters, something like "..." is a word and should be counted once, and "derp.derp" is also a single word.

I have the logic, but I don't know how to execute it:
if (word ends in punctuation and that punctuation is preceded by non-punctuation)
   word = substring(0, word.length - 1)
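A minimal Java sketch of that rule, assuming a hypothetical isPunctuation() that treats any character other than a letter, digit, or whitespace as punctuation (the thread never shows the real one). The indices are the part that matters: the last character is at length() - 1, the one before it at length() - 2, and substring's end index is exclusive.

```java
class TrailingPunctuation {

    // Hypothetical helper: anything that is not a letter, digit or whitespace counts as punctuation.
    static boolean isPunctuation(char c) {
        return !Character.isLetterOrDigit(c) && !Character.isWhitespace(c);
    }

    // Drops one trailing punctuation mark, but only if the character before it is not punctuation,
    // so "test." becomes "test" while "..." and "derp.derp" are left alone.
    static String stripTrailingPunctuation(String word) {
        if (word.length() < 2) {
            return word;
        }
        char last = word.charAt(word.length() - 1);       // last character
        char beforeLast = word.charAt(word.length() - 2); // the one before it
        if (isPunctuation(last) && !isPunctuation(beforeLast)) {
            return word.substring(0, word.length() - 1);  // end index is exclusive
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"test.", "...", "derp.derp", "test"}) {
            System.out.println(w + " -> " + stripTrailingPunctuation(w));
        }
    }
}
```

Note that under this rule a word like "test..." keeps all three dots, because the last dot is preceded by another dot.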

This gets pretty messy, and for some reason I'm getting a lot of NullPointerExceptions when there are spaces between words.

I'll post some of my code after this.

Name: Anonymous 2010-12-20 23:06

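//returns the whole thing with all words and their corresponding counts
private String allWordCounts(){
    initializeWords();
    StringBuilder sb = new StringBuilder();
    for(int x = 0; x < wordCount(); x++){
        //countOf() blanks out words it has already counted, so skip those (and any unfilled null slots)
        if(words[x] != null && !words[x].equals(""))
            sb.append(words[x]).append("\t\t").append(countOf(words[x])).append("\n");
    }
    return sb.toString();
}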


//chops up the original string and puts the individual words into an array. this is confirmed to work.
private void initializeWords(){
    int start = 0, counter = 0;
    while(start < trimmed.length()){
        //each word ends at the next space, or at the end of the string
        int end = trimmed.indexOf(' ', start);
        if(end == -1)
            end = trimmed.length();
        //note: two spaces in a row store an empty string ""
        words[counter] = trimmed.substring(start, end);
        counter++;
        start = end + 1;
    }
}
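A sketch of the same space-splitting loop that skips the empty tokens produced by consecutive spaces and returns an array exactly as long as the number of words, so no null slots are left for a later loop to trip over. It still uses only indexOf() and substring(); the ArrayList and the name splitOnSpaces are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SpaceSplitter {

    // Splits on single spaces like initializeWords(), but skips the empty
    // tokens that runs of spaces would otherwise produce.
    static String[] splitOnSpaces(String text) {
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = text.indexOf(' ', start);
            if (end == -1) {
                end = text.length();          // last word runs to the end
            }
            if (end > start) {                // keep non-empty tokens only
                words.add(text.substring(start, end));
            }
            start = end + 1;
        }
        // The array is exactly as long as the number of words, so there are no null slots.
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = splitOnSpaces("This  test string is a test.");
        System.out.println(String.join("|", words));
    }
}
```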

Name: Anonymous 2010-12-20 23:09

Here's where I start to have trouble.

//returns the count of a word, and blanks out each match so it isn't reported twice
public int countOf(String word){
    int count = 0;
    String target = stripPunctuation(word);
    for(int x = 0; x < words.length; x++){
        //slots past the last word may never have been filled, so they can be null
        if(words[x] == null)
            continue;
        //strip BOTH sides: comparing words[x] against stripPunctuation(word) alone means "test." never matches "test"
        if(stripPunctuation(words[x]).equalsIgnoreCase(target)){
            count++;
            words[x] = "";
        }
    }
    return count;
}

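//takes away a trailing punctuation mark, but only if the character before it is not punctuation
private String stripPunctuation(String word){
    if(word.length() > 1){
        //the last character is at length()-1, so the one before it is at length()-2
        if(isPunctuation(word.charAt(word.length()-1)) && !isPunctuation(word.charAt(word.length()-2))){
            //substring's end index is exclusive, so this drops exactly the last character
            word = word.substring(0, word.length()-1);
        }
    }
    return word;
}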
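For comparison, a sketch that keeps the counts in a map keyed by the normalized word instead of blanking out matched array entries. It assumes java.util.LinkedHashMap is acceptable for the assignment (it keeps first-seen order, which matches the expected output in the first post), and the helpers isPunctuation and normalize are hypothetical: normalize lower-cases a word and strips its trailing punctuation unless the word is nothing but punctuation, so "..." still counts as a word.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WordCounter {

    // Hypothetical helper: anything that is not a letter, digit or whitespace counts as punctuation.
    static boolean isPunctuation(char c) {
        return !Character.isLetterOrDigit(c) && !Character.isWhitespace(c);
    }

    // Lower-cases the word and strips trailing punctuation, unless the word is
    // nothing but punctuation (so "..." survives as a word of its own).
    static String normalize(String word) {
        int end = word.length();
        while (end > 0 && isPunctuation(word.charAt(end - 1))) {
            end--;
        }
        if (end == 0) {
            return word;   // all punctuation: keep it whole
        }
        return word.substring(0, end).toLowerCase();
    }

    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();  // keeps first-seen order
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;   // only happens when the text is blank
            }
            counts.merge(normalize(raw), 1, Integer::sum);    // add 1 to the running count
        }
        return counts;
    }

    public static void main(String[] args) {
        countWords("This test string is a test.")
            .forEach((word, n) -> System.out.println(word + " - " + n));
    }
}
```

Running main prints this - 1, test - 2, string - 1, is - 1, a - 1, one per line. Inner punctuation is untouched, so "derp.derp" and "te.st" stay single words.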

Name: Anonymous 2010-12-20 23:09

any help will be much appreciated.

Name: Anonymous 2010-12-20 23:13

Use java.util.StringTokenizer.
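A sketch of what that suggestion looks like. With its default delimiters (space, tab, newline, carriage return, form feed) StringTokenizer never returns empty tokens, so runs of spaces are harmless, but trailing punctuation still has to be handled by hand; stripTrailing below is a hypothetical helper that drops trailing characters that are not letters or digits, unless that would leave nothing. The Javadoc also calls StringTokenizer a legacy class and recommends String.split or java.util.regex for new code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {

    // Hypothetical helper: drops trailing characters that are not letters or digits,
    // unless that would leave nothing (so "..." stays "...").
    static String stripTrailing(String word) {
        int end = word.length();
        while (end > 0 && !Character.isLetterOrDigit(word.charAt(end - 1))) {
            end--;
        }
        return end == 0 ? word : word.substring(0, end);
    }

    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        // No delimiter argument: the defaults are " \t\n\r\f", and empty tokens are never returned.
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            out.add(stripTrailing(st.nextToken()).toLowerCase());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("This test  string is a test."));
    }
}
```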

Name: Anonymous 2010-12-21 0:02

I don't think we'd be allowed to. I think the point is for us to figure out how to solve these problems with logic and the basic String methods.

Name: Anonymous 2010-12-21 1:54

Use [code] tags, please, and then we might be able to help.

Name: Anonymous 2010-12-21 9:53

Why don't you just strip all punctuation from the string with replaceAll and then split it on runs of whitespace (using the regex \\s+)? Making it all one case will probably help too.

String str = "This test string is a test.";
str = str.toLowerCase();
str = str.replaceAll("\\p{Punct}", " "); // the first parameter is a regular expression; an unescaped "." would match every character
String[] ary = str.split("\\s+");


Array ary should have the contents: [this][test][string][is][a][test]

Name: Anonymous 2010-12-21 9:53

>>5
He's
learning about operations on strings
, not
ENTERPRISE.

Name: Anonymous 2010-12-21 14:11

Back to the front of the line.

Name: Anonymous 2010-12-21 14:12

Bump

Name: Anonymous 2010-12-21 14:46

JEWS

Name: Anonymous 2010-12-21 14:48

>>12
Semites*

Name: Anonymous 2010-12-21 16:37

HAX MY JEWS

Name: Anonymous 2010-12-21 23:38

>>8
The problem is that "te.st" would be a valid word, and it should be counted as such.

Not sure if this is the most efficient solution, but it's pretty fucking awesome, considering we haven't formally learned recursion in class. Here's what I painstakingly came up with.


//isPunctuation() just returns whether a character is punctuation or not, in case that isn't clear

//helper method: removes all leading and trailing punctuation from a word
private String stripPunctuation(String word){
    //if there is more than one character
    if(word.length() > 1){
        //if the last character is punctuation, remove it
        if(isPunctuation(word.charAt(word.length()-1)))
            word = word.substring(0, word.length()-1);
        //if the first character is punctuation, remove it
        if(isPunctuation(word.charAt(0)))
            word = word.substring(1);
    //if there is no character, or one character that is punctuation, remove the word altogether
    }else if(word.length() == 0 || isPunctuation(word.charAt(0))){
        return "";
    }
    //a two-character word like ".." is empty by now, so stop before charAt(-1) throws
    if(word.isEmpty())
        return "";
    //if there is still punctuation, run through stripPunctuation() again, else return the word
    if(isPunctuation(word.charAt(word.length()-1)) || isPunctuation(word.charAt(0)))
        return stripPunctuation(word);
    return word;
}


Thanks for the help anyway.
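Since recursion hadn't been covered in class, the same edge trimming can also be done with two indices and a pair of loops. A sketch, assuming a hypothetical isPunctuation() that treats anything other than a letter, digit, or whitespace as punctuation: leading and trailing punctuation are removed, inner punctuation as in "te.st" is kept, and a word made only of punctuation comes back empty, which is what the recursive version is meant to do as well.

```java
class EdgeTrimmer {

    // Hypothetical helper: anything that is not a letter, digit or whitespace counts as punctuation.
    static boolean isPunctuation(char c) {
        return !Character.isLetterOrDigit(c) && !Character.isWhitespace(c);
    }

    // Removes leading and trailing punctuation with loops instead of recursion.
    // Inner punctuation ("te.st", "derp.derp") is left alone; a word made only of
    // punctuation comes back as "".
    static String stripPunctuation(String word) {
        int start = 0;
        int end = word.length();   // exclusive
        while (start < end && isPunctuation(word.charAt(start))) {
            start++;
        }
        while (end > start && isPunctuation(word.charAt(end - 1))) {
            end--;
        }
        return word.substring(start, end);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"\"test.\"", "te.st", "...", "a"}) {
            System.out.println(w + " -> [" + stripPunctuation(w) + "]");
        }
    }
}
```

Like the recursive version, this drops "..." entirely, even though the first post wants it counted as a word; that case would need its own check.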

Name: Anonymous 2010-12-22 2:34

You know what'd be useful? Regular expressions.
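A sketch of that idea in Java: split on runs of whitespace, then remove punctuation only at the edges of each token with the pattern ^\p{Punct}+|\p{Punct}+$ (backslashes doubled in Java source), so inner punctuation like "te.st" survives. \p{Punct} is Java's ASCII punctuation class, so non-ASCII marks such as curly quotes are not covered, and an all-punctuation token such as "..." is trimmed away here; it would need special handling if it should count as a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexWords {

    // One or more ASCII punctuation characters at the start OR at the end of a token.
    private static final Pattern EDGE_PUNCT = Pattern.compile("^\\p{Punct}+|\\p{Punct}+$");

    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            String word = EDGE_PUNCT.matcher(token).replaceAll("");
            if (!word.isEmpty()) {   // drops "" from leading spaces and all-punctuation tokens
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("This te.st string is a \"test\"."));
    }
}
```

Running main prints [this, te.st, string, is, a, test].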

Name: Anonymous 2010-12-22 2:39

>>16
Regular expressions
I can't believe you've done this.

Name: Anonymous 2010-12-22 16:23

Some people, when confronted with a problem, think "How can I have more than one problem?" Now they have regular expressions.

Name: Anonymous 2010-12-22 17:40

>>18
This made my day. Thank you, >>18.

Name: Anonymous 2010-12-25 17:12

Name: Anonymous 2010-12-26 22:08

what are regular expressions?

Name: Anonymous 2010-12-26 23:17

Name: Anonymous 2010-12-26 23:23

>>21
Perl.

Name: Anonymous 2010-12-27 3:40

>>2

lol @ nested if statements.
