I'm learning about operations on strings, and I'm having trouble dealing with punctuation. Basically, I need to provide the count of each word in a given string. I have no problem with this, except that punctuation at the end of a word cannot be included.
For example, the string
"This test string is a test."
should yield
this - 1
test - 2
string - 1
is - 1
a - 1
but instead I'm getting
test - 1
test. - 1
To further complicate matters, something like "..." is a word and should be counted once. "derp.derp" is also a single word.
I have the logic, but I don't know how to implement it:
if (word ends in punctuation and that punctuation is preceded by non-punctuation)
word = word.substring(0, word.length() - 1)
This gets pretty messy, and for some reason I'm getting a lot of null pointer exceptions when there are spaces in between words.
I'll post some of my code after this.
Name: Anonymous 2010-12-20 23:06
//returns the whole thing with all words and their corresponding counts
private String allWordCounts(){
    initializeWords();
    String str = "";
    for(int x = 0; x < wordCount(); x++){
        if (!(words[x]).equals(""))
            str += words[x] + "\t\t" + countOf(words[x]) + "\n";
    }
    return str;
}
//chops up the original string and puts individual words into an array. this is confirmed to work.
private void initializeWords(){
    int start = 0, end = trimmed.indexOf(' ',start), counter = 0;
    while(start < trimmed.length()){
        end = trimmed.indexOf(' ',start);
        if (end == -1)
            end = trimmed.length();
        words[counter] = trimmed.substring(start,end);
        counter++;
        start = end+1;
    }
}
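One thing worth noting about the indexOf loop above: two spaces in a row store an empty string as a "word". A rough sketch of the same chopping step using String.split on runs of whitespace, which avoids that; the class and method names here are made up for illustration and are not from the original code.

```java
import java.util.Arrays;

class SplitSketch {
    // Splits on runs of whitespace, so repeated spaces never produce empty
    // entries and the returned array has no unused slots.
    static String[] splitWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // "".split(...) would return one empty string
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Note the double space after "This".
        System.out.println(Arrays.toString(splitWords("This  test string is a test.")));
        // prints [This, test, string, is, a, test.]
    }
}
```

Punctuation is still attached at this point ("test."), so stripping it remains a separate step.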
Name: Anonymous 2010-12-20 23:09
here's where i start to have trouble
//returns the count of a word
public int countOf(String word){
    int count = 0;
    for(int x = 0; x < words.length; x++){
        if(words[x].equalsIgnoreCase(word) || words[x].equalsIgnoreCase(stripPunctuation(word))){
            count++;
            words[x] = "";
        }
    }
    return count;
}
//words[x].equalsIgnoreCase(stripPunctuation(word)) is where i think the trouble lies, as well as my stripPunctuation method
//is SUPPOSED to take away any ending punctuation, pretty sure it doesn't
private String stripPunctuation(String word){
    if(word.length() > 1){
        if((isPunctuation(word.charAt(word.length())) && !isPunctuation(word.charAt(word.length()-1)))){
            word = word.substring(0, word.length());
        }
    }
    return word;
}
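Two index slips in the stripPunctuation above stand out. word.charAt(word.length()) is one past the last valid index, so it throws StringIndexOutOfBoundsException for any word longer than one character, and word.substring(0, word.length()) returns the whole word unchanged. Separately, if words has more slots than words actually stored (an assumption, since the allocation isn't shown), the unused slots stay null and words[x].equalsIgnoreCase(...) in countOf would throw NullPointerException. A minimal sketch of a trailing-only strip with the indices shifted by one; isPunctuation here is a hypothetical stand-in for the one in the thread, which isn't shown.

```java
class TrailingStripSketch {
    // Hypothetical stand-in for the thread's isPunctuation(), which is not shown.
    static boolean isPunctuation(char c) {
        return ".,;:!?\"'()".indexOf(c) >= 0;
    }

    // Removes one trailing punctuation character when the character before it
    // is not punctuation, so "test." becomes "test" but "..." is left alone.
    static String stripTrailing(String word) {
        int n = word.length();
        if (n > 1 && isPunctuation(word.charAt(n - 1)) && !isPunctuation(word.charAt(n - 2))) {
            return word.substring(0, n - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailing("test."));     // prints test
        System.out.println(stripTrailing("..."));       // prints ...
        System.out.println(stripTrailing("derp.derp")); // prints derp.derp
    }
}
```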
Use [code] tags, please, and then we might be able to help.
Name: Anonymous 2010-12-21 9:53
Why don't you just strip all punctuation from the string with replaceAll and then split it around groups of whitespace (using the regex \\s+)? Making it all one case will probably help too.
String str = "This test string is a test.";
str = str.toLowerCase();
str = str.replaceAll("\\.", " "); // simple example; the first parameter is a regular expression, so the literal dot has to be escaped
String[] ary = str.split("\\s+");
Array ary should have the contents: [this][test][string][is][a][test]
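Since replaceAll takes a regular expression, an unescaped dot would match every character, not just periods. A rough sketch of the same idea using the POSIX class \p{Punct} to cover all ASCII punctuation at once; the class and method names are invented for the example.

```java
import java.util.Arrays;

class RegexStripSketch {
    // Lowercases the text, replaces every ASCII punctuation character with a
    // space, then splits on runs of whitespace.
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase().replaceAll("\\p{Punct}", " ").trim();
        if (cleaned.isEmpty()) {
            return new String[0]; // nothing but punctuation or whitespace
        }
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("This test string is a test.")));
        // prints [this, test, string, is, a, test]
    }
}
```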
>>8
the problem is that "te.st" would be a valid word, and it should be counted as such.
not sure if this is the most efficient solution, but it's pretty fucking awesome, since we haven't formally learned recursion in class. here's what I painstakingly came up with.
//isPunctuation() just returns whether a character is punctuation, in case that isn't clear
//helper method removes all leading and trailing punctuation of a word
private String stripPunctuation(String word){
    //if there is more than one character
    if(word.length() > 1){
        //if the last character is punctuation, remove it
        if(isPunctuation(word.charAt(word.length()-1)))
            word = word.substring(0, word.length()-1);
        //if the first character is punctuation, remove it
        if(isPunctuation(word.charAt(0)))
            word = word.substring(1,word.length());
    //if the word is empty or is a single punctuation character, remove the word altogether
    }else{
        if(word.length() == 0 || isPunctuation(word.charAt(0)))
            return "";
    }
    //if nothing is left (e.g. the word was ".."), stop here, since charAt() would throw on an empty string
    if(word.length() == 0)
        return "";
    //if there is still punctuation, run through stripPunctuation() again, else return the word
    if (isPunctuation(word.charAt(word.length()-1)) || isPunctuation(word.charAt(0)))
        return stripPunctuation(word);
    return word;
}
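For comparison, here is a loop-based sketch of the same edge stripping, plugged into a complete count. It is not the poster's method: it trims with two indices instead of recursion, and it assumes that an all-punctuation token such as "..." should be kept as a word, as the first post asked. isPunctuation is again a hypothetical stand-in, and the class and method names are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WordCountSketch {
    // Hypothetical stand-in for the thread's isPunctuation(), which is not shown.
    static boolean isPunctuation(char c) {
        return ".,;:!?\"'()".indexOf(c) >= 0;
    }

    // Trims punctuation from both ends with two indices instead of recursion.
    // A token made only of punctuation (such as "...") is returned unchanged.
    static String stripEdges(String word) {
        int start = 0;
        int end = word.length();
        while (start < end && isPunctuation(word.charAt(start))) start++;
        while (end > start && isPunctuation(word.charAt(end - 1))) end--;
        return start == end ? word : word.substring(start, end);
    }

    // Counts words case-insensitively, keeping them in first-seen order.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // only happens for blank input
            counts.merge(stripEdges(token).toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countWords("This test string is a test. ... derp.derp")
                .forEach((word, n) -> System.out.println(word + " - " + n));
    }
}
```

With that input, "test" and "test." land in the same bucket, while "..." and "derp.derp" each count once as their own words.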