Wednesday, 26 July 2017

3 ways to count words in Java String

You can count words in a Java String by using the split() method of String. A word is simply a run of non-space characters that is separated from its neighbours by one or more spaces. Splitting on whitespace with a regular expression gives you an array of all the words in the given String. If, however, you are asked to write a program that counts the words in a given String without using any String utility such as String.split() or StringTokenizer, it becomes a little more challenging for a beginner programmer. It's actually one of the common Java coding questions, and I have seen it a couple of times in interviews for Java developers with 2 to 4 years of experience.
The interviewer often adds constraints: split() is not allowed, and you can only use basic methods like charAt(), length(), and substring() along with loops, operators, and other basic programming tools.
In this article, I'll share three ways to solve this problem: first by using String's split() method and a regular expression, second by using StringTokenizer, and third without using either of those library methods. The third one is the most interesting, and it is difficult to write a complete solution that handles all special characters, e.g. non-printable ASCII characters. For our purpose, we assume that whitespace means a tab, space, or newline, and that a word is a run of characters which Character.isLetter() considers letters.
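To make that working definition concrete, here is a minimal sketch of how the JDK classifies a few sample characters. The class name CharacterClassDemo and the classify() helper are hypothetical names chosen for illustration; only Character.isLetter() and Character.isWhitespace() come from the JDK.

```java
public class CharacterClassDemo {

  // Hypothetical helper: labels a character using the two JDK checks
  // that matter for word counting.
  static String classify(char c) {
    if (Character.isLetter(c)) {
      return "letter";
    }
    if (Character.isWhitespace(c)) {
      return "whitespace";
    }
    return "other";
  }

  public static void main(String[] args) {
    char[] samples = { 'a', 'Z', ' ', '\t', '\n', '7', ',', '+' };
    for (char c : samples) {
      // Print the numeric code as well, since tab and newline are invisible.
      System.out.printf("code %d is %s%n", (int) c, classify(c));
    }
  }
}
```

Note that digits and punctuation fall into the "other" bucket, so under this definition they separate words just as whitespace does.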

Btw, if you are looking for more String based coding problems, you can either check here, or you can buy Cracking the Coding Interview book, which is a collection of more than 190 programming questions and solutions from tech giants like Amazon, Google, Facebook, and Microsoft. It also includes questions from service based companies like Infosys, TCS, and Cognizant.

Solution 1 - Counting words using String.split() method

In this solution, we will use the split() method of the java.lang.String class to count the number of words in a given sentence. This solution uses the regular expression "\\s+" to split the String on whitespace. The split() method returns an array, and the length of that array is the number of words in the given String.

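  public static int countWordsUsingSplit(String input) {
    if (input == null) {
      return 0;
    }

    // trim first: leading whitespace would otherwise produce an
    // empty first token and inflate the count by one
    String trimmed = input.trim();
    if (trimmed.isEmpty()) {
      return 0;
    }

    String[] words = trimmed.split("\\s+");
    return words.length;
  }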

If you are new to regular expressions in Java, \s is a predefined character class that matches any whitespace character, including space, tab, and newline. Since the backslash needs to be escaped in a Java string literal, it becomes \\s, and because there could be multiple spaces between words, we add the + quantifier, which means "one or more". Hence \\s+ matches a run of one or more whitespace characters and splits the String there. See Core Java Volume 1 - Fundamentals by Cay S. Horstmann to learn more about the split() method of the String class. This is also the simplest way to count the number of words in a given sentence.
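The behaviour of \\s+ is easiest to see on a few inputs. The sketch below is only an illustration: the class name SplitWhitespaceDemo and the helper countWordsTrimmed() are hypothetical, while trim(), isEmpty() and split() are standard String methods. It also shows one detail worth knowing: split() keeps an empty leading element when the input starts with whitespace.

```java
public class SplitWhitespaceDemo {

  // Hypothetical helper: trims the input before splitting so that
  // leading whitespace does not add an empty first element.
  static int countWordsTrimmed(String input) {
    if (input == null) {
      return 0;
    }
    String trimmed = input.trim();
    if (trimmed.isEmpty()) {
      return 0;
    }
    return trimmed.split("\\s+").length;
  }

  public static void main(String[] args) {
    // A mixed run of tab, space and newline acts as a single separator.
    System.out.println("Java\t and \n C++".split("\\s+").length); // 3

    // Leading whitespace yields an empty first element: ["", "Java", "and", "C++"].
    System.out.println("  Java and C++".split("\\s+").length); // 4

    // Trimming first gives the expected count.
    System.out.println(countWordsTrimmed("  Java and C++  ")); // 3
  }
}
```

Trailing whitespace is less of a concern, because split() with the default limit discards trailing empty strings on its own.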

Solution 2 - Counting words in String using StringTokenizer

In this solution, we will use the java.util.StringTokenizer class. The Javadoc of its single-argument constructor says: "Constructs a string tokenizer for the specified string. The tokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character. Delimiter characters themselves will not be treated as tokens."

public static int countWordsUsingStringTokenizer(String sentence) {
    if (sentence == null || sentence.isEmpty()) {
      return 0;
    }
    StringTokenizer tokens = new StringTokenizer(sentence);
    return tokens.countTokens();
  }

You can see that we have not given any explicit delimiter to StringTokenizer. It uses the default set of delimiters, which is enough to find any whitespace, and since words are separated by whitespace, the number of tokens is equal to the number of words in the given String.
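As a small illustration of the default delimiter set, and of what changes when you supply your own, consider the following sketch. The class name TokenizerDemo and the countTokens(String, String) helper are hypothetical; the two StringTokenizer constructors and countTokens() are part of java.util.

```java
import java.util.StringTokenizer;

public class TokenizerDemo {

  // Hypothetical helper: counts tokens using a caller-supplied delimiter set.
  static int countTokens(String sentence, String delimiters) {
    if (sentence == null) {
      return 0;
    }
    return new StringTokenizer(sentence, delimiters).countTokens();
  }

  public static void main(String[] args) {
    // Default delimiters: leading, trailing and repeated whitespace are skipped.
    System.out.println(new StringTokenizer("  Java\tand\nC++  ").countTokens()); // 3

    // A comma is not a default delimiter, so this is a single token.
    System.out.println(new StringTokenizer("YouAre,best").countTokens()); // 1

    // Adding the comma to the delimiter set splits it into two tokens.
    System.out.println(countTokens("YouAre,best", " \t\n\r\f,")); // 2
  }
}
```

Unlike split(), StringTokenizer never returns empty tokens, so there is no need to trim the input first.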


Solution 3 - Counting words in String without using library method

Here is the code to count the number of words in a given String without using split() or StringTokenizer. This is what you might have written in C or C++. It iterates through the String's character array and checks every character. It assumes that a word starts with a letter and ends with something which is not a letter. Once it encounters a non-letter after a run of letters, it increments the counter and starts searching again from the next position.

 public static int count(String word) {
    if (word == null || word.isEmpty()) {
      return 0;
    }

    int wordCount = 0;

    boolean isWord = false;
    int endOfLine = word.length() - 1;
    char[] characters = word.toCharArray();

    for (int i = 0; i < characters.length; i++) {

      // if the char is a letter, word = true.
      if (Character.isLetter(characters[i]) && i != endOfLine) {
        isWord = true;

        // if char isn't a letter and there have been letters before,
        // counter goes up.
      } else if (!Character.isLetter(characters[i]) && isWord) {
        wordCount++;
        isWord = false;

        // last word of String; if it doesn't end with a non letter, it
        // wouldn't count without this.
      } else if (Character.isLetter(characters[i]) && i == endOfLine) {
        wordCount++;
      }
    }

    return wordCount;
  }
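If the interviewer restricts you to charAt() and length(), the same idea can be written a little more compactly by counting the start of each word instead of its end, which removes the special case for the last character. This is a sketch of an alternative, not a replacement for the method above; the class name CharAtWordCounter and the method countWords() are hypothetical.

```java
public class CharAtWordCounter {

  // Hypothetical variant: a word is counted whenever a letter appears
  // right after a non-letter, or at the very start of the text.
  static int countWords(String text) {
    if (text == null) {
      return 0;
    }
    int count = 0;
    boolean inWord = false; // true while the previous character was a letter
    for (int i = 0; i < text.length(); i++) {
      boolean letter = Character.isLetter(text.charAt(i));
      if (letter && !inWord) {
        count++; // transition from non-letter to letter: a new word begins
      }
      inWord = letter;
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(countWords("Java and C++")); // 3
    System.out.println(countWords("YouAre,best"));  // 2
    System.out.println(countWords(""));             // 0
  }
}
```

Because both versions rely on Character.isLetter(), they agree on inputs such as "YouAre,best", where the comma ends one word and the next letter begins another.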

If you want to practice more questions of this type, you can also check Cracking the Coding Interview, one of the biggest collections of programming questions and solutions from technical interviews.

Java Program to count a number of words in String


Here is our complete Java program to count the number of words in a given String sentence. It demonstrates all three approaches we have seen so far, i.e. using the String.split() method, using StringTokenizer, and writing your own method to count the number of words without using any third-party library such as Google Guava or Apache Commons.

import java.util.StringTokenizer;

/*
 * Java Program to count number of words in String.
 * This program solves the problem in three ways,
 * by using String.split(), StringTokenizer, and
 * without any of them by just writing own logic
 */
public class Main {

  public static void main(String[] args) {

    String[] testdata = { "", null, "One", "O", "Java and C++", "a b c",
        "YouAre,best" };

    for (String input : testdata) {
      System.out.printf(
          "Number of words in string '%s' using split() is : %d %n", input,
          countWordsUsingSplit(input));
      System.out.printf(
          "Number of words in string '%s' using StringTokenizer is : %d %n",
          input, countWordsUsingStringTokenizer(input));
      System.out.printf("Number of words in string '%s' is : %d %n", input,
          count(input));
    }

  }

  /**
   * Count number of words in given String using split() and regular expression
   * 
   * @param input
   * @return number of words
   */
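  public static int countWordsUsingSplit(String input) {
    if (input == null) {
      return 0;
    }

    // trim first: leading whitespace would otherwise produce an
    // empty first token and inflate the count by one
    String trimmed = input.trim();
    if (trimmed.isEmpty()) {
      return 0;
    }

    String[] words = trimmed.split("\\s+");
    return words.length;
  }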

  /**
   * Count number of words in given String using StringTokenizer
   * 
   * @param sentence
   * @return count of words
   */
  public static int countWordsUsingStringTokenizer(String sentence) {
    if (sentence == null || sentence.isEmpty()) {
      return 0;
    }
    StringTokenizer tokens = new StringTokenizer(sentence);
    return tokens.countTokens();
  }

  /**
   * Count number of words in given String without split() or any other utility
   * method
   * 
   * @param word
   * @return number of words separated by space
   */
  public static int count(String word) {
    if (word == null || word.isEmpty()) {
      return 0;
    }

    int wordCount = 0;

    boolean isWord = false;
    int endOfLine = word.length() - 1;
    char[] characters = word.toCharArray();

    for (int i = 0; i < characters.length; i++) {

      // if the char is a letter, word = true.
      if (Character.isLetter(characters[i]) && i != endOfLine) {
        isWord = true;

        // if char isn't a letter and there have been letters before,
        // counter goes up.
      } else if (!Character.isLetter(characters[i]) && isWord) {
        wordCount++;
        isWord = false;

        // last word of String; if it doesn't end with a non letter, it
        // wouldn't count without this.
      } else if (Character.isLetter(characters[i]) && i == endOfLine) {
        wordCount++;
      }
    }

    return wordCount;
  }

}

Output:

Number of words in string '' using split() is : 0 
Number of words in string '' using StringTokenizer is : 0 
Number of words in string '' is : 0 
Number of words in string 'null' using split() is : 0 
Number of words in string 'null' using StringTokenizer is : 0 
Number of words in string 'null' is : 0 
Number of words in string 'One' using split() is : 1 
Number of words in string 'One' using StringTokenizer is : 1 
Number of words in string 'One' is : 1 
Number of words in string 'O' using split() is : 1 
Number of words in string 'O' using StringTokenizer is : 1 
Number of words in string 'O' is : 1 
Number of words in string 'Java and C++' using split() is : 3 
Number of words in string 'Java and C++' using StringTokenizer is : 3 
Number of words in string 'Java and C++' is : 3 
Number of words in string 'a b c' using split() is : 3 
Number of words in string 'a b c' using StringTokenizer is : 3 
Number of words in string 'a b c' is : 3 
Number of words in string 'YouAre,best' using split() is : 1 
Number of words in string 'YouAre,best' using StringTokenizer is : 1 
Number of words in string 'YouAre,best' is : 2 


That's all about how to count the number of words in a Java String. I have shown you three ways to solve this problem: first by using the split() method and a regular expression, second by using the StringTokenizer class, and third by writing the logic directly without split() or StringTokenizer. Depending upon your need, you can use any of these methods. Interviewers usually ask you to do it the third way, so be ready for that. You can also solve more String problems from the Cracking the Coding Interview book to gain more practice and confidence.