Friday, 10 June 2016

How to Remove Duplicate Characters from String in Java

Sometimes, a choice is restricted by putting additional constraint put by the interviewer, that's why it's better to know multiple ways to solve a problem. This not only helps in understanding the problem better but also on comparative analysis.

Solution 1 - Replacing duplicate with NULL character

Our first solution is coded in the method removeDuplicates(String word), it takes a String and returns another String without duplicates. This algorithm goes through each character of String to check if it's a duplicate of an already found character. It skips the duplicate character by inserting 0, which is later used to filter those characters and update the non-duplicate characters.
The time Complexity of this solution is O(n^2), excluded to uniqueString() method, which creates a String from the character array. This method will work even if input String contains more than one duplicate characters.

This algorithm is also an example of brute force algorithm to solve a programming problem. algorithms.

Solution 2 - Using ASCII table

Our second solution is coded in removeDuplicatesFromString(String input) method. This solution assumes that given input String only contains ASCII characters. You should ask this question to your Interviewer, if he says ASCII then this solution is Ok.

This Algorithm uses an additional memory of constant size. It creates a boolean array of length 256 to accommodate all ASCII characters. Initially, all array elements are false. This array is used as a Set to check if an ASCII character is seen already or not. You iterate over character array and mark all characters which are seen by storing true in the corresponding index of ASCII array.

For example, if input String is "abcda" then ASCII['a'] = true means at index 65 true is stored, next time when you find this character 'a', you know that this character has appeared before because of true element hence it's duplicate, so you store here ZERO or NUL. Later you remove all those characters and create a unique String as output. See Unicode demystified to learn more about encoding scheme in the computer world.

How to Remove Duplicate Characters from String in Java

Java Solution - Remove duplicate characters from given String 

Here is my solution for the problem of removing repeated or duplicate characters from given String in Java programming language. If you understand the logic you can write this solution in any programming language e.g. C, C++, C#, Python or JavaScript.


public class RemoveDuplicateCharacters{

    public static void main(String args[]) {

        System.out.println("Call removeDuplicates(String word) method ....");
        String[] testdata = {"aabscs", "abcd", "aaaa", null, "", "aaabbb", "abababa"};

        for (String input : testdata) {
            System.out.printf("Input : %s  Output: %s %n", input, removeDuplicates(input));

        }

        System.out.println("Calling removeDuplicatesFromString(String str).");
        for (String input : testdata) {
            System.out.printf("Input : %s  Output: %s %n", input, removeDuplicatesFromString(input));

        }
    }

/*

* This algorithm goes through each character of String to check if its 
* a duplicate of already found character. It skip the duplicate 
* character by inserting 0, which is later used to filter those * characters and update the non-duplicate character. 
* Time Complexity of this solution is O(n^2), excluded to 
* UniqueString() method, which creates String from character array. 
* This method will work even if String contains more than one duplicate 
* character. 
*/

    public static String removeDuplicates(String word) {
            if (word == null || word.length() < 2) { return word;
        }

int pos = 1; // possible position of duplicate character 
        char[] characters = word.toCharArray();

for (int i = 1; i < word.length(); i++) { 
            int j;
            for (j = 0; j < pos; ++j) { 
                if (characters[i] == characters[j]) {
                break:
                }
            }

            if (j == pos) {
                characters[pos] = characters[i];
                ++pos;
            } else {
                characters[pos] = 0;
                ++pos;
            }
        }

        return toUniqueString(characters);
}

/* 
* This solution assumes that given input String only contains 
* ASCII characters. You should ask this question to your Interviewer, 
* if he says ASCII then this solution is Ok. This Algorithm 
* uses additional memory of constant size. 
*/

public static String removeDuplicatesFromString(String input) {

if (input == null || input.length() &lt; 2) { 
            return input;
        }

        boolean[] ASCII = new boolean[256];
        char[] characters = input.toCharArray();
        ASCII[input.charAt(0)] = true;
        int dupIndex = 1;
        for (int i = 1; i &lt; input.length(); i++) { 
            if (!ASCII[input.charAt(i)]) {
                characters[dupIndex] = characters[i];
                ++dupIndex;
                ASCII[characters[i]] = true;
            } else {
                characters[dupIndex] = 0;
                ++dupIndex;
            }
       }
        return toUniqueString(characters);
}

/* 
* Utility method to convert Character array to String, omitting 
* NUL character, ASCII value 0.
*/

    public static String toUniqueString(char[] letters) {
        StringBuilder sb = new StringBuilder(letters.length);
        for (char c : letters) {
            if (c != 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}

Output:

Call removeDuplicates(String word) method ....
Input : aabscs  Output: absc
Input : abcd  Output: abcd
Input : aaaa  Output: a
Input : null  Output: null
Input :   Output:
Input : aaabbb  Output: ab
Input : abababa  Output: ab

Calling removeDuplicatesFromString(String str) method ....
Input : aabscs  Output: absc
Input : abcd  Output: abcd
Input : aaaa  Output: a
Input : null  Output: null
Input :   Output:
Input : aaabbb  Output: ab
Input : abababa  Output: ab