
Best solution for an anagram check?

I'm going through a permutation/anagram problem and wanted input on the most efficient way of checking. I'm doing this in Java land, and as such there is a library for EVERYTHING, including sorting. The first way of checking whether two strings are anagrams of each other is to check their lengths, sort them in some manner, then compare them index by index. Code below:

private boolean validAnagram(String str, String pair) {
if(str.length() != pair.length()){
    return false;
}

char[] strArr = str.toCharArray();
char[] pairArr = pair.toCharArray();


Arrays.sort(strArr);
str = new String(strArr);

Arrays.sort(pairArr);
pair = new String(pairArr);

for(int i = 0; i<str.length(); i++){
    if(str.charAt(i) != pair.charAt(i)){
        return false;
    }
}
return true;
}

Alternatively, I figured it would be easier to sum the ASCII values of each string and compare the totals, avoiding a check on every possible character. Code below:

private boolean validAnagram(String str, String pair) {
if(str.length() != pair.length()){
    return false;
}

char[] strArr = str.toCharArray();
char[] pairArr = pair.toCharArray();



int strValue = 0;
int pairValue = 0;

for(int i =0; i < strArr.length; i++){
    strValue+= (int) strArr[i];
    pairValue+= (int) pairArr[i];
}

if(strValue != pairValue){
    return false;
}
return true;
}

So, which is a better solution? I don't know much about the sort that Arrays is giving me, however that's the more common answer when I look around the old internets. Makes me wonder if I'm missing something.
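A caveat on the second version, offered as an illustration rather than as part of the original question: equal character sums do not imply equal characters. The sketch below reproduces the summing logic under the hypothetical names `SumCheckCounterexample` and `sumCheck`, and feeds it a pair of strings that are not anagrams.

```java
public class SumCheckCounterexample {
    // The sum-based check from the question, reproduced for illustration.
    static boolean sumCheck(String str, String pair) {
        if (str.length() != pair.length()) {
            return false;
        }
        int strValue = 0;
        int pairValue = 0;
        for (int i = 0; i < str.length(); i++) {
            strValue += str.charAt(i);
            pairValue += pair.charAt(i);
        }
        return strValue == pairValue;
    }

    public static void main(String[] args) {
        // 'a' + 'd' = 97 + 100 = 197 and 'b' + 'c' = 98 + 99 = 197
        System.out.println(sumCheck("ad", "bc")); // true, yet "ad" and "bc" are not anagrams
    }
}
```

So the sum can serve as a cheap early rejection, but a matching sum still has to be confirmed by sorting or counting.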

Here is a very simple implementation.

public boolean isAnagram(String strA, String strB) {
  // Cleaning the strings (remove white spaces and convert to lowercase)
  strA = strA.replaceAll("\\s+","").toLowerCase();
  strB = strB.replaceAll("\\s+","").toLowerCase();

  // Check every char of strA and remove its first occurrence in strB
  for (int i = 0; i < strA.length(); i++ ) {
    if (strB.equals("")) return false;  // strB is already empty : not an anagram
    strB = strB.replaceFirst(Pattern.quote("" + strA.charAt(i)), "");
  }

  // if strB is empty we have an anagram
  return strB.equals("");
}

And finally :

System.out.println(isAnagram("William Shakespeare", "I am a weakish speller")); // true

This is a much simpler, easy-to-read solution I was able to compile...

static boolean isAnagram(String a, String b) {
    if (a.length() != b.length()) {
        return false;
    }
    char[] arr1 = a.toLowerCase().toCharArray();
    char[] arr2 = b.toLowerCase().toCharArray();
    Arrays.sort(arr1);
    Arrays.sort(arr2);
    return Arrays.equals(arr1, arr2);
}

Best, Justin

There are several ways to check whether two strings are anagrams. Your question is which one is the better solution. Your first solution relies on sorting, which has a worst-case complexity of O(n log n). Your second solution uses only a single loop, which has a complexity of O(n).

So of the two, your second solution, with only O(n) complexity, is faster than the first one. Note, however, that equal character sums do not guarantee equal characters (for example, "ad" and "bc" have the same sum), so a counting approach is needed for an O(n) check that is also correct.

One possible solution :

private boolean checkAnagram(String stringOne, String stringTwo) {
    char[] first = stringOne.toLowerCase().toCharArray();
    char[] second = stringTwo.toLowerCase().toCharArray();
    // if length of strings is not same
    if (first.length != second.length)
        return false;
    int[] counts = new int[26];
    for (int i = 0; i < first.length; i++) {
        counts[first[i] - 97]++;
        counts[second[i] - 97]--;
    }
    for (int i = 0; i < 26; i++)
        if (counts[i] != 0)
            return false;
    return true;
}
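The 26-slot array above assumes every character is a letter from a to z after lowercasing; anything else (a space, a digit, an accented letter) indexes outside the array. As a hedged variation, not the answerer's code, the same counting idea can be sketched with a `HashMap` so that any character is accepted. The class name `CountingAnagram` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CountingAnagram {
    // Counts up for characters of a and down for characters of b;
    // the strings are anagrams only if every count ends at zero.
    static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < a.length(); i++) {
            counts.merge(a.charAt(i), 1, Integer::sum);
            counts.merge(b.charAt(i), -1, Integer::sum);
        }
        for (int count : counts.values()) {
            if (count != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("listen", "silent")); // true
        System.out.println(isAnagram("ad", "bc"));         // false
    }
}
```

This stays O(n) on average, at the cost of boxing and hashing overhead compared with the plain `int[26]` array.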

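My solution (note that indexOf and replaceFirst each scan the string on every iteration, so the time complexity is O(n^2) rather than O(n)):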

public static boolean isAnagram(String str1, String str2) {
    if (str1.length() != str2.length()) {
        return false;
    }

    for (int i = 0; i < str1.length(); i++) {
        char ch = str1.charAt(i);

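        int index = str2.indexOf(ch);
        if (index == -1)
            return false;
        else
            // Remove the matched character so it cannot be matched twice
            // (avoids regex metacharacter issues with replaceFirst)
            str2 = str2.substring(0, index) + str2.substring(index + 1);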
    }

    return true;
}

Test case :

@Test
public void testIsPermutationTrue() {
    assertTrue(Anagram.isAnagram("abc", "cba"));
    assertTrue(Anagram.isAnagram("geeksforgeeks", "forgeeksgeeks"));
    assertTrue(Anagram.isAnagram("anagram", "margana"));
}

@Test
public void testIsPermutationFalse() {
    assertFalse(Anagram.isAnagram("abc", "caa"));
    assertFalse(Anagram.isAnagram("anagramm", "marganaa"));
}

The best solution depends on your objective: code size, memory footprint, or least computation.

A very cool solution with as little code as possible, though not the fastest at O(n log n) and pretty memory-inefficient, using Java 8 streams:

import java.util.stream.Collectors;

public class Anagram {
  public static void main(String[] argc) {
    String str1 = "gody";
    String str2 = "dogy";

    boolean isAnagram =
    str1.chars().mapToObj(c -> (char) c).sorted().collect(Collectors.toList())
    .equals(str2.chars().mapToObj(c -> (char) c).sorted().collect(Collectors.toList()));

    System.out.println(isAnagram);
  }
}
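Much of the memory cost in the stream version comes from boxing every character and collecting into lists. A minimal sketch of a lighter variant, assuming the same Java 8 streams, sorts the primitive `IntStream` directly and compares the resulting arrays. The class name `StreamAnagram` is hypothetical.

```java
import java.util.Arrays;

public class StreamAnagram {
    // Sorts the code units of each string as primitive ints and compares the arrays.
    static boolean isAnagram(String a, String b) {
        int[] sortedA = a.chars().sorted().toArray();
        int[] sortedB = b.chars().sorted().toArray();
        return Arrays.equals(sortedA, sortedB);
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("gody", "dogy")); // true
    }
}
```

It is still O(n log n) because of the sort, but it avoids creating a `Character` object per element.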

I tried a few solutions using Sets, and made each one run 10 million times to test using your example array of:

private static String[] input = {"tea", "ate", "eat", "apple", "java", "vaja", "cut", "utc"};

Firstly, the method I used to call these algorithms:

public static void main(String[] args) {
    long startTime = System.currentTimeMillis();
    for (int x = 0; x < 10000000; x++) {
        Set<String> confirmedAnagrams = new HashSet<>();
        for (int i = 0; i < (input.length / 2) + 1; i++) {
            if (!confirmedAnagrams.contains(input[i])) {
                for (int j = i + 1; j < input.length; j++) {
                        if (isAnagrams1(input[i], input[j])) {
                            confirmedAnagrams.add(input[i]);
                            confirmedAnagrams.add(input[j]);
                        }
                }
            }
        }
        output = confirmedAnagrams.toArray(new String[confirmedAnagrams.size()]);
    }
    long endTime = System.currentTimeMillis();
    System.out.println("Total time: " + (endTime - startTime));
    System.out.println("Average time: " + ((endTime - startTime) / 10000000D));
}

I then used algorithms based on a HashSet of characters. I add each character of each word to the HashSet, and if the size of the HashSet does not match the length of the initial words, they are taken not to be anagrams.

My algorithms and their runtimes:

Algorithm 1:

private static boolean isAnagrams1(String x, String y) {
    if (x.length() != y.length()) {
        return false;
    } else if (x.equals(y)) {
        return true;
    }

    Set<Character> anagramSet = new HashSet<>();
    for (int i = 0; i < x.length(); i++) {
        anagramSet.add(x.charAt(i));
        anagramSet.add(y.charAt(i));
    }

    return anagramSet.size() != x.length();
}

This has the runtime of:

Total time: 6914
Average time: 6.914E-4

Algorithm 2

private static boolean isAnagrams2(String x, String y) {
    if (x.length() != y.length()) {
        return false;
    } else if (x.equals(y)) {
        return true;
    }

    Set<Character> anagramSet = new HashSet<>();
    char[] xAr = x.toCharArray();
    char[] yAr = y.toCharArray();
    for (int i = 0; i < xAr.length; i++) {
        anagramSet.add(xAr[i]);
        anagramSet.add(yAr[i]);
    }

    return anagramSet.size() != x.length();
}

Has the runtime of:

Total time: 8752
Average time: 8.752E-4

Algorithm 3

For this algorithm, I decided to send the Set through, therefore I only create it once for every cycle, and clear it after each test.

private static boolean isAnagrams3(Set<Character> anagramSet, String x, String y) {
    if (x.length() != y.length()) {
        return false;
    } else if (x.equals(y)) {
        return true;
    }

    for (int i = 0; i < x.length(); i++) {
        anagramSet.add(x.charAt(i));
        anagramSet.add(y.charAt(i));
    }

    return anagramSet.size() != x.length();
}

Has the runtime of:

Total time: 8251
Average time: 8.251E-4

Algorithm 4

This algorithm is not mine; it belongs to Pratik Upacharya, who answered the question as well. I include it in order to compare:

private static boolean isAnagrams4(String stringOne, String stringTwo) {
    char[] first = stringOne.toLowerCase().toCharArray();
    char[] second = stringTwo.toLowerCase().toCharArray();
    // if length of strings is not same 
    if (first.length != second.length) {
        return false;
    }
    int[] counts = new int[26];
    for (int i = 0; i < first.length; i++) {
        counts[first[i] - 97]++;
        counts[second[i] - 97]--;
    }
    for (int i = 0; i < 26; i++) {
        if (counts[i] != 0) {
            return false;
        }
    }
    return true;
}

Has the runtime of:

Total time: 5707
Average time: 5.707E-4

Of course, these runtimes do differ for every test run, and in order to do proper testing, a larger example set is needed, and maybe more iterations thereof.

*Edited, as I made a mistake in my initial method. Pratik Upacharya's algorithm does seem to be the faster one.
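Speed aside, it is worth checking what the set-based methods actually return, since the size of a set of distinct characters says little about character counts. The sketch below copies `isAnagrams1` as posted into a hypothetical class `SetCheckDemo` and runs it on two pairs.

```java
import java.util.HashSet;
import java.util.Set;

public class SetCheckDemo {
    // isAnagrams1 exactly as posted above, copied for illustration.
    static boolean isAnagrams1(String x, String y) {
        if (x.length() != y.length()) {
            return false;
        } else if (x.equals(y)) {
            return true;
        }

        Set<Character> anagramSet = new HashSet<>();
        for (int i = 0; i < x.length(); i++) {
            anagramSet.add(x.charAt(i));
            anagramSet.add(y.charAt(i));
        }

        return anagramSet.size() != x.length();
    }

    public static void main(String[] args) {
        System.out.println(isAnagrams1("tea", "ate"));   // false, although they are anagrams
        System.out.println(isAnagrams1("aabb", "abcc")); // true, although they are not
    }
}
```

Given these results, the timings for algorithms 1 to 3 measure a different computation from the counting approach in algorithm 4, so the comparison should be read with that in mind.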

//here best solution for an anagram
import java.util.*;

class Anagram{
public static void main(String arg[]){

Scanner sc =new Scanner(System.in);
String str1=sc.nextLine();
String str2=sc.nextLine();
int i,j;

boolean Flag=true;
i=str1.length();
j=str2.length();


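if(i==j){
boolean[] used=new boolean[j]; // marks characters of str2 that are already matched
for(int m=0;m<i&&Flag;m++){
    Flag=false; // stays false if no unused match is found for str1.charAt(m)
    for(int n=0;n<j;n++){
        if(!used[n]&&str1.charAt(m)==str2.charAt(n)){
           used[n]=true;
           Flag=true;
           break;
          }
    }
}
}
else{
Flag=false;
}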

if(Flag)
System.out.println("String is Anagram");
else
System.out.println("String is not Anagram");
}
}

A recruiter asked me to solve this problem recently. In studying the problem I came up with a solution that solves two types of anagram issues.

issue 1: Determine if an anagram exists within a body of text.

issue 2: Determine if a formal anagram exists within a body of text. In this case the anagram must be the same size as the text you are comparing it against. In the former case, the two texts need not be the same size; one just needs to contain the other.

My approach was as follows:

setup phase: First create an Anagram class. This just converts the text to a Map whose key is the character in question and whose value is the number of occurrences of that character. I assume this requires at most O(n) time. Since at most two maps are needed, the worst case would be O(2n), which is still O(n). At least that is what my naive understanding of asymptotic notation says.

processing phase: All you need to do is loop through the smaller of the two Maps and look each entry up in the larger Map. If an entry does not exist, or exists but with a different occurrence count, the text fails the anagram test.

Here is the loop that determines if we have an anagram or not:

    boolean looking = true;
    for (Anagram ele : smaller.values()) {
        Anagram you = larger.get(ele);
        if (you == null || you.getCount() != ele.getCount()) {
            looking = false;
            break;
        }
    }
    return looking;

Note that I create an ADT to contain the strings being processed. They are converted to a Map first.

Here is a snippet of the code to create the Anagram Object:

    private void init(String teststring2) {
        StringBuilder sb = new StringBuilder(teststring2);
        for (int i = 0; i < sb.length(); i++) {
            Anagram a = new AnagramImpl(sb.charAt(i));
            Anagram tmp = map.putIfAbsent(a, a);
            if (tmp != null) {
                tmp.updateCount();
            }
        }
    }
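Since the `Anagram` and `AnagramImpl` types are not shown in full, here is a self-contained sketch of the same two-phase idea using a plain `Map<Character, Integer>` instead. The names `MapAnagram`, `countChars`, and `matches` are hypothetical and not taken from the code above.

```java
import java.util.HashMap;
import java.util.Map;

public class MapAnagram {
    // Setup phase: map each character to its number of occurrences.
    static Map<Character, Integer> countChars(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < text.length(); i++) {
            counts.merge(text.charAt(i), 1, Integer::sum);
        }
        return counts;
    }

    // Processing phase: every entry of the smaller map must appear in the
    // larger map with the same occurrence count.
    static boolean matches(String first, String second) {
        Map<Character, Integer> a = countChars(first);
        Map<Character, Integer> b = countChars(second);
        Map<Character, Integer> smaller = a.size() <= b.size() ? a : b;
        Map<Character, Integer> larger = (smaller == a) ? b : a;
        for (Map.Entry<Character, Integer> entry : smaller.entrySet()) {
            Integer other = larger.get(entry.getKey());
            if (other == null || !other.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("abc", "cab"));  // true
        System.out.println(matches("ab", "xaby"));  // true, "ab" is contained
        System.out.println(matches("aab", "abc"));  // false, counts of 'a' differ
    }
}
```

For the formal case (issue 2), adding a check that both strings have the same length before calling `matches` is enough, because equal lengths leave no room for extra characters in the larger map.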

I came up with a solution and I am not even using any 26 char array... Check this out:

StringBuffer a = new StringBuffer();
        a.append(sc.next().toLowerCase());

        StringBuffer b = new StringBuffer();
        b.append(sc.next().toLowerCase());
        if(a.length() !=b.length())
        {
            System.out.println("NO");
            continue;
        }
        int o =0;
        for(int i =0;i<a.length();i++)
        {
            if(a.indexOf(String.valueOf(b.charAt(i)))<0)
            {
               System.out.println("NO");
               o=1;break; 

            }
        }
        if(o==0)
         System.out.println("Yes");

Consider using HashMap and Arrays.sort

private static Map<String, String> getAnagrams(String[] data) {

    Map<String, String> anagrams = new HashMap<>();
    Map<String, String> results = new HashMap<>();

    for (int i = 0; i < data.length; i++) {

        char[] chars = data[i].toLowerCase().toCharArray();
        Arrays.sort(chars);

        String sorted = String.copyValueOf(chars);

        String item = anagrams.get(sorted);
        if (item != null) {
            anagrams.put(sorted, item + ", " + i);
            results.put(sorted, anagrams.get(sorted));
        } else {
            anagrams.put(sorted, String.valueOf(i));
        }
    }

    return results;
}

I like it because you traverse the array only once.
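The method above stores index lists as comma-joined strings. As a hedged variation on the same sorted-key idea, the sketch below groups the words themselves into lists with `computeIfAbsent`. The names `AnagramGroups` and `group` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnagramGroups {
    // Groups words by their sorted, lowercased characters in a single pass.
    static Map<String, List<String>> group(String[] data) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String word : data) {
            char[] chars = word.toLowerCase().toCharArray();
            Arrays.sort(chars);
            String key = new String(chars);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] input = {"tea", "ate", "eat", "apple", "java", "vaja", "cut", "utc"};
        // Prints each sorted key with the words that share it (map order is unspecified).
        group(input).forEach((key, words) -> System.out.println(key + " -> " + words));
    }
}
```

Any list with more than one entry is a set of mutual anagrams; filtering on list size reproduces the `results` map of the original.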

Solution using primitive data type.

boolean isAnagram(char input1[], char input2[]) {
    int bitFlip = 32;

    if(input2.length != input1.length){return false;}

    boolean found = false;
    for (int x = 0; x < input1.length; x++) {
        found = false;
        for (int y = 0; y < input2.length; y++) {
             if (!found && ((input1[x] | bitFlip)) ==
             ( (input2[y] | bitFlip))) {
                found = true;
                input2[y] = 0;
            }
        }
        if (!found) {
            break;
        }
    }
    return found ;
}

This approach doesn't rely on any sorting utility. It finds each character by iteration and, once found, sets the matched entry to zero so it cannot be matched again. This handles inputs with duplicate characters, like "pool" and "loop", which each contain the letter "o" twice.

It also ignores case without relying on toLowerCase() by setting a bit: in ASCII, if the 6th bit (32 in decimal) is one, the letter is lowercase, and if it is zero, the letter is uppercase.
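A small sketch of that bit trick on its own, assuming plain ASCII input. The names `CaseBitDemo` and `toLowerByBit` are hypothetical.

```java
public class CaseBitDemo {
    // OR-ing with 32 sets the bit that distinguishes 'a'..'z' from 'A'..'Z' in ASCII.
    static char toLowerByBit(char c) {
        return (char) (c | 32);
    }

    public static void main(String[] args) {
        System.out.println(toLowerByBit('A')); // a
        System.out.println(toLowerByBit('a')); // a
        // Non-letters are changed too: '@' (64) becomes '`' (96).
        System.out.println(toLowerByBit('@')); // `
    }
}
```

The last line shows the limit of the trick: it is only a safe case fold when the input is known to contain ASCII letters.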

It's direct byte manipulation, so each comparison is cheap, much like the techniques used in image manipulation. The downside is the O(n^2) running time.

This solution was tested on HackerRank.
