
Best solution for an anagram check?

I'm working through a permutation/anagram problem and wanted input on the most efficient way of checking. I'm doing this in Java land, so there is a library for EVERYTHING, including sorting. The first way to check whether two strings are anagrams of each other is to compare their lengths, sort them in some manner, then compare the two strings index by index. Code below:

private boolean validAnagram(String str, String pair) {
    if (str.length() != pair.length()) {
        return false;
    }

    char[] strArr = str.toCharArray();
    char[] pairArr = pair.toCharArray();

    Arrays.sort(strArr);
    str = new String(strArr);

    Arrays.sort(pairArr);
    pair = new String(pairArr);

    for (int i = 0; i < str.length(); i++) {
        if (str.charAt(i) != pair.charAt(i)) {
            return false;
        }
    }
    return true;
}

Alternatively, I figured it would be easier to check based on ASCII values and avoid a check on every possible character. Code below:

private boolean validAnagram(String str, String pair) {
    if (str.length() != pair.length()) {
        return false;
    }

    char[] strArr = str.toCharArray();
    char[] pairArr = pair.toCharArray();

    int strValue = 0;
    int pairValue = 0;

    for (int i = 0; i < strArr.length; i++) {
        strValue += (int) strArr[i];
        pairValue += (int) pairArr[i];
    }

    if (strValue != pairValue) {
        return false;
    }
    return true;
}

So, which is the better solution? I don't know much about the sort that Arrays gives me, but it's the more common answer when I look around the old internets. It makes me wonder if I'm missing something.

Here is a very simple implementation.

// Requires: import java.util.regex.Pattern;
public boolean isAnagram(String strA, String strB) {
  // Clean the strings (remove whitespace and convert to lowercase)
  strA = strA.replaceAll("\\s+","").toLowerCase();
  strB = strB.replaceAll("\\s+","").toLowerCase();

  // Check every char of strA and remove its first occurrence from strB
  for (int i = 0; i < strA.length(); i++ ) {
    if (strB.equals("")) return false;  // strB is already empty: not an anagram
    strB = strB.replaceFirst(Pattern.quote("" + strA.charAt(i)), "");
  }

  // If strB is empty, we have an anagram
  return strB.equals("");
}

And finally:

System.out.println(isAnagram("William Shakespeare", "I am a weakish speller")); // true

This is a much simpler, easy-to-read solution I was able to compile...

static boolean isAnagram(String a, String b) {
    if (a.length() != b.length()) {
        return false;
    }
    char[] arr1 = a.toLowerCase().toCharArray();
    char[] arr2 = b.toLowerCase().toCharArray();
    Arrays.sort(arr1);
    Arrays.sort(arr2);
    return Arrays.equals(arr1, arr2);
}

Best, Justin

There are several ways to check whether two strings are anagrams. Your question is which one is the better solution. Your first solution uses sorting, and sorting has a worst-case complexity of O(n log n). Your second solution uses only one loop, which has complexity O(n).

So of these two, your second solution, with only O(n) complexity, will be better than the first one.
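The complexity comparison leaves out one point. Equal character sums are necessary for an anagram, but they are not sufficient. The sketch below contrasts two checks on strings whose sums collide:

- a sum check modeled on the question's second version;
- a sort-based check modeled on the first version.

The class and method names are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class: the character-sum check vs. the sort-based check
final class SumCheckDemo {

    // Sum-of-char-codes check, modeled on the question's second version
    static boolean sumsMatch(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        int sumA = 0;
        int sumB = 0;
        for (int i = 0; i < a.length(); i++) {
            sumA += a.charAt(i);
            sumB += b.charAt(i);
        }
        return sumA == sumB;
    }

    // Sort-based check, modeled on the question's first version
    // (Arrays.equals also returns false when the lengths differ)
    static boolean sortedMatch(String a, String b) {
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }

    public static void main(String[] args) {
        // 'a' + 'd' = 97 + 100 = 197 and 'b' + 'c' = 98 + 99 = 197
        System.out.println(sumsMatch("ad", "bc"));   // true (false positive)
        System.out.println(sortedMatch("ad", "bc")); // false
    }
}
```

The sum check runs in a single pass, but any two strings of equal length with the same total pass it. "ad" and "bc" are an example: both total 197 even though they share no letters.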

One possible solution:

private boolean checkAnagram(String stringOne, String stringTwo) {
    char[] first = stringOne.toLowerCase().toCharArray();
    char[] second = stringTwo.toLowerCase().toCharArray();
    // if length of strings is not same
    if (first.length != second.length)
        return false;
    // One counter per letter; assumes only the ASCII letters a-z (97 is 'a')
    int[] counts = new int[26];
    for (int i = 0; i < first.length; i++) {
        counts[first[i] - 97]++;
        counts[second[i] - 97]--;
    }
    for (int i = 0; i < 26; i++)
        if (counts[i] != 0)
            return false;
    return true;
}
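The 26-slot array above assumes every character is a lowercase ASCII letter. Any other character makes first[i] - 97 fall outside 0..25 and throws ArrayIndexOutOfBoundsException. Examples are a space, a digit or an accented letter.

Below is a minimal sketch of the same counting idea without that assumption, counting in a HashMap instead. The class name is hypothetical. As written, it is case-sensitive, counts UTF-16 chars, and does not strip whitespace.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the +1/-1 counting idea, generalized to any char
final class CountingAnagram {

    static boolean isAnagram(String s, String t) {
        if (s.length() != t.length()) {
            return false;
        }
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            // +1 for each char of s, -1 for each char of t
            counts.merge(s.charAt(i), 1, Integer::sum);
            counts.merge(t.charAt(i), -1, Integer::sum);
        }
        // All counts are zero exactly when both strings use the same chars equally often
        for (int c : counts.values()) {
            if (c != 0) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off is boxing and hashing on every character, in exchange for not having to validate or normalize the input first.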

My solution: Time complexity = O(n^2), since indexOf and the removal step each scan str2 once for every character of str1.

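public static boolean isAnagram(String str1, String str2) {
    if (str1.length() != str2.length()) {
        return false;
    }

    for (int i = 0; i < str1.length(); i++) {
        char ch = str1.charAt(i);
        int index = str2.indexOf(ch);

        if (index == -1)
            return false;
        // Remove the matched character. A plain substring avoids the regex pitfalls
        // of replaceFirst and a " " placeholder being matched by real spaces later.
        str2 = str2.substring(0, index) + str2.substring(index + 1);
    }

    return true;
}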

Test cases:

@Test
public void testIsPernutationTrue() {
    assertTrue(Anagram.isAnagram("abc", "cba"));
    assertTrue(Anagram.isAnagram("geeksforgeeks", "forgeeksgeeks"));
    assertTrue(Anagram.isAnagram("anagram", "margana"));
}

@Test
public void testIsPernutationFalse() {
    assertFalse(Anagram.isAnagram("abc", "caa"));
    assertFalse(Anagram.isAnagram("anagramm", "marganaa"));
}

The best solution depends on your objective: code size, memory footprint, or least computation.

A very neat Java 8 solution with as little code as possible, though it is not the fastest (O(n log n)) and is fairly memory-inefficient:

import java.util.stream.Collectors;

public class Anagram {
  public static void main(String[] argc) {
    String str1 = "gody";
    String str2 = "dogy";

    boolean isAnagram =
    str1.chars().mapToObj(c -> (char) c).sorted().collect(Collectors.toList())
    .equals(str2.chars().mapToObj(c -> (char) c).sorted().collect(Collectors.toList()));

    System.out.println(isAnagram);
  }
}
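Staying with Java 8 streams, one variant compares character-frequency maps instead of sorted lists. Hashing avoids the sort, so it should scale roughly linearly. The class name is hypothetical, and because it still boxes every character it is no more memory-frugal than the version above.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: compare char -> count maps built with streams
final class StreamCountAnagram {

    // Maps each char of s to the number of times it occurs
    static Map<Character, Long> frequencies(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    static boolean isAnagram(String a, String b) {
        // The length check short-circuits before building either map
        return a.length() == b.length() && frequencies(a).equals(frequencies(b));
    }
}
```

For example, StreamCountAnagram.isAnagram("gody", "dogy") is true, while "aab" and "abb" are rejected because their counts for 'a' and 'b' differ.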

I tried a few solutions using Sets and ran each one 10 million times, testing with your example array:

private static String[] input = {"tea", "ate", "eat", "apple", "java", "vaja", "cut", "utc"};

First, the method I used to call these algorithms:

public static void main(String[] args) {
    long startTime = System.currentTimeMillis();
    for (int x = 0; x < 10000000; x++) {
        Set<String> confirmedAnagrams = new HashSet<>();
        for (int i = 0; i < (input.length / 2) + 1; i++) {
            if (!confirmedAnagrams.contains(input[i])) {
                for (int j = i + 1; j < input.length; j++) {
                        if (isAnagrams1(input[i], input[j])) {
                            confirmedAnagrams.add(input[i]);
                            confirmedAnagrams.add(input[j]);
                        }
                }
            }
        }
        output = confirmedAnagrams.toArray(new String[confirmedAnagrams.size()]);
    }
    long endTime = System.currentTimeMillis();
    System.out.println("Total time: " + (endTime - startTime));
    System.out.println("Average time: " + ((endTime - startTime) / 10000000D));
}

I then used algorithms based on a HashSet of characters. I add each character of each word to the HashSet; if the size of the HashSet is not the length of the initial words, it means they are not anagrams. (Note that a set keeps only distinct characters, so this size test cannot tell "aab" from "abb". Also, as written, the methods below return size != length, which gives true for "abc" and "abd" and false for "tea" and "ate".)
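A set of distinct characters throws away how many times each character occurs, and an anagram check needs exactly that information. The small sketch below makes this concrete. The class and method names are hypothetical, and the count array assumes lowercase ASCII letters only, for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: a Set cannot tell "aab" from "abb", a per-letter count can
final class SetVsCountDemo {

    // Distinct characters appearing in s
    static Set<Character> distinct(String s) {
        Set<Character> set = new HashSet<>();
        for (char c : s.toCharArray()) {
            set.add(c);
        }
        return set;
    }

    // Occurrence counts; assumes s contains only lowercase ASCII letters
    static int[] letterCounts(String s) {
        int[] counts = new int[26];
        for (char c : s.toCharArray()) {
            counts[c - 'a']++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(distinct("aab").equals(distinct("abb")));                   // true
        System.out.println(Arrays.equals(letterCounts("aab"), letterCounts("abb"))); // false
    }
}
```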

My algorithms and their runtimes:

Algorithm 1:

    private static boolean isAnagrams1(String x, String y) {
    if (x.length() != y.length()) {
        return false;
    } else if (x.equals(y)) {
        return true;
    }

    Set<Character> anagramSet = new HashSet<>();
    for (int i = 0; i < x.length(); i++) {
        anagramSet.add(x.charAt(i));
        anagramSet.add(y.charAt(i));
    }

    return anagramSet.size() != x.length();
}

This has a runtime of:

Total time: 6914
Average time: 6.914E-4

Algorithm 2:

private static boolean isAnagrams2(String x, String y) {
    if (x.length() != y.length()) {
        return false;
    } else if (x.equals(y)) {
        return true;
    }

    Set<Character> anagramSet = new HashSet<>();
    char[] xAr = x.toCharArray();
    char[] yAr = y.toCharArray();
    for (int i = 0; i < xAr.length; i++) {
        anagramSet.add(xAr[i]);
        anagramSet.add(yAr[i]);
    }

    return anagramSet.size() != x.length();
}

This has a runtime of:

Total time: 8752
Average time: 8.752E-4

Algorithm 3:

For this algorithm, I decided to pass the Set in as a parameter, so I only create it once per cycle and clear it after each test.

    private static boolean isAnagrams3(Set<Character> anagramSet, String x, String y) {
    if (x.length() != y.length()) {
        return false;
    } else if (x.equals(y)) {
        return true;
    }

    for (int i = 0; i < x.length(); i++) {
        anagramSet.add(x.charAt(i));
        anagramSet.add(y.charAt(i));
    }

    return anagramSet.size() != x.length();
}

This has a runtime of:

Total time: 8251
Average time: 8.251E-4

Algorithm 4:

This algorithm is not mine; it belongs to Pratik Upacharya, who also answered the question. I include it for comparison:

    private static boolean isAnagrams4(String stringOne, String stringTwo) {
    char[] first = stringOne.toLowerCase().toCharArray();
    char[] second = stringTwo.toLowerCase().toCharArray();
    // if length of strings is not same 
    if (first.length != second.length) {
        return false;
    }
    int[] counts = new int[26];
    for (int i = 0; i < first.length; i++) {
        counts[first[i] - 97]++;
        counts[second[i] - 97]--;
    }
    for (int i = 0; i < 26; i++) {
        if (counts[i] != 0) {
            return false;
        }
    }
    return true;
}

This has a runtime of:

Total time: 5707
Average time: 5.707E-4

Of course, these runtimes differ on every test run. Proper testing would need a larger example set, and maybe more iterations.

*Edit: I made a mistake in my initial method; Pratik Upacharya's algorithm does seem to be the faster one.*

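// Here is the best solution for an anagram
import java.util.*;

class Anagram {
    public static void main(String[] arg) {
        Scanner sc = new Scanner(System.in);
        String str1 = sc.nextLine();
        String str2 = sc.nextLine();
        int i, j;

        boolean flag = true;
        i = str1.length();
        j = str2.length();

        if (i == j) {
            // used[n] marks characters of str2 that are already matched,
            // so repeated letters must occur equally often in both strings
            boolean[] used = new boolean[j];
            for (int m = 0; m < i && flag; m++) {
                flag = false;
                for (int n = 0; n < j; n++) {
                    if (!used[n] && str1.charAt(m) == str2.charAt(n)) {
                        used[n] = true;
                        flag = true;
                        break;
                    }
                }
                // flag stays false if str1.charAt(m) found no unmatched partner,
                // which also ends the outer loop
            }
        } else {
            flag = false;
        }

        if (flag)
            System.out.println("String is Anagram");
        else
            System.out.println("String is not Anagram");
    }
}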

A recruiter asked me to solve this problem recently. While studying it, I came up with a solution that handles two types of anagram problems.

Issue 1: Determine whether an anagram exists within a body of text.

Issue 2: Determine whether a formal anagram exists within a body of text. In this case the anagram must be the same size as the text you are comparing it against. In the former case, the two texts need not be the same size; one just needs to contain the other.

My approach was as follows:

Setup phase: First create an Anagram class. It converts the text to a Map whose key is the character in question and whose value is the number of occurrences of that character. I assume this requires at most O(n) time. Since at most two maps are needed, the worst-case complexity would be O(2n), which is still O(n). At least, that is what my naive understanding of asymptotic notation says.

Processing phase: All you need to do is loop through the smaller of the two Maps and look each entry up in the larger Map. The test for an anagram fails if an entry does not exist, or if it exists with a different occurrence count.

Here is the loop that determines whether we have an anagram:

boolean looking = true;
for (Anagram ele : smaller.values()) {
    Anagram you = larger.get(ele);
    if (you == null || you.getCount() != ele.getCount()) {
        looking = false;
        break;
    }
}
return looking;

Note that I created an ADT to hold the strings being processed; they are converted to a Map first.

Here is a snippet of the code that creates the Anagram object:

private void init(String teststring2) {
    StringBuilder sb = new StringBuilder(teststring2);
    for (int i = 0; i < sb.length(); i++) {
        Anagram a = new AnagramImpl(sb.charAt(i));
        Anagram tmp = map.putIfAbsent(a, a);
        if (tmp != null) {
            tmp.updateCount();
        }
    }
}
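The setup and processing phases described above can also be sketched without the custom Anagram ADT. This is a minimal sketch, not the answerer's code. The class and method names are hypothetical, and it uses a plain Map<Character, Integer>.

For issue 1, it assumes "contains" means every character of the smaller text occurs at least as often in the larger text. The loop above instead requires equal counts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two phases using Map<Character, Integer>
final class TextAnagramCheck {

    // Setup phase: count occurrences of each character in one pass, O(n)
    static Map<Character, Integer> countChars(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    // Issue 1 (assumed meaning): the larger text has every char of the smaller
    // text at least as many times
    static boolean contains(String larger, String smaller) {
        Map<Character, Integer> big = countChars(larger);
        for (Map.Entry<Character, Integer> e : countChars(smaller).entrySet()) {
            if (big.getOrDefault(e.getKey(), 0) < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    // Issue 2: a formal anagram needs the same length and identical counts
    static boolean isFormalAnagram(String a, String b) {
        return a.length() == b.length() && countChars(a).equals(countChars(b));
    }
}
```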

I came up with a solution that doesn't use a 26-element char array at all. Check this out:

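// Fragment: runs inside a loop over test cases, with a Scanner named sc
StringBuffer a = new StringBuffer();
a.append(sc.next().toLowerCase());

StringBuffer b = new StringBuffer();
b.append(sc.next().toLowerCase());
if (a.length() != b.length())
{
    System.out.println("NO");
    continue;
}
int o = 0;
for (int i = 0; i < b.length(); i++)
{
    int idx = a.indexOf(String.valueOf(b.charAt(i)));
    if (idx < 0)
    {
        System.out.println("NO");
        o = 1;
        break;
    }
    // Remove the matched character so repeated letters are counted correctly
    a.deleteCharAt(idx);
}
if (o == 0)
    System.out.println("Yes");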

Consider using HashMap and Arrays.sort:

    private static Map<String, String> getAnagrams(String[] data) {

    Map<String, String> anagrams = new HashMap<>();
    Map<String, String> results = new HashMap<>();

    for (int i = 0; i < data.length; i++) {

        char[] chars = data[i].toLowerCase().toCharArray();
        Arrays.sort(chars);

        String sorted = String.copyValueOf(chars);

        String item = anagrams.get(sorted);
        if (item != null) {
            anagrams.put(sorted, item + ", " + i);
            results.put(sorted, anagrams.get(sorted));
        } else {
            anagrams.put(sorted, String.valueOf(i));
        }
    }

    return results;
}

I like it because you only traverse the array once.
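The same sorted-key idea can collect the words themselves instead of a comma-separated string of indices. This is a sketch with a hypothetical class name. It uses LinkedHashMap so groups come back in first-seen order, and Map.computeIfAbsent to create each group's list on demand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical variant: group words into lists keyed by their sorted characters
final class AnagramGroups {

    static Map<String, List<String>> group(String[] words) {
        // LinkedHashMap keeps the groups in the order their keys were first seen
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String word : words) {
            char[] chars = word.toLowerCase().toCharArray();
            Arrays.sort(chars);
            // Words that are anagrams of each other share the same sorted key
            groups.computeIfAbsent(new String(chars), k -> new ArrayList<>()).add(word);
        }
        return groups;
    }
}
```

The earlier benchmark's input {"tea", "ate", "eat", "apple", "java", "vaja", "cut", "utc"} produces four groups:

- "aet" holds tea, ate and eat.
- "aelpp" holds apple on its own.
- "aajv" holds java and vaja.
- "ctu" holds cut and utc.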

A solution using primitive data types.

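boolean isAnagram(char input1[], char input2[]) {
    // OR-ing with 32 (0x20) forces the bit that separates ASCII upper- and lowercase letters
    int bitFlip = 32;

    if (input2.length != input1.length) { return false; }

    // Work on a copy so the caller's array is not modified
    char[] remaining = input2.clone();
    for (int x = 0; x < input1.length; x++) {
        boolean found = false;
        for (int y = 0; y < remaining.length; y++) {
            // Skip slots already consumed (set to 0) so they cannot match again
            if (remaining[y] != 0 && (input1[x] | bitFlip) == (remaining[y] | bitFlip)) {
                found = true;
                remaining[y] = 0;
                break;
            }
        }
        if (!found) {
            return false;
        }
    }
    return true;
}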

This approach doesn't rely on any sorting utility. It finds each value by iteration and, once found, sets it to zero. That way inputs with duplicate characters are handled correctly, such as "pool" and "loop", which both have two letter "o"s.

It also ignores case without relying on toLowerCase() by forcing one bit on: for ASCII letters, the 6th bit (32 in decimal) is one for a lowercase letter and zero for an uppercase letter.
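A tiny demo of the bit trick described above follows; the class and method names are hypothetical. It shows the trick folding 'A' onto 'a'. It also shows the catch: outside the letters, OR-ing with 32 can make two different characters compare equal. '@' (64) and '`' (96) are one such pair.

```java
// Hypothetical demo of case folding by OR-ing with 32 (0x20)
final class CaseBitDemo {

    // True when a and b are equal once bit 5 (value 32) is forced on in both
    static boolean sameIgnoringCaseBit(char a, char b) {
        return (a | 32) == (b | 32);
    }

    public static void main(String[] args) {
        System.out.println(sameIgnoringCaseBit('A', 'a')); // true: 65 | 32 == 97
        System.out.println(sameIgnoringCaseBit('A', 'b')); // false: 97 != 98
        // Outside letters the trick over-matches: '@' is 64 and '`' is 96
        System.out.println(sameIgnoringCaseBit('@', '`')); // true
    }
}
```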

It's direct byte manipulation, so it may perform well, like the bit tricks used in image manipulation. The downside is the O(n^2) running time.

This solution was tested on HackerRank.

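Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.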

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM