
How to find the biggest substring from string1 in string2

Suppose I have two strings, string1 and string2:

var string1 = "images of canadian geese goslings";

var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";

I need to find the biggest substring of string1 that also occurs in string2.

Here the biggest substring would be "canadian geese", which matches in string2.

How can I find it? I tried splitting string1 into a char[] to find the words and then merging the matching words, but that didn't get me what I wanted.

Classic loop approach (note that the result includes the space after "geese": "canadian geese "):

var string1 = "images of canadian geese goslings";
var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";

string result = "";

// i is the start index, j is the substring length
for (int i = 0; i < string1.Length; i++)
{
    // "<=" so that the suffix starting at i is also tested
    for (int j = 1; j <= string1.Length - i; j++)
    {
        //add .Trim() here if you want to ignore space characters
        string searchpattern = string1.Substring(i, j);
        // cheap length check first, then the case-insensitive search
        if (searchpattern.Length > result.Length && string2.IndexOf(searchpattern, StringComparison.OrdinalIgnoreCase) > -1)
        {
            result = searchpattern;
        }
    }
}

https://dotnetfiddle.net/q3rHjI

Side note: "canadian" and "Canadian" are not equal, so if you want a case-insensitive search you have to use StringComparison.OrdinalIgnoreCase.
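A minimal sketch that wraps the loop above in a method and applies the `.Trim()` suggested in its comment, so surrounding spaces never count toward a match. The class and method names (`ClassicLoop`, `LongestCommonSubstring`) are hypothetical; the search itself is the same brute force over every (start, length) pair, still case-insensitive via `StringComparison.OrdinalIgnoreCase`.

```csharp
using System;

public static class ClassicLoop
{
    // Brute force over every (start, length) pair of `a`, trimming each candidate
    // so that leading/trailing spaces do not make a match look longer.
    public static string LongestCommonSubstring(string a, string b)
    {
        string result = "";
        for (int i = 0; i < a.Length; i++)
        {
            // "<=" so that the suffix of `a` starting at i is also tried
            for (int len = 1; len <= a.Length - i; len++)
            {
                string candidate = a.Substring(i, len).Trim();
                // cheap length test first, then the case-insensitive search
                if (candidate.Length > result.Length &&
                    b.IndexOf(candidate, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    result = candidate;
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        var string1 = "images of canadian geese goslings";
        var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";
        // prints 'canadian geese' (no trailing space, because of Trim)
        Console.WriteLine("'" + LongestCommonSubstring(string1, string2) + "'");
    }
}
```

The returned text is taken from string1, so it keeps string1's casing ("canadian", not "Canadian").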

Take a look at the following code: https://dotnetfiddle.net/aPyw3o

using System;
using System.Collections.Generic;

public class Program {

static IEnumerable<string> substrings(string s, int length) {
    for (int i = 0 ; i + length <= s.Length; i++) {
        var ss = s.Substring(i, length);
        if (!(ss.StartsWith(" ") || ss.EndsWith(" ")))
            yield return ss;
    }
}

public static void Main()
{
    int count = 0;
    var string1 = "images of canadian geese goslings";
    var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";
    string result = null;
    for (int i = string1.Length; i>0 && string.IsNullOrEmpty(result); i--) {
        foreach (string s in substrings(string1, i)) {
            count++;
            if (string2.IndexOf(s, StringComparison.CurrentCultureIgnoreCase) >= 0) {
                result = s;
                break;
            }
        }
    }
    if (string.IsNullOrEmpty(result)) 
        Console.WriteLine("no common substrings found");
    else 
        Console.WriteLine("'" + result + "'");
    Console.WriteLine(count);
}

}   

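The substrings method returns all substrings of s with the given length (for yield, see the documentation at https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/yield). We skip substrings that start or end with a space, because we don't want spaces to make a substring look longer than it actually is.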

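The outer loop goes through all possible substring lengths, from the longest (i.e. string1.Length) down to the shortest (i.e. 1). For each length i, it checks every substring of string1 with that length to see whether it is also a substring of string2. As soon as one is found we can stop: no longer common substring can exist, because all longer substrings were already checked in previous iterations. Of course, there may be other common substrings of the same length i.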
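For comparison, here is a sketch of a different technique, the classic dynamic-programming longest-common-substring algorithm, which runs in O(n·m) instead of testing every candidate substring with IndexOf. This is not what the answer above does; the names `LcsDp` and `LongestCommonSubstring` are hypothetical, and case-insensitivity is approximated by comparing `char.ToUpperInvariant` of each character (an assumption, fine for text like this example).

```csharp
using System;

public static class LcsDp
{
    // run[j] = length of the common run ending at a[i - 1] and b[j - 1].
    // A single row is enough because each cell only needs the previous row's [j - 1].
    public static string LongestCommonSubstring(string a, string b)
    {
        var run = new int[b.Length + 1];   // run[0] stays 0 as a sentinel
        int bestLen = 0, bestEnd = 0;      // bestEnd: index in `a` just past the best run

        for (int i = 1; i <= a.Length; i++)
        {
            // walk j downwards so run[j - 1] still holds the previous row's value
            for (int j = b.Length; j >= 1; j--)
            {
                if (char.ToUpperInvariant(a[i - 1]) == char.ToUpperInvariant(b[j - 1]))
                {
                    run[j] = run[j - 1] + 1;
                    if (run[j] > bestLen)
                    {
                        bestLen = run[j];
                        bestEnd = i;
                    }
                }
                else
                {
                    run[j] = 0;
                }
            }
        }
        return a.Substring(bestEnd - bestLen, bestLen);
    }

    public static void Main()
    {
        var string1 = "images of canadian geese goslings";
        var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";
        // prints 'canadian geese ' (spaces are not skipped here)
        Console.WriteLine("'" + LongestCommonSubstring(string1, string2) + "'");
    }
}
```

Unlike the answer above, this version does not skip leading or trailing spaces; trimming the result afterwards is not equivalent in general, so add that rule inside the comparison if you need it.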

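I'll add one more answer using Span/ReadOnlyMemory, so you can avoid allocating all the strings that the current answers create. Note that I don't check for leading or trailing spaces, since that doesn't seem to be a requirement of the question. This does a case-insensitive search; if you don't want that, you can make it more efficient by using the built-in IndexOf and removing the case-insensitive comparison.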

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] _)
    {
        var string1 = "images of canadian geese goslings";

        var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";

        var longest = FindLongestMatchingSubstring(string1, string2);

        Console.WriteLine(longest);
    }

    static string FindLongestMatchingSubstring(string lhs, string rhs)
    {
        var left = lhs.AsMemory();
        var right = rhs.AsMemory();

        ReadOnlyMemory<char> longest = ReadOnlyMemory<char>.Empty;

        // try every start position in lhs
        for (int i = 0; i < left.Length; ++i)
        {
            foreach (var block in FindMatchingSubSpans(left, i, right))
            {
                if (block.Length > longest.Length)
                    longest = block;
            }
        }

        if (longest.IsEmpty)
            return string.Empty;

        return longest.ToString();
    }

    // For every occurrence of source[pos] in matchFrom, yields the longest slice
    // starting at pos that matches matchFrom (case-insensitively) from that occurrence.
    static IEnumerable<ReadOnlyMemory<char>> FindMatchingSubSpans(ReadOnlyMemory<char> source, int pos, ReadOnlyMemory<char> matchFrom)
    {
        int match = IndexOfChar(matchFrom, 0, source.Span[pos]);

        while (-1 != match)
        {
            int end = pos;
            int m = match;

            // extend the match while both sides still have characters and they agree
            while (++end < source.Length && ++m < matchFrom.Length)
            {
                char lhs = source.Span[end];
                char rhs = matchFrom.Span[m];

                if (lhs != rhs && lhs != (char.IsUpper(rhs) ? char.ToLower(rhs) : char.ToUpper(rhs)))
                {
                    break;
                }
            }

            yield return source.Slice(pos, end - pos);

            // check every occurrence, not only the first one:
            // a later occurrence may give a longer match
            match = IndexOfChar(matchFrom, match + 1, source.Span[pos]);
        }
    }

    // Case-insensitive search for a single character, starting at pos.
    static int IndexOfChar(ReadOnlyMemory<char> source, int pos, char ch)
    {
        char alt = char.IsUpper(ch) ? char.ToLower(ch) : char.ToUpper(ch);

        for (int i = pos; i < source.Length; ++i)
        {
            char m = source.Span[i];

            if (m == ch || m == alt)
                return i;
        }

        return -1;
    }
}
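Following the note above about the built-in IndexOf, here is a sketch that combines the longest-first search from the previous answer with span slicing, so nothing is allocated until the winner is converted to a string. It assumes `MemoryExtensions.IndexOf(ReadOnlySpan<char>, ReadOnlySpan<char>, StringComparison)` is available (.NET Core 3.0 or later, as far as I know); the `SpanSearch` class name is hypothetical, and like the answer above it does not skip leading or trailing spaces.

```csharp
using System;

public static class SpanSearch
{
    // Longest-first: the first hit is guaranteed to be a longest common substring.
    public static string LongestCommonSubstring(string lhs, string rhs)
    {
        ReadOnlySpan<char> left = lhs.AsSpan();
        ReadOnlySpan<char> right = rhs.AsSpan();

        for (int len = left.Length; len > 0; len--)
        {
            for (int start = 0; start + len <= left.Length; start++)
            {
                // slicing a span allocates nothing
                ReadOnlySpan<char> candidate = left.Slice(start, len);
                if (right.IndexOf(candidate, StringComparison.OrdinalIgnoreCase) >= 0)
                    return candidate.ToString(); // the only allocation
            }
        }
        return string.Empty;
    }

    public static void Main()
    {
        var string1 = "images of canadian geese goslings";
        var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";
        Console.WriteLine("'" + LongestCommonSubstring(string1, string2) + "'");
    }
}
```

Because it returns at the first hit, ties at the longest length are resolved in favour of the leftmost start position in lhs.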
