Regular expression for finding parts of a string within another

Question

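I have two strings: the first has the value "catdog" and the second has the value "got".

I'm trying to find a regular expression that tells me whether the letters of "got" are in "catdog". I particularly want to avoid the case where there are duplicate letters. For example, I know "got" is a match, but "gott" is not a match because there are not two "t"s in "catdog".

EDIT:

Based on Adam's answer below, this is the C# code I got working in my solution. Thanks to everyone who responded.

Note: I had to convert the char to an int and subtract 97 to get the appropriate index into the array. In my case the letters are always lowercase.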

    private bool CompareParts(string a, string b)
    {

        int[] count1 = new int[26];
        int[] count2 = new int[26];

        foreach (var item in a.ToCharArray())
            count1[(int)item - 97]++;

        foreach (var item in b.ToCharArray())
            count2[(int)item - 97]++;

        for (int i = 0; i < count1.Length; i++)
            if(count2[i] > count1[i])
                return false;

        return true;
    }

Answer 1

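You're using the wrong tool for the job. This is not something regular expressions can handle easily. Fortunately, it's relatively easy to do without them. You just count the number of occurrences of each letter in both strings and compare the counts: if, for each letter of the alphabet, the count in the first string is at least as large as the count in the second string, then your criterion is satisfied. Since you didn't specify a language, here's an answer in pseudocode that should be easy to translate into your language: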

bool containsParts(string1, string2)
{
    count1 = array of 26 0's
    count2 = array of 26 0's

    // Note: be sure to check for and ignore non-alphabetic characters,
    // and do case conversion if you want to do it case-insensitively
    for each character c in string1:
        count1[c]++
    for each character c in string2:
        count2[c]++

    for each character c in 'a'...'z':
        if count1[c] < count2[c]:
            return false

    return true
}
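
For readers who want something runnable, here is a minimal Python sketch of the same counting idea. It swaps the two fixed-size arrays for `collections.Counter`, so it is a variation on the pseudocode above rather than a literal translation; the function name `contains_parts` and its argument order are only illustrative.

```python
from collections import Counter

def contains_parts(letters: str, word: str) -> bool:
    """Return True if `word` can be built from the characters in `letters`."""
    # Counter subtraction keeps only positive counts, so the result is
    # empty exactly when no character is needed more often than it is available.
    return not (Counter(word) - Counter(letters))

print(contains_parts("catdog", "got"))   # True
print(contains_parts("catdog", "gott"))  # False
```

Because `Counter` accepts any hashable character, this version does not need the "subtract 97" indexing step, though it also does no case folding or filtering of non-alphabetic characters.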

Answer 2

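Previous answers have already suggested that a regex may not be the best way to do this, and I agree. However, your accepted answer is a little verbose considering what you're trying to achieve, which is to test whether one set of letters is a subset of another set of letters.

Consider the following, which does it in a single line of code: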

MatchString.ToList().ForEach(Item => InputChars.Remove(Item));

It can be used as follows:

public bool IsSubSetOf(string InputString, string MatchString) 
{
  var InputChars = InputString.ToList(); 
  MatchString.ToList().ForEach(Item => InputChars.Remove(Item)); 
  return InputChars.Count == 0;
}

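You can then call this method to verify whether one string is a subset of the other.

Interestingly, "got" will return a list with no items, because each item in the match string appears only once, but "gott" will return a list with a single item, because only one call removes a "t" from the list. Consequently you are left with one item in the list. That is, "gott" is not a subset of "catdog", but "got" is.

You could take it one step further and put the method into a static class: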

using System;
using System.Linq;
using System.Runtime.CompilerServices;

static class extensions
{
    public static bool IsSubSetOf(this string InputString, string MatchString)
    {
        var InputChars = InputString.ToList();
        MatchString.ToList().ForEach(Item => InputChars.Remove(Item));
        return InputChars.Count == 0;
    }
}

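This makes your method an extension of the string object, which actually makes things a lot easier in the long run, because you can now make your calls like so: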

Console.WriteLine("gott".IsSubSetOf("catdog"));
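
The removal idea is not tied to C#. As a rough sketch, the same approach in Python could look like the following; `is_subset_of` mirrors the C# method name, and the explicit membership check is needed because Python's `list.remove` raises `ValueError` when the item is missing, whereas `List<T>.Remove` simply returns false.

```python
def is_subset_of(word: str, letters: str) -> bool:
    """Return True if every character of `word` can be paired with one in `letters`."""
    remaining = list(word)
    for ch in letters:
        # Each character of `letters` may cancel at most one character of `word`.
        if ch in remaining:
            remaining.remove(ch)  # removes the first occurrence only
    # Anything left over had no counterpart in `letters`.
    return not remaining

print(is_subset_of("got", "catdog"))   # True
print(is_subset_of("gott", "catdog"))  # False
```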

Answer 3

You want a string that matches exactly those letters, just once. It depends on what you're writing the regex in, but it would be something like

^[^got]*(g|o|t)[^got]$

If you have an "exact match" operator, that would help.

Answer 4

I don't think there is a sane way to do this with a regular expression. The insane way would be to write out all the permutations:

/^(c?a?t?d?o?g?|c?a?t?d?g?o?| ... )$/

Now, with a little trickery, you could do this with a few regexes (example in Perl, untested):

$foo = 'got';
$foo =~ s/c//;
$foo =~ s/a//;
...
$foo =~ s/d//;
# if $foo is now empty, it passes the test.

Of course, sane people would use a loop:

$foo = 'got';
foreach $l (split(//, 'catdog')) {
    $foo =~ s/$l//;
}
# if $foo is now empty, it passes the test.
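
As a rough Python counterpart to the Perl loop, the sketch below deletes at most one occurrence of each "catdog" letter from the candidate word with `re.sub(..., count=1)`. The function name `passes` is made up for illustration, and `re.escape` is an added precaution so that characters with a special meaning in regexes are treated literally.

```python
import re

def passes(word: str, letters: str) -> bool:
    """Return True if `word` is empty after removing one match per character of `letters`."""
    for ch in letters:
        # count=1 deletes only the first occurrence, like Perl's s/$l// without /g.
        word = re.sub(re.escape(ch), "", word, count=1)
    return word == ""

print(passes("got", "catdog"))   # True
print(passes("gott", "catdog"))  # False
```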

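Of course, there are much better ways to solve this problem, but they don't use regular expressions. And no doubt there are ways to do it using, for example, Perl's extended regex features such as embedded code.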

Answer 5

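Charlie Martin almost has it right, but you have to do a complete pass for each letter. You can do that with a single regex by using lookaheads for all but the last pass: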

/^
 (?=[^got]*g[^got]*$)
 (?=[^got]*o[^got]*$)
 [^got]*t[^got]*
$/x

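This makes a nice exercise for honing your regex skills, but if I had to do this in real life, I wouldn't do it this way. A non-regex approach requires more typing, but any minimally competent programmer will be able to understand and maintain it. If you use a regex, that hypothetical maintainer will also have to be more than minimally competent at regexes.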

Answer 6

@Adam Rosenfield's solution in Python:

from collections import defaultdict

def count(iterable):
    c = defaultdict(int)
    for hashable in iterable:
        c[hashable] += 1
    return c

def can_spell(word, astring):
    """Whether `word` can be spelled using `astring`'s characters."""

    count_string = count(astring)
    count_word   = count(word)

    return all(count_string[c] >= count_word[c] for c in word)

Answer 7

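The best way to do it with regexes is, IMO:

A. Sort the characters in the large string (the search space). Thus: turn "catdog" into "acdgot".

B.

1. Do the same with the string of characters you are searching for: "gott" becomes, er, "gott"...
2. Insert ".*" between each of these characters.
3. Use the latter as the regular expression to search the former.

For example, some Perl code (if you don't mind):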

$main = "catdog"; $search = "gott";
# break into individual characters, sort, and reconcatenate
$main = join '', sort split //, $main;
$regexp = join ".*", sort split //, $search;
print "Debug info: search in '$main' for /$regexp/ \n";
if($main =~ /$regexp/) {
    print "Found a match!\n";
} else {
    print "Sorry, no match...\n";
}

This prints:

Debug info: search in 'acdgot' for /g.*o.*t.*t/
Sorry, no match...

Drop one "t" and you get a match.
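
The sort-then-search steps above can also be sketched in Python. The function name `sorted_regex_match` is hypothetical; `re.escape` and `re.DOTALL` are additions not present in the Perl version, included so that special characters and newlines in the input do not change the meaning of the pattern.

```python
import re

def sorted_regex_match(main: str, search: str) -> bool:
    """Sort both strings, then look for the search letters, in order, in the main string."""
    haystack = "".join(sorted(main))  # "catdog" -> "acdgot"
    # Join the sorted search characters with ".*": "gott" -> "g.*o.*t.*t"
    pattern = ".*".join(re.escape(ch) for ch in sorted(search))
    return re.search(pattern, haystack, flags=re.DOTALL) is not None

print(sorted_regex_match("catdog", "gott"))  # False
print(sorted_regex_match("catdog", "got"))   # True
```

Sorting is what makes this work: once both strings are in the same order, each ".*" only ever has to skip forward, so a repeated letter in the search string needs a second copy in the main string.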
