C# parsing text within quotes

Question

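I'm working on a simple little search mechanism, and I want to allow users to search for chunks of text that contain spaces. For example, a user can search for a person's name:

Name: John Smith

I then call "John Smith".Split(' ') to get a two-element array, {"John","Smith"}. I first return all records that match both "John" AND "Smith", then the records that match either "John" OR "Smith". Records with no match are not returned. This isn't a complicated scenario, and I have this part working.

I would now like to let the user return ONLY the records that match "John Smith".

I'd like to use a basic quote syntax for searching. So if a user wants to search for "John Smith" OR Pocahontas, they would enter: "John Smith" Pocahontas. The order of the terms is absolutely irrelevant; "John Smith" does not get priority over Pocahontas because it comes first in the list.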

I have two main trains of thought on how to parse the input.

A) Using regular expression then parsing stuff (IndexOf, Split)
B) Using only the parsing methods

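I think a logical plan of action would be to find the text inside quotes, remove it from the original string, and insert it into a separate list. Everything left over in the original string could then be split on spaces and inserted into that same list. If there is a single quote, or an odd number of them, it is simply removed from the list.

How do I find the matches with a regular expression? I know about regex.Replace, but how would I iterate through the matches and insert them into a list? I know there is some neat way of doing this with the MatchEvaluator delegate and LINQ, but I know basically nothing about regex in C#.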

Answer 1

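EDIT: Came back to this tab without refreshing and didn't realize this question had already been answered... the accepted answer is better.

I think first removing the content in quotes with a regular expression is a good idea. Maybe something like this: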

// Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
String sampleInput = "\"John Smith\" Pocahontas Bambi \"Jane Doe\" Aladin";

// Pattern: an opening quote, one or more non-quote characters (captured), a closing quote
Regex regex = new Regex("\"([^\"]+)\"");

List<string> searches = new List<string>();

// Loop through all matches from the regex
foreach (Match match in regex.Matches(sampleInput))
{
    // Add the value of Groups[1] to the list.
    // Groups[0] is the entire match, including the quotes;
    // Groups[1] is the first parenthesized group in the pattern,
    // which in this case is the text inside the quotes.
    searches.Add(match.Groups[1].Value);
}

// Remove the matches from the input
sampleInput = regex.Replace(sampleInput, String.Empty);

// Split the remaining input and add the result to our searches list
searches.AddRange(sampleInput.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries));

Answer 2

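I needed the same functionality as Shawn, but I didn't want to use regular expressions. Here is a simple solution I came up with that uses Split() instead of regex, for anyone else who needs this functionality.

This works because, by default, the Split method creates empty entries in the array for consecutive occurrences of the separator in the source string. If we split on the quote character, the result is an array in which the even-indexed entries hold the individual words and the odd-indexed entries are the quoted phrases.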

Example:

"John Smith" Pocahontas

results in

item(0) = "" (empty string)
item(1) = "John Smith"
item(2) = " Pocahontas"

and

1 2 "3 4" 5 "6 7" "8 9"

results in

item(0) = "1 2 "
item(1) = "3 4"
item(2) = " 5 "
item(3) = "6 7"
item(4) = " " (whitespace only)
item(5) = "8 9"
item(6) = "" (empty string)

Note that an unmatched quote results in a phrase that runs from the last quote to the end of the input string.
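The Split behaviour described here, including the unmatched-quote case, can be checked with a small sketch. The class and method names (SplitOnQuoteDemo, SplitOnQuotes) are made up for illustration. Note that the raw entries keep their surrounding spaces, and that whitespace-only or empty entries show up next to adjacent phrases, which is why the method below checks IsNullOrWhiteSpace and splits the word entries again.

```csharp
using System;

public static class SplitOnQuoteDemo
{
    // Hypothetical helper: split on the double-quote character and keep
    // empty entries, so that even indexes are always "outside quotes"
    // and odd indexes are always "inside quotes".
    public static string[] SplitOnQuotes(string query)
    {
        return query.Split('"');
    }

    public static void Main()
    {
        string[] inputs =
        {
            "\"John Smith\" Pocahontas",
            "1 2 \"3 4\" 5 \"6 7\" \"8 9\"",
            "Bambi \"Jane Doe" // unmatched quote: the tail lands on an odd index
        };

        foreach (string input in inputs)
        {
            Console.WriteLine(input);
            string[] parts = SplitOnQuotes(input);
            for (int i = 0; i < parts.Length; i++)
            {
                string kind = (i % 2 == 0) ? "words " : "phrase";
                // Brackets make leading/trailing spaces and empty entries visible.
                Console.WriteLine("  item({0}) {1} = [{2}]", i, kind, parts[i]);
            }
        }
    }
}
```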

    public static List<string> QueryToTerms(string query)
    {
        List<string> Result = new List<string>();

        // split on the quote token
        string[] QuoteTerms = query.Split('"');
        // switch to denote if the current loop is processing words or a phrase
        bool WordTerms = true;

        foreach (string Item in QuoteTerms)
        {
            if (!string.IsNullOrWhiteSpace(Item))
                if (WordTerms)
                {
                    // Item contains words. parse them and ignore empty entries.
                    string[] WTerms = Item.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);
                    foreach (string WTerm in WTerms)
                        Result.Add(WTerm);
                }
                else
                    // Item is a phrase.
                    Result.Add(Item);

            // Alternate between words and phrases.
            WordTerms = !WordTerms;
        }
        return Result;
    }
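If an unmatched quote should not produce a phrase, as the question asks, one option is to treat the text after a dangling quote as ordinary words. This is only a sketch under that assumption; BalancedQueryParser and QueryToTermsBalanced are hypothetical names and not part of the answer above. It relies on the fact that n quote characters always yield n + 1 entries from Split, so an even entry count means the last quote has no partner.

```csharp
using System;
using System.Collections.Generic;

public static class BalancedQueryParser
{
    // Hypothetical variant of QueryToTerms: text after an unmatched
    // quote is split into words instead of being kept as one phrase.
    public static List<string> QueryToTermsBalanced(string query)
    {
        var result = new List<string>();
        string[] parts = query.Split('"');

        // An even number of parts means an odd number of quotes.
        bool lastQuoteUnmatched = parts.Length % 2 == 0;

        for (int i = 0; i < parts.Length; i++)
        {
            bool isLast = i == parts.Length - 1;
            bool isPhrase = i % 2 == 1 && !(lastQuoteUnmatched && isLast);

            if (isPhrase)
            {
                // Keep the phrase whole, but drop padding and empty phrases.
                string phrase = parts[i].Trim();
                if (phrase.Length > 0)
                    result.Add(phrase);
            }
            else
            {
                // Plain words: split on spaces and ignore empty entries.
                result.AddRange(parts[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
            }
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", QueryToTermsBalanced("\"John Smith\" Pocahontas")));
        Console.WriteLine(string.Join(" | ", QueryToTermsBalanced("Bambi \"Jane Doe")));
    }
}
```

Whether a dangling quote should degrade to plain words or be reported to the user is a design choice; the sketch picks the first because it never discards what the user typed.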

Answer 3

Use a regular expression like this:

string input = "\"John Smith\" Pocahontas";
Regex rx = new Regex(@"(?<="")[^""]+(?="")|[^\s""]\S*");
for (Match match = rx.Match(input); match.Success; match = match.NextMatch()) {
    // use match.Value here, it contains the string to be searched
}
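The lookaround pattern above handles the sample input. With more than one quoted phrase, though, the text between two phrases also sits between two quote characters, so it may be picked up as a phrase too. A possible alternative, sketched here under that assumption, consumes the quotes and reads the phrase from a named group instead. The names QuotedTermParser, Parse, phrase and word are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedTermParser
{
    // Alternative 1: a quoted phrase, captured without its quotes.
    // Alternative 2: a run of characters that are neither whitespace nor quotes.
    private static readonly Regex TermPattern =
        new Regex("\"(?<phrase>[^\"]+)\"|(?<word>[^\\s\"]+)");

    public static List<string> Parse(string input)
    {
        var terms = new List<string>();
        foreach (Match match in TermPattern.Matches(input))
        {
            Group phrase = match.Groups["phrase"];
            string term = phrase.Success
                ? phrase.Value.Trim()          // quoted phrase, padding removed
                : match.Groups["word"].Value;  // single unquoted word
            if (term.Length > 0)
                terms.Add(term);
        }
        return terms;
    }

    public static void Main()
    {
        foreach (string term in Parse("\"John Smith\" Pocahontas \"Jane Doe\" Bambi"))
            Console.WriteLine(term);
    }
}
```

Because neither alternative can match a lone quote character, an unmatched quote is skipped and the words after it are returned individually.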

Answer 1
1 2010-10-15 09:16:02

Answer 2
0 2014-05-06 00:07:48

Answer 3
0 accepted 2010-10-15 08:41:27