Multiple string replacement in C#

Question

I am dynamically editing a regex that matches text in a PDF. The text can be hyphenated at the end of some lines.

Example:

Source string:

"consecuti?vely"

Replace rules:

 .Replace("cuti?",@"cuti?(-\s+)?")
 .Replace("con",@"con(-\s+)?")
 .Replace("consecu",@"consecu(-\s+)?")

Desired output:

"con(-\s+)?secu(-\s+)?ti?(-\s+)?vely"

The replace rules are built dynamically; this is just one example that causes problems.

What is the best way to perform such a multiple replace so that it produces the desired output?

So far I have thought about using Regex.Replace and interleaving the word to replace with an optional (-\s+)? between each character, but that would not work, because the word to replace already contains characters that have a special meaning in a regex context.

EDIT: My current code, which does not work when replace rules overlap as in the example above:

private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
    {
        var originalTextOfThePage = mPagesNotModified[searchedPage];
        var hyphenatedParts = Regex.Matches(originalTextOfThePage, @"\w+\-\s");
        for (int i = 0; i < hyphenatedParts.Count; i++)
        {
            var partBeforeHyphen = String.Concat(hyphenatedParts[i].Value.TakeWhile(c => c != '-'));

            regex = regex.Replace(partBeforeHyphen, partBeforeHyphen + @"(-\s+)?");
        }
        return regex;
    }
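
To see why chained replacement breaks on overlapping rules, the three example rules can be applied in order. This is only a minimal sketch; the class name SequentialReplaceDemo is made up for illustration. Once the "con" rule has inserted its token, the literal "consecu" no longer occurs in the string, so the third rule silently does nothing:

```csharp
using System;

static class SequentialReplaceDemo
{
    // Applies the three example rules one after another, as in the question.
    public static string Apply(string input)
    {
        return input
            .Replace("cuti?", @"cuti?(-\s+)?")
            .Replace("con", @"con(-\s+)?")
            // "consecu" was split by the previous rule, so this finds nothing.
            .Replace("consecu", @"consecu(-\s+)?");
    }

    static void Main()
    {
        Console.WriteLine(Apply("consecuti?vely"));
        // prints con(-\s+)?secuti?(-\s+)?vely  (the token after "secu" is missing)
    }
}
```

Reordering the rules does not help in general, because each rule rewrites the text that the remaining rules are still searching.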

Answer 1

The output of this program is "con(-\s+)?secu(-\s+)?ti?(-\s+)?vely". As I understand your problem, this code solves it.

using System;
using System.Collections.Generic;
using System.Linq;

class Program
    {
        class somefields
        {
            public string first;
            public string secound;
            public string Add;
            public int index = -1; // -1 marks a rule that did not match; such rules are skipped when inserting
            public somefields(string F, string S)
            {
                first = F;
                secound = S;
            }

        }
    static void Main(string[] args)
    {
        //declaring input
        string input = "consecuti?vely";
        List<somefields> rules=new List<somefields>();
        //declaring rules
        rules.Add(new somefields("cuti?",@"cuti?(-\s+)?"));
        rules.Add(new somefields("con",@"con(-\s+)?"));
        rules.Add(new somefields("consecu",@"consecu(-\s+)?"));
        // finding the string which must be added to output string and index of that
        foreach (var rul in rules)
        {
            var index=input.IndexOf(rul.first);
            if (index != -1)
            {
                var add = rul.secound.Remove(0,rul.first.Count());
                rul.Add = add;
                rul.index = index+rul.first.Count();
            }

        }
        // sort rules by index
        for (int i = 0; i < rules.Count(); i++)
        {
            for (int j = i + 1; j < rules.Count(); j++)
            {
                if (rules[i].index > rules[j].index)
                {
                    somefields temp;
                    temp = rules[i];
                    rules[i] = rules[j];
                    rules[j] = temp;
                }
            }
        }

        string output = input.ToString();
        int k=0;
        foreach(var rul in rules)
        {
            if (rul.index != -1)
            {
                output = output.Insert(k + rul.index, rul.Add);
                k += rul.Add.Length;
            }
        }
        System.Console.WriteLine(output);
        System.Console.ReadLine();
    }
}

Answer 2

You should probably write your own parser; it is probably easier to maintain :).

Alternatively, you could wrap the patterns in a marker such as "##" to protect them, provided the strings do not already contain it.

Answer 3

Try this:

var final = Regex.Replace(originalTextOfThePage, @"(\w+)(?:\-[\s\r\n]*)?", "$1");
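
Note that this rewrites the page text instead of the search regex. A small sketch of what the pattern does, with the hypothetical names DehyphenateDemo and Dehyphenate and a made-up sample string; it joins words split across lines, but it also drops the hyphen from genuinely hyphenated compounds:

```csharp
using System;
using System.Text.RegularExpressions;

static class DehyphenateDemo
{
    // Same pattern as above: a word, optionally followed by a hyphen
    // and any whitespace; only the word itself is kept.
    public static string Dehyphenate(string text)
    {
        return Regex.Replace(text, @"(\w+)(?:\-[\s\r\n]*)?", "$1");
    }

    static void Main()
    {
        Console.WriteLine(Dehyphenate("consecu-\ntively and well-known"));
        // prints consecutively and wellknown
    }
}
```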

Answer 4

I had to give up on an easy solution and edit the regex myself. As a side effect, the new approach makes only two passes over the string.

private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
    {
        var indexesToInsertPossibleHyphenation = GetPossibleHyphenPositions(regex, searchedPage);
        var hyphenationToken = @"(-\s+)?";
        return InsertStringTokenInAllPositions(regex, indexesToInsertPossibleHyphenation, hyphenationToken);
    }

    private static string InsertStringTokenInAllPositions(string sourceString, List<int> insertionIndexes, string insertionToken)
    {
        if (insertionIndexes == null || string.IsNullOrEmpty(insertionToken)) return sourceString;

        var sb = new StringBuilder(sourceString.Length + insertionIndexes.Count * insertionToken.Length);
        var linkedInsertionPositions = new LinkedList<int>(insertionIndexes.Distinct().OrderBy(x => x));
        for (int i = 0; i < sourceString.Length; i++)
        {
            if (!linkedInsertionPositions.Any())
            {
                sb.Append(sourceString.Substring(i));
                break;
            }
            if (i == linkedInsertionPositions.First.Value)
            {
                sb.Append(insertionToken);
            }
            if (i >= linkedInsertionPositions.First.Value)
            {
                linkedInsertionPositions.RemoveFirst();
            }
            sb.Append(sourceString[i]);
        }
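        // An index equal to the string length means "insert at the very end",
        // which the loop above never reaches.
        if (linkedInsertionPositions.Contains(sourceString.Length)) sb.Append(insertionToken);
        return sb.ToString();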
    }

    private List<int> GetPossibleHyphenPositions(string regex, int searchedPage)
    {
        var originalTextOfThePage = mPagesNotModified[searchedPage];
        var hyphenatedParts = Regex.Matches(originalTextOfThePage, @"\w+\-\s");
        var indexesToInsertPossibleHyphenation = new List<int>();
        //....
        // Aho-Corasick to find all occurrences of all
        // strings in "hyphenatedParts" in the "regex" string
        // ....
        return indexesToInsertPossibleHyphenation;
    }
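
The search step is elided above. As a rough illustration of the overall idea only, the sketch below swaps in a naive IndexOf scan for Aho-Corasick and inserts the tokens from right to left. The names HyphenPositionsSketch, FindInsertionIndexes and InsertAt are hypothetical and not part of the answer's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class HyphenPositionsSketch
{
    // Returns the index just past every occurrence of every part in pattern.
    // Naive IndexOf scan, standing in for the Aho-Corasick search.
    public static List<int> FindInsertionIndexes(string pattern, IEnumerable<string> parts)
    {
        var indexes = new SortedSet<int>();
        foreach (var part in parts.Where(p => !string.IsNullOrEmpty(p)))
        {
            int start = pattern.IndexOf(part, StringComparison.Ordinal);
            while (start >= 0)
            {
                indexes.Add(start + part.Length);
                start = pattern.IndexOf(part, start + 1, StringComparison.Ordinal);
            }
        }
        return indexes.ToList();
    }

    // Inserts token at each index, working right to left so that the
    // indexes still to be processed remain valid.
    public static string InsertAt(string source, IEnumerable<int> indexes, string token)
    {
        var sb = new StringBuilder(source);
        foreach (int index in indexes.Distinct().OrderByDescending(i => i))
        {
            sb.Insert(index, token);
        }
        return sb.ToString();
    }

    static void Main()
    {
        var parts = new[] { "cuti?", "con", "consecu" };
        string pattern = "consecuti?vely";
        var indexes = FindInsertionIndexes(pattern, parts);
        Console.WriteLine(InsertAt(pattern, indexes, @"(-\s+)?"));
        // prints con(-\s+)?secu(-\s+)?ti?(-\s+)?vely
    }
}
```

Because all positions are computed against the unmodified pattern before anything is inserted, overlapping parts cannot interfere with each other. The sketch does not check whether a match falls inside regex syntax such as a character class, so that case would still need separate handling.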


Answer 1
2 2012-07-31 11:07:17

Answer 2
0 2012-07-31 08:52:49

Answer 3
-1 2012-07-31 10:21:57

Answer 4
-1 accepted 2012-07-31 11:32:02
