Multiple string replace in C#
I am dynamically editing a regex for matching text in a PDF, which can contain hyphenation at the end of some lines.
Example:
Source string:
"consecuti?vely"
Replace rules:
.Replace("cuti?",@"cuti?(-\s+)?")
.Replace("con",@"con(-\s+)?")
.Replace("consecu",@"consecu(-\s+)?")
Desired output:
"con(-\s+)?secu(-\s+)?ti?(-\s+)?vely"
The replace rules are built dynamically; this is just one example that causes problems.
What is the best way to perform such a multiple replace so that it produces the desired output?
So far I have thought about using Regex.Replace and zipping the word to replace with an optional (-\s+)? between each character, but that would not work, because the word to replace already contains characters that have a special meaning in a regex context.
EDIT: My current code, which does not work when the replace rules overlap as in the example above:
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
    var originalTextOfThePage = mPagesNotModified[searchedPage];
    var hyphenatedParts = Regex.Matches(originalTextOfThePage, @"\w+\-\s");
    for (int i = 0; i < hyphenatedParts.Count; i++)
    {
        var partBeforeHyphen = String.Concat(hyphenatedParts[i].Value.TakeWhile(c => c != '-'));
        regex = regex.Replace(partBeforeHyphen, partBeforeHyphen + @"(-\s+)?");
    }
    return regex;
}
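To make the overlap problem concrete, here is a minimal sketch that applies the three example rules one after another, the way chained Replace calls do. The class and method names (OverlapDemo, ApplySequentially) are made up for illustration and are not part of the original code.

```csharp
using System;

static class OverlapDemo
{
    // Applies the three example rules in sequence, like chained Replace calls.
    public static string ApplySequentially(string source)
    {
        const string token = @"(-\s+)?";
        return source
            .Replace("cuti?", "cuti?" + token)
            .Replace("con", "con" + token)
            // By now "con" is followed by the token, so "consecu" no longer occurs.
            .Replace("consecu", "consecu" + token);
    }

    public static void Main()
    {
        Console.WriteLine(ApplySequentially("consecuti?vely"));
        // prints con(-\s+)?secuti?(-\s+)?vely
    }
}
```

The optional hyphen after "secu" is missing, because the second rule has already changed the text that the third rule was supposed to find.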
The output of this program is "con(-\s+)?secu(-\s+)?ti?(-\s+)?vely", and as I understand your problem, my code solves it for this example.
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    class somefields
    {
        public string first;   // text to search for
        public string second;  // replacement text (first followed by the suffix)
        public string Add;     // suffix to insert after the match
        public int index = -1; // insertion position in the input; -1 if the rule does not match
        public somefields(string F, string S)
        {
            first = F;
            second = S;
        }
    }
    static void Main(string[] args)
    {
        // declaring input
        string input = "consecuti?vely";
        List<somefields> rules = new List<somefields>();
        // declaring rules
        rules.Add(new somefields("cuti?", @"cuti?(-\s+)?"));
        rules.Add(new somefields("con", @"con(-\s+)?"));
        rules.Add(new somefields("consecu", @"consecu(-\s+)?"));
        // find the suffix each rule adds and the position where it must be inserted
        // (only the first occurrence of each rule is handled)
        foreach (var rul in rules)
        {
            var index = input.IndexOf(rul.first, StringComparison.Ordinal);
            if (index != -1)
            {
                rul.Add = rul.second.Remove(0, rul.first.Length);
                rul.index = index + rul.first.Length;
            }
        }
        // sort rules by insertion position
        rules = rules.OrderBy(r => r.index).ToList();
        string output = input;
        int k = 0; // total length of the text inserted so far
        foreach (var rul in rules)
        {
            // skip rules that did not match the input
            if (rul.index != -1)
            {
                output = output.Insert(k + rul.index, rul.Add);
                k += rul.Add.Length;
            }
        }
        Console.WriteLine(output);
        Console.ReadLine();
    }
}
You should probably write your own parser; it is probably easier to maintain :).
Maybe you could add "special characters" such as "##" around the patterns to protect them, provided the strings do not already contain them.
Try this:
var final = Regex.Replace(originalTextOfThePage, @"(\w+)(?:\-[\s\r\n]*)?", "$1");
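As a quick check of what this replacement does, here is a small sketch that runs the same pattern on a sample line. The wrapper names (DehyphenateDemo, Dehyphenate) and the sample text are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class DehyphenateDemo
{
    // Removes a hyphen (and any whitespace after it) that follows a run of word characters.
    public static string Dehyphenate(string pageText)
    {
        return Regex.Replace(pageText, @"(\w+)(?:\-[\s\r\n]*)?", "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Dehyphenate("they ran consecu-\ntively"));
        // prints they ran consecutively
    }
}
```

Note that the pattern also removes hyphens inside ordinary compounds, since the whitespace after the hyphen is optional: "well-known" becomes "wellknown". This approach normalizes the page text instead of modifying the regex, so it only fits if the original text does not need to be kept as it is.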
I had to give up on an easy solution and edit the regex myself. As a side effect, the new approach passes through the string only twice.
// Requires: using System.Collections.Generic; using System.Linq;
//           using System.Text; using System.Text.RegularExpressions;
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
    var indexesToInsertPossibleHyphenation = GetPossibleHyphenPositions(regex, searchedPage);
    var hyphenationToken = @"(-\s+)?";
    return InsertStringTokenInAllPositions(regex, indexesToInsertPossibleHyphenation, hyphenationToken);
}
private static string InsertStringTokenInAllPositions(string sourceString, List<int> insertionIndexes, string insertionToken)
{
    if (insertionIndexes == null || string.IsNullOrEmpty(insertionToken)) return sourceString;
    var sb = new StringBuilder(sourceString.Length + insertionIndexes.Count * insertionToken.Length);
    // Sorted, de-duplicated positions; the token is inserted before the character at each position.
    var linkedInsertionPositions = new LinkedList<int>(insertionIndexes.Distinct().OrderBy(x => x));
    for (int i = 0; i < sourceString.Length; i++)
    {
        if (!linkedInsertionPositions.Any())
        {
            // No positions left: copy the rest of the string in one go.
            sb.Append(sourceString.Substring(i));
            break;
        }
        if (i == linkedInsertionPositions.First.Value)
        {
            sb.Append(insertionToken);
        }
        if (i >= linkedInsertionPositions.First.Value)
        {
            linkedInsertionPositions.RemoveFirst();
        }
        sb.Append(sourceString[i]);
    }
    // A position equal to the string length means "insert at the very end".
    if (linkedInsertionPositions.Any() && linkedInsertionPositions.First.Value == sourceString.Length)
    {
        sb.Append(insertionToken);
    }
    return sb.ToString();
}
private List<int> GetPossibleHyphenPositions(string regex, int searchedPage)
{
    var originalTextOfThePage = mPagesNotModified[searchedPage];
    var hyphenatedParts = Regex.Matches(originalTextOfThePage, @"\w+\-\s");
    var indexesToInsertPossibleHyphenation = new List<int>();
    //....
    // Aho-Corasick to find all occurrences of all
    // strings in "hyphenatedParts" in the "regex" string
    // ....
    return indexesToInsertPossibleHyphenation;
}
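The position-finding step is elided above. As a rough illustration of what it has to produce, here is a sketch that uses a plain IndexOf scan in place of Aho-Corasick. It is simpler, but slower when there are many parts. The names HyphenPositions and FindEndPositions are hypothetical, and the sketch assumes that the insertion index for each occurrence is the index just past its last character.

```csharp
using System;
using System.Collections.Generic;

static class HyphenPositions
{
    // Hypothetical helper: for every occurrence of every part in text,
    // returns the index just past that occurrence (naive scan, not Aho-Corasick).
    public static List<int> FindEndPositions(string text, IEnumerable<string> parts)
    {
        var positions = new List<int>();
        foreach (var part in parts)
        {
            if (string.IsNullOrEmpty(part)) continue;
            int start = text.IndexOf(part, StringComparison.Ordinal);
            while (start >= 0)
            {
                positions.Add(start + part.Length);
                // Continue one character further so overlapping occurrences are found too.
                start = text.IndexOf(part, start + 1, StringComparison.Ordinal);
            }
        }
        return positions;
    }

    public static void Main()
    {
        var positions = FindEndPositions("consecuti?vely", new[] { "cuti?", "con", "consecu" });
        positions.Sort();
        Console.WriteLine(string.Join(",", positions)); // prints 3,7,10
    }
}
```

Because the positions are computed against the unmodified string and the tokens are inserted afterwards in a single pass, overlapping parts such as "con" and "consecu" no longer interfere with each other.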