Replace string using Regex replace pattern

Basically, I am dealing with a CSV file and reading it line by line in C#. I have an input string (one line) and am trying to find a Regex pattern and replace it using another Regex pattern, but the result is not what I expect.

var input = "\"efgh ,ijkl123,\",abcd ,  \"efgh ,ijkl123,\",mnop456 \"efgh ,ijkl123,\"";

In the output I need to replace the internal commas between double quotes with semicolons, but only where the pair of double quotes itself sits between commas.

Between a double quote and an external comma (a comma outside a pair of double quotes) there can only be white space.

So I expect the output to be: "efgh ;ijkl123,",abcd , "efgh ;ijkl123,",mnop456 "efgh ,ijkl123,"

My code:

var pattern = @".*,\s*""(.*,+.*)+""\s*,.*";
var replacePattern = @".*,\s*""(.*;+.*)+""\s*,.*";
if (Regex.IsMatch(input, pattern))
{
    var output = Regex.Replace(input, pattern, replacePattern);
}

But when I run my code, the output is: .*,\s*"(.*;+.*)+"\s*,.* which is replacePattern itself.

EDIT: more input samples with the expected output:

  1. input abcd , "efgh ,ijkl123,",mnop456

    output abcd , "efgh ;ijkl123;",mnop456

  2. input "efgh ,ijkl123,",abcd , "efgh ,ijkl123,",mnop456 "efgh ,ijkl123,"

    output "efgh ;ijkl123;",abcd , "efgh ;ijkl123;",mnop456 "efgh ,ijkl123,"

  3. input ,"efgh ,ijkl123,",abcd" , "efgh ijkl123,",mnop456 "efgh ,ijkl123,","efgh ,ijkl123,"mnop456

    output ,"efgh ;ijkl123;",abcd" , "efgh ijkl123;",mnop456 "efgh ,ijkl123,","efgh ,ijkl123,"mnop456

  4. input ,"efgh" ,ijkl123,",abcd" , "efgh ijkl123,",mnop456 "efgh ,ijkl123,","efgh ,ijkl123,"mnop456

    output ,"efgh" ,ijkl123,";abcd" , "efgh ijkl123;",mnop456 "efgh ,ijkl123,","efgh ,ijkl123,"mnop456

  5. input efgh ,ijkl123,",abcd , "efgh ,ijkl123,",mnop456 "efgh ,ijkl123,"

    output efgh ,ijkl123,",abcd , "efgh ;ijkl123;",mnop456 "efgh ,ijkl123,"

Well, this is a bit tricky, and I'm sure someone will suggest a better regex than mine. Suppose your input text is:

"efgh ,ijkl123,",abcd ,  "efgh ,ijkl123,",mnop456 "efgh ,ijkl123,"

You can try:

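var data = "\"efgh ,ijkl123,\",abcd ,  \"efgh ,ijkl123,\",mnop456 \"efgh ,ijkl123,\"";

// A quoted run preceded by line start or a comma (plus optional blanks)
// and followed by optional blanks and a comma or line end.
var rx = @"(?<=(^|,[ \t]*))\""[^\""\n]+\""(?=[ \t]*(,|$))";

var matches = Regex.Matches(data, rx);

foreach (Match match in matches) {
    // Escape the matched text so characters such as '.' are taken literally,
    // then replace its first remaining occurrence in data.
    data = new Regex(Regex.Escape(match.Value))
        .Replace(data, match.Value.Replace(',', ';'), 1);
}

Console.WriteLine(data);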

It will emit:

"efgh ;ijkl123;",abcd ,  "efgh ;ijkl123;",mnop456 "efgh ,ijkl123,"

The code above essentially replaces every comma (,) inside each matched pair of double quotes with a semicolon (;).

I'm not sure whether it is very efficient, but it works. Suggestions for improving it further are welcome.
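The loop above looks each match up again by its text, so it can touch the wrong occurrence when an identical quoted string appears earlier in the line without being delimited by commas. A minimal single-pass sketch, assuming the same pattern and using the `Regex.Replace` overload that takes a `MatchEvaluator`, might look like this (`QuotedCommaFixer` and `ReplaceInnerCommas` are hypothetical names, not part of the original answer):

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCommaFixer
{
    // Same idea as the pattern above: a quoted run with no inner quotes,
    // preceded by line start or a comma (plus optional blanks) and
    // followed by optional blanks and a comma or line end.
    private static readonly Regex QuotedField = new Regex(
        @"(?<=(^|,[ \t]*))""[^""\n]+""(?=[ \t]*(,|$))");

    // Hypothetical helper: rewrites each matched field where it was found,
    // so identical text elsewhere in the line is left alone.
    public static string ReplaceInnerCommas(string line)
    {
        return QuotedField.Replace(line, m => m.Value.Replace(',', ';'));
    }
}
```

Calling `QuotedCommaFixer.ReplaceInnerCommas(line)` once per CSV line should be enough; the evaluator runs for every match in a single scan, so no escaping of the matched text is needed.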

string input = "\"efgh ,ijkl123,\",abcd ,  \"efgh ,ijkl123,\",mnop456 \"efgh ,ijkl123,\"";

Regex.Matches(input, "\"([^\"]*)\"(,)") // Extract each quoted string that is followed by ','.
    .Cast<Match>()
    .ToList()
    .ForEach(m => input = input.Replace(m.Value, m.Value.Replace(",", ";")) // Swap every ',' in the match for ';'.
                               .Replace(";\";", ",\","));  // A hack (could be done better): restore the ',",' that closes the field.

Output:

"efgh ;ijkl123,",abcd ,  "efgh ;ijkl123,",mnop456 "efgh ,ijkl123,"

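As for why the code in the question printed replacePattern itself: as far as I know, the third argument of `Regex.Replace` is a replacement template, not a second regex. Its text is copied literally and only substitutions such as `$1` are expanded. Because the question's pattern begins and ends with `.*`, it matches the whole line, and the whole line is swapped for that literal text. A small sketch with made-up inputs (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplacementDemo
{
    // The replacement contains regex-looking characters but no '$',
    // so it is inserted exactly as written.
    public static string LiteralReplacement(string input)
    {
        return Regex.Replace(input, @"^a(\d+)$", @".*;+.*");
    }

    // "$1" is a substitution: it expands to the text captured by group 1.
    public static string GroupReplacement(string input)
    {
        return Regex.Replace(input, @"^a(\d+)$", "b$1");
    }
}
```

With the input "a42", the first method returns the text .*;+.* unchanged, while the second returns b42.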
