Regular Expression for double-quoted values in CSV

Given the following data, I'd like a regex to pull out each comma-separated value. However, a double-quoted value may contain commas.

"SMITH, JOHN",1234567890,"12/20/2012,11:00",,DRSCONSULT,DR BOB - OFFICE VISIT - CONSULT,SLEEP CENTER,1234567890,,,"a, b"
"JONES, WILLIAM",1234567890,12/20/2012,12:45,,DRSCONSULT,DR BOB - OFFICE VISIT - CONSULT,SLEEP CENTER,,,,

Here's the expression that I have so far:

(?<=^|,)(?:(?:(?<=\")([^\"]*)(?=\"))|(?:(?<![\"])([^,\"]*)(?![\"])))(?=$|,)

Debuggex Demo

The double-quoted values are not being matched. What am I doing wrong? (This regex is passed into pre-existing code; I cannot rewrite the system.)

How about:

(?<=^|,)(("[^"]*")|([^,]*))(?=$|,)

Debuggex Demo

The first alternative is:

("[^"]*")

Match a " followed by any run of characters other than ", followed by a closing ". The quotes themselves are part of the match.

The second alternative is just:

([^,]*)

Match any run of characters that contains no comma. The run may be empty, which is how empty fields are matched.
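
To make the difference concrete, here is a minimal C# sketch that runs both the question's pattern and the one suggested above over a CSV row. The names CsvPatternDemo and Fields are made up for illustration and are not part of either post. A likely reason the original never matches a quoted field: its outer lookbehind requires the position to be preceded by the start of the line or a comma, while the inner (?<=") requires it to be preceded by a quote, and both cannot hold at the same position.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvPatternDemo
{
    // Pattern from the question (quotes are doubled because this is a verbatim string).
    public static readonly Regex Original = new Regex(
        @"(?<=^|,)(?:(?:(?<="")([^""]*)(?=""))|(?:(?<![""])([^,""]*)(?![""])))(?=$|,)");

    // Pattern suggested in the answer.
    public static readonly Regex Suggested = new Regex(
        @"(?<=^|,)((""[^""]*"")|([^,]*))(?=$|,)");

    // Collects the text of every match, in order, for a single CSV row.
    public static List<string> Fields(Regex pattern, string row)
    {
        var fields = new List<string>();
        foreach (Match m in pattern.Matches(row))
        {
            fields.Add(m.Value);
        }
        return fields;
    }
}
```

For the first sample row, Fields(CsvPatternDemo.Suggested, row) should return eleven values: quoted fields keep their surrounding quotes, and empty fields come back as empty strings. The quotes can be removed afterwards, for example with Trim('"').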

This pattern should work:

(\w+\,\s\w+|[\d\/]*\,\d+\:\d*|[\w\d\:\s\-]+)

Example:

http://regex101.com/r/rI8nS1

When using the pattern in C# you might need to escape it like:

Match match = Regex.Match(searchText, "(?m)(?x)(\\w+\\,\\s\\w+|[\\d\\/]*\\,\\d+\\:\\d*|[\\w\\d\\:\\s\\-]+)");
if (match.Success) {...}
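
As an alternative to doubling every backslash, a C# verbatim string (@"...") lets the pattern be written exactly as it appears above. The following is only a sketch; the class name VerbatimPatternDemo and the helper FirstMatch are hypothetical and introduced for illustration.

```csharp
using System.Text.RegularExpressions;

public static class VerbatimPatternDemo
{
    // Same pattern as above; in a verbatim string each backslash is written once.
    public const string Pattern =
        @"(?m)(?x)(\w+\,\s\w+|[\d\/]*\,\d+\:\d*|[\w\d\:\s\-]+)";

    // Returns the first match in the given text, or null if nothing matches.
    public static string FirstMatch(string searchText)
    {
        Match match = Regex.Match(searchText, Pattern);
        return match.Success ? match.Value : null;
    }
}
```

Note that this pattern is tailored to the shape of the sample data (a "NAME, NAME" pair, a date followed by a time, and plain word-like fields) rather than to quoting in general, so it may not carry over to other CSV files.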

Here's the code I use for coping with quote-aware CSVs:

//regex that matches one CSV field (quoted or unquoted) together with its leading comma
readonly Regex csvParser = new Regex( "(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)", RegexOptions.Compiled);

//given a row from the csv file, loop through returning an array of column values
private IEnumerable<string> ProcessCsvRow(string row)
{
    MatchCollection results = csvParser.Matches(row);
    foreach (Match match in results)
    {
        foreach (Capture capture in match.Captures)
        {
            // TrimStart takes chars, not a string: drop the leading comma, then surrounding quotes and spaces
            yield return (capture.Value ?? string.Empty).TrimStart(',').Trim('"', ' ');
        }
    }
}
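
A variation on the code above, offered as a sketch rather than a drop-in replacement: it reads the field from capture group 1 instead of trimming the whole match, and it turns doubled quotes ("") inside a quoted field back into a single quote. The names QuoteAwareCsv and ParseRow are hypothetical. Unlike the Trim call above, it leaves leading and trailing spaces untouched. One caveat that appears to apply to this pattern in .NET: a row whose first field is empty (a row starting with a comma) may lose its second field, because the empty match at position 0 makes the engine skip the first comma.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuoteAwareCsv
{
    // Same pattern as csvParser above, written as a verbatim string.
    private static readonly Regex FieldPattern = new Regex(
        @"(?:^|,)(""(?:[^""]+|"""")*""|[^,]*)", RegexOptions.Compiled);

    // Splits one CSV row into its field values.
    public static List<string> ParseRow(string row)
    {
        var fields = new List<string>();
        foreach (Match match in FieldPattern.Matches(row))
        {
            // Group 1 holds the field without the leading comma.
            string value = match.Groups[1].Value;
            bool quoted = value.Length >= 2
                && value[0] == '"'
                && value[value.Length - 1] == '"';
            if (quoted)
            {
                // Strip the surrounding quotes and unescape doubled quotes.
                value = value.Substring(1, value.Length - 2).Replace("\"\"", "\"");
            }
            fields.Add(value);
        }
        return fields;
    }
}
```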
