awk FPAT variable: Working

I understand from the GNU page for gawk that it can handle delimiters inside the data using the FPAT variable, but I can't work out how this actually works. For a CSV file the FPAT value is:

FPAT = "([^,]+)|(\"[^\"]+\")"

Using the data:

abc,"pqr,mno"

The first grouped expression matches everything that is not a comma, so it should take "abc" as data and then fail at the first comma. Now my question is: what happens next? Since the first grouped expression failed, will the regexp continue from the character after the comma using the "or" condition? But the first grouped expression is still valid for the data after the comma, so might it take "pqr as the next field?

The field patterns can be described as follows.

A string that does not contain a comma and whose length is greater than zero (it won't match empty strings):

[^,]+

Or a string that starts and ends with a double quote and contains at least one character that isn't a double quote (escaping backslashes left out for readability):

"[^"]+"

The regular expression engine matches from the beginning of the string and tries to match as much as possible with the given patterns.

abc,"pqr,mno" 

So abc is the longest string matched by either pattern from the start of the string, and hence becomes $1. The next character, the comma, cannot be matched by either pattern, so the regular expression engine just moves on to the next character, the double quote. At that position both patterns can match: the first would give "pqr and the second "pqr,mno". gawk regular expressions always take the leftmost, longest match, so the second pattern wins here. It matches up to the end of the line, because "pqr,mno" is a string that starts and ends with double quotes and contains at least one non-double-quote character. Therefore "pqr,mno" becomes $2 for the record abc,"pqr,mno".
