awk field separator with regex lookahead or lookbehind

Question

I want to split a line on a delimiter that can be escaped, but my attempt failed. For example:

$ echo "1,2\,2,333"|awk -F "(?<\!\\,)," '{print $2}'   ## expecting "2\,2"
awk: warning: escape sequence `\!' treated as plain `!'
awk: warning: escape sequence `\,' treated as plain `,'

Does awk/gawk support a field separator with regexp lookahead or lookbehind?

Answer 1

As I said in the comments, awk does not support look-ahead or look-behind, because it uses POSIX Extended Regular Expressions (ERE). If you really need look-ahead or look-behind, you may want to use Perl instead. In this case, however, a slight change of approach solves the problem.

If your data can contain the delimiter, then instead of splitting the data by looking for an unescaped delimiter (which can fail when there are several backslashes in a row), it is better to match the fields directly.

The regex to match the fields is /([^\\,]|\\.)+/ . Note that this regex is not aware of quoted fields. Supporting them depends on how you want to handle quotes that are not closed properly, or fields that contain more than one quote. If you can assume that your data is well-formed, you can simply come up with a regex that works for your data.

Here is something to get you started. The code below prints all the fields in a line.

echo "1,2\,2,333" | awk '{while (match($0, /([^\\,]|\\.)+/)) {print substr($0, RSTART, RLENGTH);$0=substr($0, RSTART+RLENGTH)}}'
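If you want a specific field rather than all of them, the same matching loop can be wrapped in an awk function that fills an array. This is only a sketch: `nth_field` and `split_escaped` are names invented here, and like the one-liner above it skips empty fields, because the regex needs at least one character per field.

```shell
# nth_field N: hypothetical wrapper that prints field N of every input line,
# treating a backslash as an escape for the character that follows it.
nth_field() {
    awk -v want="$1" '
    # Fill array "out" with the fields of "s" and return how many were found.
    function split_escaped(s, out,    n) {
        split("", out)                      # portable way to clear the array
        n = 0
        while (match(s, /([^\\,]|\\.)+/)) {
            out[++n] = substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH) # continue after the matched field
        }
        return n
    }
    { split_escaped($0, f); print f[want] }'
}

printf '%s\n' '1,2\,2,333' | nth_field 2   # prints 2\,2
```

Because the fields are matched rather than split, an escaped backslash before a real delimiter is handled too: for the input `a\\,b`, field 2 is `b`.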

Reference

How to get match regex pattern using awk from file?

Answer 2

One way to handle this is to use FPAT (splitting by content) in gnu-awk:

awk 'BEGIN{ FPAT=",([^\\\\]*\\\\,)*[^,]*,|[^,]+" } {
  for (i=1; i<=NF; i++) {gsub(/^,|,$/, "", $i); printf "$%d: <%s>\n", i, $i}
}' <<< "1,2\,2,333"
$1: <1>
$2: <2\,2>
$3: <333>
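As a possible simplification, the field regex from Answer 1 can be used as FPAT directly, so the fields no longer carry surrounding commas and the gsub step is unnecessary. This is a sketch that assumes GNU awk is installed as `gawk` (FPAT is a gawk extension); the function name `second_field_fpat` is made up for the example. The backslashes are doubled once more because FPAT is given as a string rather than a regex literal.

```shell
# second_field_fpat: hypothetical helper; requires GNU awk (gawk) for FPAT.
# A field is a run of characters that are neither backslash nor comma,
# or a backslash followed by any single character.
second_field_fpat() {
    gawk 'BEGIN { FPAT = "([^\\\\,]|\\\\.)+" } { print $2 }'
}

# Usage:
#   printf '%s\n' '1,2\,2,333' | second_field_fpat
```

As with the match() loop, empty fields are not counted, since the pattern must match at least one character.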
