awk field separator with regexp lookahead or lookbehind

I want to split a line on a delimiter while ignoring escaped occurrences of that delimiter, but my attempt failed. For example:

$ echo "1,2\,2,333"|awk -F "(?<\!\\,)," '{print $2}'   ## expecting "2\,2"
awk: warning: escape sequence `\!' treated as plain `!'
awk: warning: escape sequence `\,' treated as plain `,'

Does awk/gawk support regexp lookahead or lookbehind in the field separator?

As I said in the comments, awk does not support lookahead or lookbehind, because it uses POSIX Extended Regular Expressions (ERE). If you really need lookahead or lookbehind, you might want to use Perl instead. In this case, however, you can change your approach slightly to solve the problem.

If your data can contain the delimiter, then instead of splitting by looking for an unescaped delimiter (which can fail when there are several backslashes in a row), it is better to match the fields directly.

The regex to match a field is /([^\\,]|\\.)+/ : one or more characters that are either neither a backslash nor a comma, or a backslash followed by any character. Note that this regex is not aware of quoted fields. Whether and how to support them depends on how you want to handle quotes that are not closed properly, or fields that contain more than one quote. If you can assume that your data is well-formed, you can simply write a regex that works for your data.

Here is something to get you started. The code below prints all the fields in a line.

echo "1,2\,2,333" | awk '{
  # match one field at a time, print it, then drop it from $0
  while (match($0, /([^\\,]|\\.)+/)) {
    print substr($0, RSTART, RLENGTH)
    $0 = substr($0, RSTART + RLENGTH)
  }
}'
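Building on the loop above, the following sketch wraps the same match() technique in a shell function that prints only the N-th field, which is what the question asks for. The name nth_field and the variable names are hypothetical. Like the loop, it skips empty fields, so field numbers shift if the data contains two delimiters in a row.

```shell
# Hypothetical helper: print the N-th comma-separated field of each line,
# treating any backslash-escaped character (such as "\,") as field content.
nth_field() {
  awk -v n="$1" '{
    line = $0
    count = 0
    # A field is a run of ordinary characters or backslash-escaped characters.
    while (match(line, /([^\\,]|\\.)+/)) {
      count++
      if (count == n) {
        print substr(line, RSTART, RLENGTH)
        break
      }
      # Drop everything up to the end of this field and keep scanning.
      line = substr(line, RSTART + RLENGTH)
    }
  }'
}

# Sample line from the question: prints 2\,2
printf '%s\n' '1,2\,2,333' | nth_field 2

# Escaped backslash followed by a real delimiter: prints b
printf '%s\n' 'a\\,b' | nth_field 2
```

Because the regex consumes a backslash together with the character after it, an escaped backslash in front of a delimiter is handled correctly.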


Another way to handle this is to use FPAT, which defines fields by their content rather than by their separators, in GNU awk:

awk 'BEGIN{ FPAT=",([^\\\\]*\\\\,)*[^,]*,|[^,]+" } {
  for (i=1; i<=NF; i++) {gsub(/^,|,$/, "", $i); printf "$%d: <%s>\n", i, $i}
}' <<< "1,2\,2,333"
$1: <1>
$2: <2\,2>
$3: <333>
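As a possible simplification, the field regex from the match()-based answer can be used as FPAT directly, so no separators end up inside the fields and no gsub cleanup is needed. This is only a sketch: it assumes GNU awk is installed as gawk (FPAT is a gawk extension), and the function name fpat_fields is made up for the example. The backslashes are doubled because FPAT is given as a string rather than a regex literal.

```shell
# Hypothetical helper: list the fields of each line using gawk's FPAT.
# Assumes GNU awk is available as "gawk".
fpat_fields() {
  gawk 'BEGIN {
    # As a regex this is ([^\\,]|\\.)+ : ordinary characters or escaped ones.
    FPAT = "([^\\\\,]|\\\\.)+"
  }
  {
    for (i = 1; i <= NF; i++)
      printf "$%d: <%s>\n", i, $i
  }'
}
```

Feeding it the sample line, for example with printf '%s\n' '1,2\,2,333' | fpat_fields, should list the same three fields as the output above.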
