Regex substitution with groups using sed | awk

Question

I have this input text:

16789248,16789759,"AS24155 Asia Pacific Broadband Wireless Communications Inc"

I want this text:

"AS24155","Asia Pacific Broadband Wireless Communications Inc"

This regex matches:

 /(.*)(AS\d+)(\s)([^"]+).*/g

with this substitution: "$2","$4"

I have to process 300k lines, and it would be best if I could use a Linux command-line utility like sed or awk, but I keep getting no matches even though the regex seems to match elsewhere.

Should I be using something different?

Answer 1

sed -r can handle it with a few modifications: [0-9] instead of \d and a literal space instead of \s. There's no real reason to capture the first and third parts, so I've removed those groups.

sed -r -e 's/.*(AS[0-9]+) ([^"]+).*/"\1","\2"/'

Or, if you want to match those character classes exactly, use [[:digit:]] for \d and [[:space:]] for \s:

sed -r -e 's/.*(AS[[:digit:]]+)[[:space:]]([^"]+).*/"\1","\2"/'
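As a rough sketch of how this could be run over a whole file: by default sed passes lines that do not match through unchanged, which may not be what you want with 300k lines. The example below uses -E (the extended-regex flag accepted by both GNU and BSD sed, equivalent to -r in GNU sed) together with -n and the p flag so that only converted lines are printed. The temporary file and the second, non-matching input line are made up for illustration.

```shell
# Hypothetical sample file: the line from the question plus one line with no AS number.
input_file=$(mktemp)
printf '%s\n' \
  '16789248,16789759,"AS24155 Asia Pacific Broadband Wireless Communications Inc"' \
  '1,2,"no autonomous system here"' > "$input_file"

# -E enables extended regular expressions.
# -n suppresses the default output; the trailing p prints only lines
# on which the substitution actually succeeded.
result=$(sed -E -n 's/.*(AS[0-9]+) ([^"]+).*/"\1","\2"/p' "$input_file")
printf '%s\n' "$result"

rm -f "$input_file"
```

For the real data, the same command can read the input file directly and redirect its output to a new file.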

Alternatively, you could use csvtool, which is better suited to parsing CSV files than sed is.

csvtool col 3 input.txt | while read number name; do
    printf '"%s","%s"\n' "$number" "$name"
done

Answer 2

sed 's/[^"]*"/"/;s/[[:space:]]/","/'

Based on your sample, and avoiding the grouping issue.
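A quick check of this two-step approach on the sample line, with the `/` delimiter written out on both substitutions. The first command drops everything up to the opening quote; the second replaces the first whitespace character with `","`. This assumes the first whitespace on every line is the one right after the AS number, which holds for the sample but is only an assumption about the rest of the data.

```shell
# The sample line from the question.
line='16789248,16789759,"AS24155 Asia Pacific Broadband Wireless Communications Inc"'

# Step 1: replace everything up to and including the first quote with a quote.
# Step 2: replace the first whitespace character with ",".
two_step=$(printf '%s\n' "$line" | sed 's/[^"]*"/"/;s/[[:space:]]/","/')
printf '%s\n' "$two_step"
```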

Answer 3

sed is the best choice for this, but FYI here's how you could use almost your exact RE in GNU awk to do the job:

$ awk 'match($0,/.*(AS[0-9]+)\s([^"]+).*/,a){printf "\"%s\",\"%s\"\n", a[1], a[2]}' file
"AS24155","Asia Pacific Broadband Wireless Communications Inc"

Your original command was probably failing because only some tools accept \s instead of [[:space:]] and almost none accept \d instead of [[:digit:]] (or [0-9]).
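The three-argument match() used above is a GNU awk extension. If gawk is not available, one possible sketch that stays within POSIX awk is to split each line on the double quote and cut the quoted field at its first space. Note that this is a different technique from the regex above: it assumes the quoted field always starts with the AS number, as it does in the sample.

```shell
# The sample line from the question.
line='16789248,16789759,"AS24155 Asia Pacific Broadband Wireless Communications Inc"'

portable=$(printf '%s\n' "$line" | awk -F'"' '
  # With the double quote as field separator, $2 is the text inside the quotes.
  $2 ~ /^AS[0-9]+ / {
    sp = index($2, " ")   # position of the first space in the quoted text
    printf "\"%s\",\"%s\"\n", substr($2, 1, sp - 1), substr($2, sp + 1)
  }')
printf '%s\n' "$portable"
```

Lines whose quoted field does not begin with an AS number followed by a space are skipped rather than passed through.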


Answer 1
1, accepted, 2015-04-15 23:17:51

Answer 2
0, 2015-04-16 08:11:15

Answer 3
0, 2015-04-16 12:55:30
