Creating a parser script with delimiters

Question

I am trying to convert this input from file.txt

a,b;c^d"e}
f;g,h!;i8j-

into this output

a,b,c,d,e,f,g,h,i,j

with awk.

The best I have managed so far is

awk '$1=$1' FS="[;,^}8-]" OFS="." file.txt

How can I avoid " being interpreted as a special character? " doesn't work.
How can I avoid duplicate ,, in the output and delete the last ,?

Answer 1

One in awk (not for all awks: tested successfully in gawk, mawk, busybox awk and macOS awk version 20200816, unsuccessfully in Debian's awk version 20121220, aka original-awk. There are limitations in some locales as well.)

$ awk -v RS="^$" '{      # read whole file in 
    gsub(/[^a-z]+/,",")  # replace all non lowercase alphabet substrings with a comma
    sub(/,$/,"")         # remove trailing comma
}1' file                 # output

Output:

a,b,c,d,e,f,g,h,i,j

Answer 2

Using any POSIX awk and assuming you want any non-alphabetic character to act as a field separator:

$ awk -F '[^[:alpha:]]+' -v OFS=',' '{printf "%s", p; $1=$1; p=$0} END{sub(OFS"$","",p); print p}' file
a,b,c,d,e,f,g,h,i,j

If you really do just want to use the specific set of characters in your question as the field separators, then just change [^[:alpha:]]+ to [;,^}8"-]+ (adding ! to the set if it should also count as a separator, since it appears in the sample input).
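As a minimal sketch of that second variant, the following assumes the sample input from the question is written to a temporary file (the `tmp` and `out` variable names are illustrative and not from the answer). Because the awk arguments are in single quotes, the " inside the bracket expression needs no escaping. The ! is included in the set here only because it occurs in the sample input.

```shell
# Recreate the sample input in a temporary file (path chosen by mktemp).
tmp=$(mktemp)
printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-' > "$tmp"

# Single quotes keep the shell from touching the double quote in the FS regex.
# The "-" goes last in the bracket expression so it is taken literally.
out=$(awk -F '[;,^}8"!-]+' -v OFS=',' '
    { printf "%s", p; $1 = $1; p = $0 }     # emit the previous rebuilt line, then rebuild this one
    END { sub(OFS "$", "", p); print p }    # strip the trailing separator from the last line
' "$tmp")

echo "$out"
rm -f "$tmp"
```

Each rebuilt line ends in a separator because the input lines end in one, which is what joins consecutive lines without a doubled comma.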

Answer 3

KISS:

$ grep -o '[a-z]' file | paste -sd ',' -
a,b,c,d,e,f,g,h,i,j

Should work on most GNU/Linux systems, and even on busybox and FreeBSD (where the - is mandatory).

Answer 4

I would harness GNU AWK for this task in the following way. Let the content of file.txt be

a,b;c^d"e} f;g,h!;i8j-

then

awk 'BEGIN{FPAT="[a-z]";OFS=","}{$1=$1;print}' file.txt

gives output

a,b,c,d,e,f,g,h,i,j

Explanation: I inform GNU AWK, using FPAT, that a field is a single lowercase ASCII letter, and that the output field separator (OFS) is a comma. Then for each line I do $1=$1 to trigger a rebuild of the line, and print the line.

(tested in GNU Awk 5.0.1)
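FPAT is a gawk extension, so the command above will not behave the same way in other awks. As a hedged sketch of the same idea in POSIX awk, the loop below pulls out each match of [a-z] with match() and joins the matches with commas. The variable names (`line`, `rest`, `joined`, `out`) are illustrative and not part of the original answer.

```shell
# Same single-line sample input as above, fed through a pipe.
line='a,b;c^d"e} f;g,h!;i8j-'

out=$(printf '%s\n' "$line" | awk '
{
    rest = $0; joined = ""
    # match() sets RSTART and RLENGTH to the position and length of the next lowercase letter
    while (match(rest, /[a-z]/)) {
        joined = joined (joined == "" ? "" : ",") substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)   # continue scanning after this match
    }
    print joined
}')

echo "$out"
```

Like the FPAT version, this works line by line, so a multi-line input produces one joined line per input line.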

Answer 5

If you only want to replace non-letter characters with commas and squeeze repeated commas, tr is your friend:

tr -sc '[:alpha:]' ','
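Note that tr on its own also turns the final newline into a comma, so its result ends with a trailing , and has no terminating newline. A minimal sketch of one way to finish the job, assuming the sample input from the question (the sed step and the `out` variable are additions, not part of the original answer):

```shell
# Build the two-line sample input and run it through tr.
out=$(printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-' |
      tr -sc '[:alpha:]' ',' |
      sed 's/,$//')
# tr: every run of non-letters (including the newlines) becomes a single comma
# sed: drops the comma that the final newline turned into

echo "$out"
```

A leading comma would remain if the input started with a non-letter; the sample input does not, so that case is not handled here.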

Answer 6

Using gnu-sed, replace 1 or more chars other than a-z with a comma. Then remove all leading and trailing commas.

sed -Ez 's/[^a-z]+/,/g; s/^,+|,+$//g' file

Output

a,b,c,d,e,f,g,h,i,j

Answer 7

If ed is available/acceptable.

The script, script.ed:

%s/[^a-z]/ /g
%s/[[:blank:]]\{1,\}/,/g
g/./;j\
s/,$//
,p
Q

Now run

ed -s file.txt < script.ed

Answer 8

 echo "${input_data}" |

 mawk 'NF-=_==$NF' FS='[^[:alpha:]]*' OFS=, RS=

a,b,c,d,e,f,g,h,i,j

If there is a possibility of leading separators, use this instead:

echo ']a[' |

 gawk 'gsub("^,|,$",_,$!(NF=NF))^_' FS='[^[:alpha:]]*' OFS=, RS=

Answer 9

If you are OK with a Perl solution, here is a one-liner:

perl -0777 -ne 's/[^[:alpha:]]//g; print join(",", split //, $_), "\n"' file.txt

which outputs:

a,b,c,d,e,f,g,h,i,j

Simply put, -0777 reads the whole file at once, every character that is not alphabetic is replaced with nothing, and the remaining letters are joined with commas.

9 answers (score, timestamp)

Answer 1: 2, 2022-12-27 11:53:10
Answer 2: 2, 2022-12-27 15:21:45
Answer 3: 1, 2022-12-27 17:02:35
Answer 4: 0, 2022-12-27 11:42:15
Answer 5: 0, 2022-12-27 11:43:45
Answer 6: 0, 2022-12-27 12:50:45
Answer 7: 0, 2022-12-27 15:36:38
Answer 8: 0, 2022-12-28 00:19:12
Answer 9: 0, 2022-12-28 01:01:06
