
Using sed, awk, etc. to separate after middle dot characters

I could use your assistance with something; I promise I tried really hard to search for answers, but no luck.

I want to separate text at every occurrence of the "·" (middle dot) character (by syllables, basically).

echo con·grat·u·late | sed -e 's/·.*$/·/1'

The code above outputs:

con·

That is the first part of what I want, but ultimately I would like an output of:

con·
grat·
u·
late

This will involve getting the characters between the 1st and 2nd, and the 2nd and 3rd, occurrences of "·".

If anyone can guide me in the right direction, I will really appreciate it, and I will figure the rest out on my own.

EDIT My apologies, I displayed my desired output incorrectly. Your solutions worked great, however.

Since it is important for me to keep everything as a single line, how would I output the text between the first dot and the second one, to output:

grat·

I am doing this in UTF-8, Jonathan.

Once again, sorry for asking the wrong thing.

In GNU sed you can do this:

echo con·grat·u·late | sed -e 's/·/&\n/g'

The & stands for the matched pattern, in this example the ·. Unfortunately, this doesn't work in BSD sed, which does not treat \n in the replacement as a newline.

For a more portable solution, I recommend this AWK, which should work on both GNU and BSD systems:

echo con·grat·u·late | awk '{ gsub("·", "&\n") } 1'
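If awk is not an option, a portable sed variant may also work. The following is a minimal sketch, assuming a POSIX-conforming sed: instead of \n in the replacement, it uses a backslash followed by a real line break. The function name split_after_dot is made up for illustration.

```shell
# Hypothetical helper: print each syllable on its own line, keeping the "·".
# A backslash followed by a literal newline in the replacement is the POSIX
# way to insert a line break, so this should not rely on GNU extensions.
split_after_dot() {
  sed 's/·/&\
/g'
}

echo 'con·grat·u·late' | split_after_dot
```

The line break inside the quotes is part of the sed script, so it must be kept exactly as written.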

Since you want the characters between the dots, you can use sed like this to print the group of characters between the 1st and 2nd dots:

echo 'con.grat.u.late'|sed 's/\.*\./&\n/g'|sed  -n 2p|tr -d '.'

Result:

grat

Note: I use 2p to print the characters between the 1st and 2nd dots.

To print the group of characters between the 2nd and 3rd dots:

echo 'con.grat.u.late'|sed 's/\.*\./&\n/g'|sed  -n 3p|tr -d '.'

Result:

u

Note: I use 3p to print the characters between the 2nd and 3rd dots.

You can also do the whole thing with sed, but I use the tr command so that it is easy for you to follow. The tr command deletes the dots before printing. If you want to keep the dots, drop |tr -d '.' from your command line.
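As a sketch of the sed-only approach, assuming plain ASCII dots as in the examples in this answer, a capture group can pick out the text between the 1st and 2nd dots without a second sed or tr. The function name between_first_and_second is hypothetical.

```shell
# Hypothetical helper: print only the text between the 1st and 2nd ".".
#   ^[^.]*\.    skips everything up to and including the first dot
#   \([^.]*\)   captures the characters up to the next dot
#   \..*        matches the second dot and the rest of the line
between_first_and_second() {
  sed 's/^[^.]*\.\([^.]*\)\..*/\1/'
}

echo 'con.grat.u.late' | between_first_and_second   # prints: grat
```

A line with fewer than two dots does not match the pattern and is printed unchanged.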

You can also print a range of groups of characters:

echo 'con.grat.u.late'|sed 's/\.*\./&\n/g'|sed  -n 1,3p|tr -d '.'

Result:

con
grat
u

You can use simple awk to get these words separated:

$ echo 'con.grat.u.late' | awk -F. '{print $1}'
con
$ echo 'con.grat.u.late' | awk -F. '{print $2}'
grat
$ echo 'con.grat.u.late' | awk -F. '{print $3}'
u
$ echo 'con.grat.u.late' | awk -F. '{print $4}'
late

$ echo 'con.grat.u.late' | awk -F. '{for(i=1;i<=NF;i++){print $i}}' 
con
grat
u
late

-F. means use . as the field separator.
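The same idea should carry over to the middle dot from the question. The following is a minimal sketch, assuming the input is UTF-8 and that your awk accepts "·" as a field separator. The function name nth_syllable and its default of 2 are made up for illustration.

```shell
# Hypothetical helper: print the Nth syllable (default: 2) followed by "·".
# -F'·' makes the middle dot the field separator; -v passes N into awk as n.
# The "·" is always re-appended, so this suits syllables that were followed
# by a dot in the input (not the last one).
nth_syllable() {
  awk -F'·' -v n="${1:-2}" '{ print $n "·" }'
}

echo 'con·grat·u·late' | nth_syllable 2   # prints: grat·
```

This keeps the input on a single line and selects one field, which is what the edited question asks for.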

Simply:

echo con·grat·u·late | sed -e 's/·/·\n/g'

That replaces every · with a · followed by a newline.
