Using sed, awk, etc. to separate after middle dot characters
I could use your assistance with something. I promise I tried really hard to search for answers, but had no luck.
I want to separate text after every occurrence of the "·" (middle dot) character (by syllables, basically).
echo con·grat·u·late | sed -e 's/·.*$/·/1'
The code above outputs:
con·
That is the first part of what I want, but ultimately I would like an output of:
con·
grat·
u·
late
This will involve getting the characters between the 1st and 2nd, and between the 2nd and 3rd, occurrences of "·".
If anyone can point me in the right direction, I would really appreciate it, and I will figure the rest out on my own.
EDIT: My apologies, I displayed my desired output incorrectly. Your solutions worked great, however.
Since it is important for me to keep everything on a single line, how would I output only the text between the first dot and the second one, to get:
grat·
I am doing this in UTF-8, Jonathan.
Once again, sorry for asking the wrong thing.
In GNU sed you can do this:
echo con·grat·u·late | sed -e 's/·/&\n/g'
The & stands for the matched pattern, in this example the ·. Unfortunately this doesn't work in BSD sed.
For a more portable solution, I recommend this AWK, which should work on both GNU and BSD systems:
echo con·grat·u·late | awk '{ gsub("·", "&\n") } 1'
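If sed is still preferred on a BSD system, a common workaround is to put a literal line break in the replacement (a backslash followed by an actual newline) instead of the \n escape. A minimal sketch; split_after_dot is a hypothetical helper name used only for illustration:

```shell
# Hypothetical helper: break a line after every middle dot.
# The replacement is "&" followed by backslash + a real newline,
# which avoids relying on the \n escape in the replacement text.
split_after_dot() {
  printf '%s\n' "$1" | sed 's/·/&\
/g'
}

split_after_dot 'con·grat·u·late'
```

The backslash must be the last character on its line inside the quoted script, otherwise sed will not see it as an escaped newline.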
Since you want the characters between the dots, you can try sed like this:
echo 'con.grat.u.late'|sed 's/\.*\./&\n/g'|sed -n 2p|tr -d '.'
To print the group of characters between the 1st and 2nd dot:
echo 'con.grat.u.late'|sed 's/\.*\./&\n/g'|sed -n 2p|tr -d '.'
Result:
grat
Note: I use 2p to print the characters between the 1st dot and the 2nd dot.
To print the group of characters between the 2nd and 3rd dot:
echo 'con.grat.u.late'|sed 's/\.*\./&\n/g'|sed -n 3p|tr -d '.'
Result:
u
Note: I use 3p to print the characters between the 2nd dot and the 3rd dot.
You can also do the whole thing with sed, but I use the tr command so it will be easy for you to follow. The tr command deletes the dots before printing. If you want to keep the dots, leave |tr -d '.' off your command line.
You can also print a range of groups:
echo 'con.grat.u.late'|sed 's/\.*\./&\n/g'|sed -n 1,3p|tr -d '.'
Result:
con
grat
u
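The "whole thing with sed" variant mentioned above could look roughly like this. It is only a sketch: nth_group is a hypothetical helper name, and it assumes a sed that accepts -E for extended regular expressions (current GNU and BSD sed both do).

```shell
# Hypothetical helper: print the Nth dot-separated group with one sed call.
# $1 = input string, $2 = 1-based group number.
nth_group() {
  skip=$(( $2 - 1 ))   # how many leading "group." prefixes to skip
  # Skip exactly $skip prefixes, capture the next run of non-dot characters,
  # and discard the rest of the line.
  printf '%s\n' "$1" | sed -E "s/^([^.]*\.){$skip}([^.]*).*/\2/"
}

nth_group 'con.grat.u.late' 2
nth_group 'con.grat.u.late' 3
```

The two calls print grat and u. If the requested group number is larger than the number of groups, the pattern does not match and sed prints the line unchanged, so a caller may want to check for that.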
You can use simple awk to get these words separated:
$ echo 'con.grat.u.late' | awk -F. '{print $1}'
con
$ echo 'con.grat.u.late' | awk -F. '{print $2}'
grat
$ echo 'con.grat.u.late' | awk -F. '{print $3}'
u
$ echo 'con.grat.u.late' | awk -F. '{print $4}'
late
$ echo 'con.grat.u.late' | awk -F. '{for(i=1;i<=NF;i++){print $i}}'
con
grat
u
late
-F. means use . as the field separator.
Simply:
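The examples above use an ASCII period. For the case in the question's edit (only the text between the first and second middle dot, with the dot kept), the same field-splitting idea should carry over by passing the middle dot itself as the separator. A sketch, assuming UTF-8 input; syllable is a hypothetical helper name:

```shell
# Hypothetical helper: print the Nth syllable of a middle-dot-separated word,
# re-appending the dot unless it is the last field.
# $1 = input string, $2 = 1-based field number.
syllable() {
  printf '%s\n' "$1" |
    awk -F'·' -v n="$2" '{ printf "%s%s\n", $n, (n < NF ? FS : "") }'
}

syllable 'con·grat·u·late' 2
```

This prints grat· for the second field. Because awk strips the separator when splitting, the dot is added back from FS rather than being preserved from the input.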
echo con·grat·u·late | sed -e 's/·/·\n/g'
That replaces every · with a · followed by a newline.
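If neither GNU sed nor awk is at hand, the same split can probably be done with plain POSIX shell parameter expansion. This is only a sketch under that assumption; split_syllables is a hypothetical helper name, not something from the answers above:

```shell
# Hypothetical helper: print each middle-dot-terminated piece on its own line
# using only shell parameter expansion (no sed or awk).
split_syllables() {
  rest=$1
  # "${rest#*·}" differs from "$rest" only while a middle dot remains.
  while [ "${rest#*·}" != "$rest" ]; do
    printf '%s·\n' "${rest%%·*}"   # text before the first dot, plus the dot
    rest=${rest#*·}                # drop everything up to and including it
  done
  printf '%s\n' "$rest"            # whatever follows the last dot
}

split_syllables 'con·grat·u·late'
```

The loop handles one dot per iteration, so it is slower than sed or awk on long input, but it avoids any dependence on how a given sed treats \n.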