
How does this sed command parse numbers with commas?

I'm having difficulty understanding a number-parsing sed command I saw in this article:

sed -i ':a;s/\B[0-9]\{3\}\>/,&/;ta' numbers.txt

I'm a sed newbie, so this is what I've been able to figure out:

  • & adds to what's already there rather than substituting for it
  • the :a; ... ;ta wrapper calls the substitution recursively on the line until the search finds no more matches

Here's what I am hoping folks can explain:

  • What does -i do? I can't seem to find it in the man pages, though I'm sure it's there.
  • I'm a little fuzzy on what the \B is accomplishing here. Perhaps it helps with the left-right parsing priority, but I don't see how. So lastly...
  • Most importantly, why does this execute right to left instead of left to right? For example, which part of the command keeps this from doing something like: 1234566778,9 ---> 1234,566,778,9

Breaking this command down:

sed -i ':a;s/\B[0-9]\{3\}\>/,&/;ta' numbers.txt

-i      # edit the input file in place, saving the changes back to it
\B      # opposite of \b: matches only where there is NO word boundary, i.e. inside a word
[0-9]   # match any digit
\{3\}   # match exactly 3 of the preceding item (here, digits)
\>      # end-of-word boundary
&       # in the replacement, stands for the whole matched text
:a      # define label a
ta      # branch back to label a if the preceding s command made a substitution

Yes, this sed command starts matching and replacing at the rightmost three digits and keeps moving left for as long as it finds another group of three digits that is preceded by another word character (here, a digit).
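The right-to-left behaviour is easier to see when the loop is unrolled by hand. The sketch below assumes GNU sed (for \B and \>); the helper name one_pass and the variables are made up for illustration and are not part of the original command.

```shell
# Apply the substitution exactly once, without the :a ... ta loop.
# Assumes GNU sed, which supports \B and \>.
one_pass() {
  printf '%s\n' "$1" | sed 's/\B[0-9]\{3\}\>/,&/'
}

n=1234567
pass1=$(one_pass "$n")       # only "567" is followed by an end of word
pass2=$(one_pass "$pass1")   # the new comma ends the word "1234", so "234" matches
pass3=$(one_pass "$pass2")   # no match left, so the line is unchanged

printf '%s\n' "$n" "$pass1" "$pass2" "$pass3"
# 1234567
# 1234,567
# 1,234,567
# 1,234,567
```

The third pass changes nothing because the leading "1" sits at the start of a word, where \B cannot match; in the real command this is the point where ta stops branching.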


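Update: However, since this sed command loops inefficiently, I recommend this much simpler and faster awk instead (\047 is the octal escape for a single quote, and the ' flag in the printf format asks for thousands grouping, which depends on the awk implementation and the current locale):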

awk '/^[0-9]+$/{printf "%\047.f\n", $1}' file
20,130,607,215,015
607,220,701
992,171

Where the input file is:

cat file
20130607215015
607220701
992171
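For comparison, here is a small sketch that runs the original in-place sed command on the same three input lines. It assumes GNU sed (BSD/macOS sed needs a suffix argument after -i) and uses a scratch file from mktemp, named tmp here purely for illustration, so that no real file is modified.

```shell
# Create a scratch copy of the input so no real file is touched.
tmp=$(mktemp)
printf '%s\n' 20130607215015 607220701 992171 > "$tmp"

# GNU sed: -i rewrites the file in place and prints nothing to stdout.
sed -i ':a;s/\B[0-9]\{3\}\>/,&/;ta' "$tmp"

# Read the edited file back to see the result.
result=$(cat "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
# 20,130,607,215,015
# 607,220,701
# 992,171
```

Unlike the awk version, this does not depend on the locale, but it does rely on GNU regex extensions.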

The regex engine takes the leftmost match, i.e. the leftmost three digits that are NOT preceded by a word boundary and are followed by one; in an unbroken run of digits, that can only be the rightmost three. After inserting the comma, the "goto" (ta) makes it match again, but the comma has introduced a new word boundary, so the next match happens further to the left.
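One way to check that the loop, not the regex alone, produces the grouping is to compare it with a single pass using the g flag. This is a sketch assuming GNU sed; the variable names pattern, single and looped are illustrative only.

```shell
# The substitution from the question, kept in a variable so both runs share it.
pattern='s/\B[0-9]\{3\}\>/,&/'

# Single pass with g: the original line contains only one end of word,
# so only the last three digits can match.
single=$(printf '%s\n' 1234567890 | sed "${pattern}g")

# Loop: each inserted comma creates a new end of word for the next pass.
looped=$(printf '%s\n' 1234567890 | sed ":a;${pattern};ta")

printf '%s\n' "$single" "$looped"
# 1234567,890
# 1,234,567,890
```

The g flag scans the line as it was before any replacement, so it never sees the word boundary created by the comma it just inserted; re-running the substitution via ta does.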
