Using sed to replace a number greater than a specified number at a specified position

I need to write a script that replaces every number greater than a specified value when it appears in a particular position, as in this line:

1499011200 310961583 142550756 313415036 146983209

The script should check whether the second field is greater than 300000000. If it is, I need the whole line replaced with my desired values, like this:

1499011200 250000000 XXXX XXXX XXXX

I hope I have made my question clear.

Thanks in advance

This might work for you (GNU sed):

sed -r '/^\S+\s+(300000000|[1-2][0-9]{8}|[0-9]{1,8})\s/!c change' file

If the second field is 300000000 or less, keep the line; otherwise change it.

Or using substitution:

sed '/^\S\+\s\+\(300000000\|[1-2][0-9]\{8\}\|[0-9]\{1,8\}\)\s/!s/^\(\S\+\s\+\).*/\1250000000 XXXX XXXX XXXX/' file
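To see how the "keep what is 300000000 or less, change everything else" idea behaves, here is a minimal sketch run on three sample lines. Only the first line comes from the question; the other two, and the function names `sample` and `cap_lines`, are made up for illustration. It uses `-E` and `[^ ]` rather than `\S`/`\s`, and keeps the separating space outside the group so that `\1` is not immediately followed by a digit.

```shell
# Sample input: second field above, equal to, and below the 300000000 threshold.
sample() {
  printf '%s\n' \
    '1499011200 310961583 142550756 313415036 146983209' \
    '1499011260 300000000 142550756 313415036 146983209' \
    '1499011320 299999999 142550756 313415036 146983209'
}

# Rewrite every line whose second field is NOT in the "300000000 or less" whitelist.
cap_lines() {
  sed -E '/^[^ ]+ +(300000000|[12][0-9]{8}|[0-9]{1,8}) /!s/^([^ ]+) +.*/\1 250000000 XXXX XXXX XXXX/'
}

sample | cap_lines
```

Only the first sample line should be rewritten; the lines holding 300000000 and 299999999 pass through unchanged.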

This is doable but not simple. (≥ a number ending in 0's is easier than >.)

Let's start with a smaller number.

How could we match numbers greater than 30?

  • 2-digit numbers greater than 30 but less than 40,

     \b3[1-9]\b
  • 2-digit numbers 40 or greater,

     \b[4-9][0-9]\b
  • numbers with more digits are greater too.

     \b[1-9][0-9]\{2,\}\b

Use alternation to match all the cases.

\b\(3[1-9]\|[4-9][0-9]\|[1-9][0-9]\{2,\}\)\b
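As a quick check of the alternation, the sketch below filters a list of numbers and prints only those greater than 30. The function name `gt30` and the test values are made up for illustration; `\b` and `\|` are GNU sed extensions, so this assumes GNU sed.

```shell
# Print only the input lines that contain a number greater than 30.
# The last branch starts with [1-9] so that zero-padded values such as 030 are not matched.
gt30() {
  sed -n '/\b\(3[1-9]\|[4-9][0-9]\|[1-9][0-9]\{2,\}\)\b/p'
}

printf '%s\n' 7 29 30 31 40 99 100 1234 | gt30
```

Of the eight values, 31, 40, 99, 100 and 1234 should be printed, and 7, 29 and 30 dropped.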

300000000 is similar, but more work. Here I've added spaces for readability, but you'll need to remove them in the sed regex.

\b \( 30000000[1-9]
   \| 3000000[1-9][0-9]
   \| 300000[1-9][0-9]\{2\}
   \| 30000[1-9][0-9]\{3\}
   \| 3000[1-9][0-9]\{4\}
   \| 300[1-9][0-9]\{5\}
   \| 30[1-9][0-9]\{6\}
   \| 3[1-9][0-9]\{7\}
   \| [4-9][0-9]\{8\}
   \| [1-9][0-9]\{9,\}
\) \b
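One way to use the readable layout directly is to let the shell strip the whitespace before handing the pattern to sed. This is only a sketch, assuming GNU sed (for `\b`, `\|` and `\+`). The names `pattern` and `mask_over_limit` are hypothetical, the last alternative is written with `\{9,\}` so that numbers longer than ten digits also match, and the address is anchored so that the pattern is tested against the second field rather than the first.

```shell
# Keep the pattern readable, one alternative per line; tr removes the spaces
# and newlines that were added only for readability.
pattern=$(tr -d ' \n' <<'EOF'
\b \( 30000000[1-9]
   \| 3000000[1-9][0-9]
   \| 300000[1-9][0-9]\{2\}
   \| 30000[1-9][0-9]\{3\}
   \| 3000[1-9][0-9]\{4\}
   \| 300[1-9][0-9]\{5\}
   \| 30[1-9][0-9]\{6\}
   \| 3[1-9][0-9]\{7\}
   \| [4-9][0-9]\{8\}
   \| [1-9][0-9]\{9,\}
\) \b
EOF
)

# If the second field is greater than 300000000, replace the rest of the line.
mask_over_limit() {
  sed "/^[0-9]\+ \+${pattern}/s/^\([0-9]\+\) .*/\1 250000000 XXXX XXXX XXXX/"
}

printf '%s\n' '1499011200 310961583 142550756 313415036 146983209' | mask_over_limit
```

Anchoring matters here because the first field (1499011200) is itself greater than 300000000, so an unanchored pattern would match every line.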

In awk:

$ awk '$2>300000000{for(i=3;i<=NF;i++)$i="XXXX"}1' file
1499011200 310961583 XXXX XXXX XXXX

Explained:

$ awk '                 # using awk
$2>300000000 {          # if the second value is greater than ...
    for(i=3;i<=NF;i++)  # for each value after the second
        $i="XXXX"       # replace it with XXXX
}1' file                # output
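The output above keeps the original second field, whereas the question asks for it to become 250000000. A small variation on the same idea, sketched below, overwrites the second field as well. The function name `cap_with_awk` and the variables `limit` and `cap` are hypothetical names chosen for this example.

```shell
# Overwrite the second field and mask the remaining fields when the
# second field exceeds the limit.
cap_with_awk() {
  awk -v limit=300000000 -v cap=250000000 '
    $2 + 0 > limit {                 # compare the second field numerically
      $2 = cap                       # overwrite the offending value
      for (i = 3; i <= NF; i++)      # mask every field after the second
        $i = "XXXX"
    }
    1                                # print the (possibly modified) line
  '
}

printf '%s\n' '1499011200 310961583 142550756 313415036 146983209' | cap_with_awk
```

Because awk compares the field as a number, no digit-by-digit regex is needed, and the threshold can be changed by editing one value.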

Although it's an old-ish question, it's worth adding that this could also be handled using conditions:

  • FreeBSD/macOS:
    sed -E '/^[0-9]+ +30{8} /!s/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'
  • Linux:
    sed -r '/^[0-9]+ +30{8} /!s/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'

Explanation

We will handle the strict "greater than" sneakily!

We prefix the command with a condition that tells sed to process only lines that do not have 300000000 in the second field. That means the regex doesn't have to match 300000001 or 300010000 while rejecting 300000000. If a line passes this condition, then (and only then!) we go ahead and replace "any number, followed by a number of 300000000 or more, followed by anything" with "the first number (only), followed by ' 250000000 XXXX XXXX XXXX'".

In other words:

If the 2nd field is exactly 300000000, the condition means nothing will happen. OTHERWISE, if it's less than 300000000, it won't match the regex "find" part, so again nothing will happen. OTHERWISE sed will do the replacement.
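The three cases can be checked with a small sketch. It assumes the command is written with `!` after the condition and `{10,}` as the second interval, which is how the explanation describes it. The function name `cap_strict` and all sample lines except the question's own are made up for illustration.

```shell
# Condition first (skip lines whose second field is exactly 300000000),
# then replace lines whose second field is 300000000 or more.
cap_strict() {
  sed -E '/^[0-9]+ +30{8} /!s/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'
}

printf '%s\n' \
  '1499011200 300000000 142550756 313415036 146983209' \
  '1499011200 299999999 142550756 313415036 146983209' \
  '1499011200 310961583 142550756 313415036 146983209' | cap_strict
```

The first two lines should come out unchanged (exactly 300000000, and less than 300000000); only the third should be replaced.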

Switches:

-E / -r tells sed to use extended regular expressions. The letter differs between sed implementations: BSD sed (FreeBSD, macOS) uses -E, GNU sed has traditionally used -r, and current GNU sed accepts -E as well. These are the two most common letters for this option. See man sed to check what you need on your system.

Condition:

This is easy. The condition matches:

  • ^ from the start of the line....
  • [0-9]+ + one or more digits followed by one or more spaces (your first field and the column spacing)...
    followed by:
  • 30{8} 3 followed by exactly 8 zeros followed by a space. We need the space, otherwise it would match, e.g., 300000000500 as well.
  • /! The ! after the end of the condition means "only process the command if this condition isn't met".

If a line matches this condition, then we have a line with exactly 300000000 in the second field, and sed will always leave the line unchanged. If not, it will try to find a match and replace it....
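To look at the condition in isolation, the sketch below prints only the lines that get past it, using `-n` and `!p` in place of the substitution. The function name `passes_condition` and the sample lines are made up for illustration.

```shell
# Print the lines that do NOT have exactly 300000000 in the second field,
# i.e. the lines on which the substitution would be attempted.
passes_condition() {
  sed -E -n '/^[0-9]+ +30{8} /!p'
}

printf '%s\n' \
  '1499011200 300000000 1 2 3' \
  '1499011200 300000000500 1 2 3' \
  '1499011200 310961583 1 2 3' | passes_condition
```

The first line is held back by the condition. The second gets through because the trailing space in the pattern stops 300000000500 from being mistaken for 300000000.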

Regex replace command:

This command only gets executed if the second field is not exactly 300000000, because of the condition above. So we can assume that's already checked, and look at the replace action for a line that didn't contain exactly 300000000 in the second field:

  • s do a find/replace....
    match and replace this expression, if it's found in the line (otherwise do nothing):
  • ^([0-9]+) + find the start of the line followed by one or more digits, followed by one or more spaces. This is the contents of the first field. The (...) is a grouping that tells regex to remember the part of the matched text it contains (which will be the first field) so it can be re-used in the replacement operation. (We want to include the first field's value in the changed line, if the match succeeds.) This must also be followed by ...
  • ([3-9][0-9]{8,}|[0-9]{10,}).* Match a second field that is EITHER a digit 3-9 followed by 8 or more digits OR any number of 10 or more digits, and then anything else to the end of the line. Remember that * is "greedy" and matches all it can, so we don't have to explicitly say "to the end of the line"; it will do that anyway. We also don't need to match the space after the 2nd field, because the quantifiers are greedy and will match all the digits they can. So we're telling sed to match any line of the form "(start of line)(number)(spaces)(number >= 300000000)(anything)", and to remember the first number. Although the pattern could in theory match and replace the exact value 300000000, it never will, because we excluded that possibility with the condition beforehand. Also note that we need the .* at the end, because sed only replaces what it matches: if we left it out, sed would replace only the first and second fields and leave the rest of the line in place, which isn't what we want.
    If the line matches that expression, then replace the text that was matched (which will be the whole line), with:
  • \1 250000000 XXXX XXXX XXXX The \1 in the replacement string is a "back reference". It means, "put the contents of the first matched group here". So this tells sed to replace the entire line (because that's what it matched) by the contents of the first field, followed by a space, followed by "250000000 XXXX XXXX XXXX".
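The role of the trailing .* is easiest to see side by side. In this sketch the two function names, `with_tail` and `without_tail`, are hypothetical; the only difference between them is whether the pattern ends in .* or not.

```shell
# Same substitution twice: once with the trailing .* and once without it.
with_tail() {
  sed -E 's/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'
}
without_tail() {
  sed -E 's/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,})/\1 250000000 XXXX XXXX XXXX/'
}

line='1499011200 310961583 142550756 313415036 146983209'
printf '%s\n' "$line" | with_tail      # whole line is replaced
printf '%s\n' "$line" | without_tail   # original fields 3-5 are left dangling
```

With the .*, the whole line becomes the replacement text. Without it, only the first two fields are replaced and the original third, fourth and fifth fields remain appended after the XXXX placeholders.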

For completeness, if the line could have leading spaces, the command would then be:

sed -E '/^ *[0-9]+ +30{8} /!s/^( *[0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'

(The leading spaces, if any, are inside the grouping, so that we keep them when we do the replacement, for niceness. Otherwise they'd be lost.)

Done.
