Remove leading and trailing digits from a string with sed or awk while keeping 2 digits

Question

I have a file containing lines like:

353451word2423157
anotherword
7412yetanother1
3262andherese123anotherline4359013
5342512354325324523andherese123anotherline45913
532453andherese123anotherline413

I'd like to strip most of the leading and trailing digits (0-9), while still leaving up to 2 leading and 2 trailing digits in place, if there are any.

To clarify, for the list above, the expected output would be:

51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41
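To try the answers below locally, the sample input can be written to a file named file (the name the answers use), and the expected output to a file named expected (a hypothetical name chosen only for this sketch):

```shell
# Sample input from the question, one entry per line, saved as "file"
printf '%s\n' \
  353451word2423157 \
  anotherword \
  7412yetanother1 \
  3262andherese123anotherline4359013 \
  5342512354325324523andherese123anotherline45913 \
  532453andherese123anotherline413 > file

# Expected output from the question, saved as "expected" (hypothetical name)
printf '%s\n' \
  51word24 \
  anotherword \
  12yetanother1 \
  62andherese123anotherline43 \
  23andherese123anotherline45 \
  53andherese123anotherline41 > expected
```

Any answer can then be checked by piping its output into diff, e.g. sed -E 's/^[0-9]+([0-9]{2})|([0-9]{2})[0-9]+$/\1\2/g' file | diff - expected, which prints nothing when the output matches.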

Preferred tools would be sed or awk, but any other suggestions are welcome...

I've tried something like sed 's/[0-9]\+$//' | sed 's/^[0-9]\+//', but obviously this strips all leading and trailing digits...
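The attempt removes everything because [0-9]\+ consumes the whole digit run (and \+ in a basic regular expression is a GNU sed extension). A minimal sketch of the same two-substitution idea that instead captures the two digits to keep, in portable POSIX BRE; this is an adaptation, not one of the answers, and the function name trim_keep2 is hypothetical:

```shell
# Hypothetical helper: reads the files given as arguments, or stdin if none.
trim_keep2() {
  # 1st s: [0-9]* takes as many leading digits as it can while still leaving
  #        two for the group, and only that group (\1) is written back.
  # 2nd s: the leftmost match starts at the trailing digit run, so its first
  #        two digits are captured and the rest of the run is dropped.
  sed 's/^[0-9]*\([0-9][0-9]\)/\1/; s/\([0-9][0-9]\)[0-9]*$/\1/' "$@"
}

printf '%s\n' 353451word2423157 7412yetanother1 | trim_keep2
```

Lines with 2 or fewer digits at an end are left unchanged on that side, because each pattern needs at least two digits to match.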

Answer 1

You may try this sed:

sed -E 's/^[0-9]+([0-9]{2})|([0-9]{2})[0-9]+$/\1\2/g' file

51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41

Command details:

^[0-9]+([0-9]{2}) : matches a run of 3 or more digits at the start of the line, capturing its last 2 digits in group #1; the whole run is replaced with group #1.
([0-9]{2})[0-9]+$ : matches a run of 3 or more digits at the end of the line, capturing its first 2 digits in group #2; the whole run is replaced with group #2.
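Only one of the two groups takes part in any single match; in GNU sed the other one expands to an empty string, so \1\2 yields whichever pair was captured. The g flag is what lets both alternatives apply to the same line. A quick check, assuming GNU sed:

```shell
line=353451word2423157

# Without g: sed stops after the first (leading) match
printf '%s\n' "$line" | sed -E 's/^[0-9]+([0-9]{2})|([0-9]{2})[0-9]+$/\1\2/'

# With g: the trailing alternative is applied as well
printf '%s\n' "$line" | sed -E 's/^[0-9]+([0-9]{2})|([0-9]{2})[0-9]+$/\1\2/g'
```

The first command prints 51word2423157 and the second prints 51word24.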

Answer 2

I suggest using perl:

perl -pe 's/^\d+(?=\d{2})|(\d{2})\d+$/$1/g' file

See the online demo and the regex demo.

Regex details:

^ - start of string
\d+ - one or more digits
(?=\d{2}) - on the right, there must be two digits (not added to the match, as the lookahead is a non-consuming pattern)
| - or
(\d{2}) - two digits captured into Group 1 ($1)
\d+ - one or more digits
$ - end of string.

Answer 3

Here is an awk command that trims the digit run on each side of a string to at most 2 digits:

awk '{  match($0, /^[0-9]*/); lh=RLENGTH
        s=substr($0, lh>2 ? lh-1 : 1)
        match(s, /[0-9]*$/); rh=RLENGTH
        print substr(s, 1, rh>2 ? length(s)-rh+2 : length(s))
}' file

Prints:

51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41
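The program relies on match() setting RLENGTH to the length of the digit run it found (0 when there is none). The same program, wrapped in a shell function (the name trim_keep2_awk is hypothetical) and commented step by step:

```shell
# Hypothetical helper wrapping the awk program above; reads files or stdin.
trim_keep2_awk() {
  awk '{
    # RLENGTH = length of the leading digit run (0 if there is none)
    match($0, /^[0-9]*/); lh = RLENGTH
    # More than 2 leading digits: start at position lh-1, keeping only the last 2
    s = substr($0, lh > 2 ? lh - 1 : 1)
    # RLENGTH = length of the trailing digit run of what is left
    match(s, /[0-9]*$/); rh = RLENGTH
    # More than 2 trailing digits: stop 2 characters into that run
    print substr(s, 1, rh > 2 ? length(s) - rh + 2 : length(s))
  }' "$@"
}

printf '%s\n' 3262andherese123anotherline4359013 anotherword | trim_keep2_awk
```

Since only match() and substr() are used, this should also run in non-GNU awks.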

Answer 4

Using the GNU awk gensub() function, with parentheses in the regexp to mark the components, and then referring to them in the replacement (here "\\2\\3"):

awk '{print gensub(/^([[:digit:]]*)([[:digit:]]{2})|([[:digit:]]{2})([[:digit:]]*)$/,"\\2\\3","g",$0)}' file

51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41
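The replacement is written "\\2\\3" because awk processes escapes in string literals first: "\\" becomes a single backslash, so gensub() receives \2\3, meaning group 2 followed by group 3. For any single match only one of those two groups participates and the other is empty. The escaping step itself can be checked with any awk (gensub() is gawk-only):

```shell
# Prints the replacement text exactly as gensub() would receive it: \2\3
awk 'BEGIN { print "\\2\\3" }'
```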

Answer 5

I would use GNU AWK the following way. Let file.txt content be

353451word2423157
anotherword
7412yetanother1
3262andherese123anotherline4359013
5342512354325324523andherese123anotherline45913
532453andherese123anotherline413

then

awk 'BEGIN{FPAT="[0-9]+|[^0-9]+";OFS=""}$1~/[0-9]+/{$1=substr($1,length($1)-1)}$NF~/[0-9]+/{$NF=substr($NF,1,2)}{print}' file.txt

output

51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41

Explanation: using FPAT, I instruct GNU AWK to split each line into fields that consist solely of digits or solely of non-digits. If the 1st field ($1) consists of digits, I slice it to get its last 2 characters. If the last field ($NF) consists solely of digits, I slice it to get its first 2 characters. Finally the whole line is printed, using an empty string as the output field separator (OFS).

(tested in gawk 4.2.1)

5 answers

Answer 1: score 8 (accepted), 2021-06-29 14:56:31
Answer 2: score 1, 2021-06-29 14:52:06
Answer 3: score 1, 2021-06-29 19:56:58
Answer 4: score 0, 2021-06-30 06:26:05
Answer 5: score 0, 2021-06-30 07:28:01
