How do I filter lines in a text file that start with a capital letter and end with a positive integer with regex on the command line in Linux?

I am attempting to use regex with the grep command in the Linux terminal in order to filter lines in a text file that start with a capital letter and end with a positive integer. Is there a way to modify my command so that it does this all in one line with one call of grep instead of two? I am using Windows Subsystem for Linux and the Microsoft Store Ubuntu.

Text file:

C line 1
c line 2
B line 3
d line 4
E line five

The command that I have gotten to work:

grep ^[A-Z] cap*| grep [0-9]$ cap*

The output:

C line 1
B line 3

This works, but I feel like the regex statements could be combined somehow, but

grep ^[A-Z][0-9]$ 

does not yield the same result as the command above.

You need to use

grep '^[A-Z].*[0-9]$'
grep '^[[:upper:]].*[0-9]$'

See the online demo. The regex matches:

  • ^ - start of string
  • [A-Z] / [[:upper:]] - an uppercase letter
  • .* - any zero or more chars ([^0-9]* matches zero or more non-digit chars)
  • [0-9] - a digit
  • $ - end of string.
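
As a quick check of the pattern described above, here is a minimal sketch that recreates the sample file from the question and runs the single-call grep on it. The file name cap.txt and the temporary directory are assumptions chosen to fit the cap* glob, and LC_ALL=C is set so that [A-Z] means exactly the ASCII uppercase letters regardless of the system locale.

```shell
# Recreate the question's sample file in a scratch directory.
# The name cap.txt is an assumption; the question only shows the glob cap*.
tmpdir=$(mktemp -d)
printf '%s\n' 'C line 1' 'c line 2' 'B line 3' 'd line 4' 'E line five' > "$tmpdir/cap.txt"

# One grep call: uppercase letter at the start, any text, a digit at the end.
# LC_ALL=C pins [A-Z] to ASCII uppercase letters.
result=$(LC_ALL=C grep '^[A-Z].*[0-9]$' "$tmpdir/cap.txt")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -r "$tmpdir"
```

This should print the same two lines as the two-grep command in the question: C line 1 and B line 3.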

Also, if you want to make sure there is no - before the number at the end of string, you need to use a negated bracket expression, like

grep -E '^[[:upper:]].*[^-0-9][0-9]+$'

Here, the POSIX ERE regex (due to the -E option) matches

  • ^[[:upper:]].* - an uppercase letter at the start and then any text,
  • [^-0-9] - any char other than a digit and -
  • [0-9]+ - one or more digits
  • $ - end of string.
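
To see what the negated bracket expression changes, the following sketch feeds a few made-up lines (hypothetical test data, not taken from the question) through the ERE pattern. LC_ALL=C is again an assumption to keep character classes predictable.

```shell
# Hypothetical input lines, invented for illustration only.
input=$(printf '%s\n' 'Total 42' 'Balance -7' 'Item x9' 'lower 5')

# [^-0-9] requires the char just before the final digit run to be
# neither a minus sign nor a digit, so "Balance -7" is rejected.
result=$(printf '%s\n' "$input" | LC_ALL=C grep -E '^[[:upper:]].*[^-0-9][0-9]+$')
printf '%s\n' "$result"
```

Only Total 42 and Item x9 should be printed. Note that this pattern needs at least one character between the leading capital and the digits, so a two-character line such as A1 would not match.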

When you use a pipeline, you want the second grep to act on standard input, not on the file you originally grepped from.

grep '^[A-Z]' cap* | grep '[0-9]$'

However, you need to expand the second regex if you want to exclude negative numbers. Anyway, a better solution altogether might be to switch to Awk:

awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0' cap*

The output format will be slightly different from that of grep; if you want to include the name of the matching file, you have to specify that separately:

awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0 { print FILENAME ":" $0 }' cap*
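
A small sketch of how the Awk condition treats negative and zero values. The lines ending in -4 and 0 are hypothetical additions to the question's sample data, and the file name cap.txt is assumed.

```shell
# Scratch file: three lines from the question plus two hypothetical ones
# ('F line -4' and 'G line 0') that end in a non-positive number.
tmpdir=$(mktemp -d)
printf '%s\n' 'C line 1' 'c line 2' 'B line 3' 'E line five' 'F line -4' 'G line 0' > "$tmpdir/cap.txt"

# $NF is the last whitespace-separated field; $NF > 0 drops -4 and 0.
result=$(LC_ALL=C awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0' "$tmpdir"/cap*)
printf '%s\n' "$result"

rm -r "$tmpdir"
```

Here only C line 1 and B line 3 should remain. The numeric test assumes the last field is a plain number; a last field such as x9 would be compared as a string instead.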

The regex ^[A-Z][0-9]$ matches exactly two characters, the first of which must be an uppercase letter, and the second one has to be a digit. If you want to permit arbitrary text between them, that would be ^[A-Z].*[0-9]$ (and for less arbitrary, use something a bit more specific than .* , like (.*[^-0-9])? perhaps, where you need grep -E for the parentheses and the question mark for optional, or backslashes before each of these for the BRE regex dialect you get out of the box with POSIX grep).
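
The two dialects mentioned above can be compared side by side. This is a sketch on hypothetical input lines; in the BRE version the optional group is written with the POSIX interval \{0,1\}, since \? in BRE is a GNU extension rather than part of POSIX.

```shell
# Hypothetical input lines, invented for illustration only.
input=$(printf '%s\n' 'A1' 'C line 1' 'B line -3' 'A-1')

# ERE (-E): parentheses and ? are operators without backslashes.
ere=$(printf '%s\n' "$input" | LC_ALL=C grep -E '^[A-Z](.*[^-0-9])?[0-9]$')

# BRE (default): the group needs \( \); \{0,1\} is the POSIX way to say
# "optional" (GNU grep also accepts \? here).
bre=$(printf '%s\n' "$input" | LC_ALL=C grep '^[A-Z]\(.*[^-0-9]\)\{0,1\}[0-9]$')

printf 'ERE:\n%s\nBRE:\n%s\n' "$ere" "$bre"
```

Both calls should select A1 and C line 1 and reject the two lines with a minus sign before the digit. Because the pattern ends in a single [0-9], a line ending in several digits (for example 42) would not match; [0-9]+ with -E would be needed for that.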

