How to use sed or awk to process the content of a specified column?

I have data like this in a text file:

2017-08-07 733 AA1(10.7.21.51) AllUsers 631 K:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 631(Peter) 1:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208(Lucy) 2:C
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 K:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 99999(Kate) 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631(Peter) 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631(Peter) 2:C
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999(Kate) T:U 
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999(Kate) 3:U 

The text has 6 columns, separated by spaces.

I want to process only the 5th column.

In some rows the 5th column contains a number followed by a name in parentheses; in others it contains only the number. The number is the employee number. I want to keep just the number in the 5th column and drop the name, like this:

2017-08-07 733 AA1(10.7.21.51) AllUsers 631 K:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 2:C
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 K:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 99999 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631 2:C
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999 T:U 
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999 3:U 

I used this command to process the data:

 cat mytextfile|sed 's/(/ /g' > resultfile 

But this also modifies the 3rd column, because it contains parentheses too. I only want to process the 5th column.

How can I do this with sed or awk?

Using sed (simple)

To remove all parentheses that contain only letters, try:

$ sed 's/([[:alpha:]]*)//' myfile
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 K:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 2:C
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 K:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 99999 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631 2:C
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999 T:U 
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999 3:U 

([[:alpha:]]*) matches ( followed by zero or more alphabetic characters followed by ). s/([[:alpha:]]*)// finds the first such match on each line and replaces it with an empty string.
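The parentheses need no escaping here because sed, without -E, uses basic regular expressions, in which a bare ( is a literal character. As a minimal sketch, the two spellings below should behave the same on the sample data (the function names are hypothetical, chosen only for this illustration; the sample line comes from the question):

```shell
#!/bin/sh
# Hypothetical helper names; the sample input is one line from the question.

# Basic regex (sed default): bare ( and ) are literal characters.
strip_name_bre() {
    sed 's/([[:alpha:]]*)//'
}

# Extended regex (-E): ( and ) form groups, so literal parentheses are escaped.
strip_name_ere() {
    sed -E 's/\([[:alpha:]]*\)//'
}

line='2017-08-07 733 AA1(10.7.21.51) AllUsers 631(Peter) 1:N'
printf '%s\n' "$line" | strip_name_bre
printf '%s\n' "$line" | strip_name_ere
```

Both calls print the line with (Peter) removed. The IP address in column 3 survives only because it contains digits and dots; a purely alphabetic parenthesized value in any column would be removed as well.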

Using sed (improved)

This removes a parenthesized run of alphabetic characters from the fifth field, and only the fifth field:

$ sed -E 's/(([^[:space:]]+[[:space:]]+){4}[^[:space:]]*)\([[:alpha:]]*\)/\1/' myfile
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 K:N
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 2:C
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 K:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 99999 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631 2:C
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999 T:U
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999 3:U
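To make the expression easier to read, here is a sketch that assembles the same pattern from named parts. The variable and function names are hypothetical and exist only for this illustration; the resulting regex is intended to be identical to the one above.

```shell
#!/bin/sh
# The same expression as above, assembled from named parts.
# Variable and function names are hypothetical, for illustration only.

field='[^[:space:]]+'      # one column: a run of non-whitespace
gap='[[:space:]]+'         # the whitespace between columns
name='\([[:alpha:]]*\)'    # literal parentheses around letters

strip_fifth_name() {
    # Group 1 = four columns with their gaps, plus the leading part of column 5.
    # The parenthesized name that follows group 1 is dropped; \1 restores the rest.
    sed -E "s/(($field$gap){4}[^[:space:]]*)$name/\\1/"
}

printf '%s\n' '2017-08-07 733 AA1(10.7.21.51) AllUsers 2208(Lucy) 2:C' | strip_fifth_name
```

Note that the pattern is not anchored. On a line whose fifth column has no name, it could in principle match starting from a later column; prefixing the pattern with ^ would pin the count to the start of the line, assuming lines have no leading whitespace.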

Using awk

To remove any parenthesized expression in the fifth field:

$ awk -F'[[:space:]]+' '{gsub(/\(.*\)/, "", $5)} 1' myfile
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 K:N
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 2:C
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 K:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 99999 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631 2:C
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999 T:U
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999 3:U

How it works:

  1. -F'[[:space:]]+'

    This tells awk to use any run of whitespace, as defined by the locale's [[:space:]] class, as the field separator. (By default, awk splits fields on runs of blanks, tabs, and newlines.)

  2. gsub(/\(.*\)/, "", $5)

    This looks in the fifth field, $5, for any parenthesized expression, \(.*\), and replaces it with the empty string "".

  3. 1

    This is awk shorthand for printing the line: the pattern 1 is always true, and the default action is to print the current record.
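One side effect is worth knowing: when gsub changes $5, awk rebuilds the whole line from its fields, joined by the output field separator (a single space by default), so irregular spacing in the input is normalized. The sketch below shows this with deliberately uneven spacing. It is a variant, not the command above: it relies on awk's default field splitting instead of -F, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; a variant of the answer's command that relies on
# awk's default field splitting (runs of blanks and tabs) instead of -F.
strip_fifth_parens() {
    awk '{gsub(/\(.*\)/, "", $5)} 1'
}

# Input with a double space, a tab, and a triple space between columns.
printf '2017-08-07  733\tAA1(10.7.21.51)   AllUsers 631(Peter) 1:N\n' | strip_fifth_parens
```

The output has single spaces between all columns and no name in column 5. If the original spacing matters, the sed approaches leave it untouched.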

In the specific case of your example, the user names follow digits only, unlike column 3, where the parenthesized IP address follows letters and digits. You can use this to your advantage:

$ sed 's/\( [0-9][0-9]*\)([^)]*)/\1/g' mytextfile 
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 K:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N 
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 2:C
2017-08-07 733 AA1(10.7.21.51) AllUsers 2208 K:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 99999 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631 2:C
2017-08-07 189 AA2(10.7.4.54) AllUsers 631 2:C
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999 T:U 
2017-08-07 733 AA3(10.7.21.51) AllUsers 99999 3:U 

That sed command captures a space followed by one or more digits, matches the parenthesized text that follows, and replaces the whole match with the captured part.
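A short sketch of why column 3 is left alone and where this approach has limits (the function name is hypothetical; the second input line is made up to show the edge case):

```shell
#!/bin/sh
# Hypothetical helper wrapping the sed command from the answer.
strip_after_digits() {
    sed 's/\( [0-9][0-9]*\)([^)]*)/\1/g'
}

# Column 3: "(" follows "AA1" (a letter comes right after the space) -> kept.
# Column 5: "(" follows " 631" (a space, then digits only)           -> removed.
printf '%s\n' '2017-08-07 733 AA1(10.7.21.51) AllUsers 631(Peter) 1:N' | strip_after_digits

# Made-up edge case: column 2 is also digits followed by parentheses,
# so it is stripped too.
printf '%s\n' '2017-08-07 733(x) AA1(10.7.21.51) AllUsers 631(Peter) 1:N' | strip_after_digits
```

Both lines come out as 2017-08-07 733 AA1(10.7.21.51) AllUsers 631 1:N. The command keys on content rather than position, so it fits this data but would touch any column made of digits followed by parentheses.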
