
Manipulate (exclude lines) a specific column of a .csv file, using UNIX/Linux

I want to access and manipulate the fourth column of a csv file. In particular, I want to exclude the lines that don't meet a specific requirement (exclude the lines that do not have a 3-character country code).

My data set:

Luxembourg,LUX,2017,9294689.12
Aruba,ABW,2017,927865.82
Nepal,NPL,2017,9028196.37
Bangladesh,BGD,2017,88057460.51
Costa Rica,CRI,2017,8695008.05
Chile,CHL,2017,84603249.72
Cook Islands,COK,2017,82045.41
World,OWIDWRL,1755,9361520
India,INDIA,1763,0
Asia and Pacific (other),,2017,5071156099
World,OWID_WRL,1752,9354192
Middle East,,1751,0
International transport,,1751,0
India,IND,1751,0
Europe (other),,1751,0
China,CHN,1751,0
Asia and Pacific (other),,1751,0
Americas (other),,1751,0
Africa,,1751,0

Thanks in advance.

I have already sorted my data file by year, but I don't know how to access the 4th column or how to use awk or sed for this.

Expected data set:

Luxembourg,LUX,2017,9294689.12
Aruba,ABW,2017,927865.82
Nepal,NPL,2017,9028196.37
Bangladesh,BGD,2017,88057460.51
Costa Rica,CRI,2017,8695008.05
Chile,CHL,2017,84603249.72
Cook Islands,COK,2017,82045.41

If I understood your question correctly, could you please try the following. The code checks the 2nd field of each line and does not print the line if that field does not consist of exactly 3 letters.

awk 'BEGIN{FS=","} $2~/^[a-zA-Z]{3}$/' Input_file
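To try the filter on a few rows of the question's data without creating Input_file, something like the following POSIX sh sketch should work. It feeds a subset of the sample rows through a pipe instead of a file, and it writes the pattern as three bracket expressions rather than {3}. That form is equivalent but also runs on awks without interval-expression support, which is an assumption about your environment.

```shell
# A subset of the rows from the question's data set.
input='Luxembourg,LUX,2017,9294689.12
World,OWIDWRL,1755,9361520
India,INDIA,1763,0
Asia and Pacific (other),,2017,5071156099
World,OWID_WRL,1752,9354192
China,CHN,1751,0'

# Keep only lines whose 2nd comma-separated field is exactly 3 letters.
result=$(printf '%s\n' "$input" | awk 'BEGIN{FS=","} $2 ~ /^[a-zA-Z][a-zA-Z][a-zA-Z]$/')
printf '%s\n' "$result"
```

Only the LUX and CHN rows should remain. The empty code, OWIDWRL, INDIA and OWID_WRL are all rejected.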

In case you have an old awk where the interval {3} does not work, then try:

awk 'BEGIN{FS=","} $2~/^[a-zA-Z][a-zA-Z][a-zA-Z]$/' Input_file
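Since the question also mentions sed, here is a rough sed equivalent of the same check. It is a sketch that assumes the 1st field never contains a comma (no quoted CSV fields) and that every line has at least three fields, so a comma always follows the country code.

```shell
# Sample rows from the question's data set.
input='Costa Rica,CRI,2017,8695008.05
World,OWIDWRL,1755,9361520
Middle East,,1751,0
India,IND,1751,0'

# -n suppresses automatic printing; /.../p prints only matching lines.
# [^,]* skips the 1st field, then exactly 3 letters must sit between
# the 1st and 2nd comma.
result=$(printf '%s\n' "$input" | sed -n '/^[^,]*,[A-Za-z][A-Za-z][A-Za-z],/p')
printf '%s\n' "$result"
```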


Explanation: here is the first awk command again, with an explanation of each part.

awk '                  ##Starting awk program here.
BEGIN{                 ##Starting BEGIN section here; it is executed before Input_file is read.
  FS=","               ##Setting the field separator to a comma.
}                      ##Closing BEGIN section here.
$2~/^[a-zA-Z]{3}$/     ##Condition: the 2nd field consists of exactly 3 letters from start (^) to end ($).
                       ##awk works on a condition-then-action basis: if the condition is TRUE, the action is performed.
                       ##No action is given here, so the default action (printing the current line) happens.
' Input_file           ##Mentioning Input_file name here.
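The condition-then-action model also makes it easy to keep the rejected lines instead of discarding them. The following is a minimal sketch that routes matching lines to one file and everything else to another. The file names kept.csv and dropped.csv are hypothetical, and a temporary directory is used so no existing files are touched.

```shell
# Work in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'Aruba,ABW,2017,927865.82' 'Middle East,,1751,0' \
  'Nepal,NPL,2017,9028196.37' 'India,INDIA,1763,0' > "$dir/input.csv"

# Rule 1: matching lines go to kept.csv, and "next" skips rule 2.
# Rule 2 has no condition, so every remaining line goes to dropped.csv.
awk -F, -v out="$dir" '
$2 ~ /^[a-zA-Z][a-zA-Z][a-zA-Z]$/ { print > (out "/kept.csv"); next }
                                  { print > (out "/dropped.csv") }
' "$dir/input.csv"

kept=$(cat "$dir/kept.csv")
dropped=$(cat "$dir/dropped.csv")
printf 'kept:\n%s\ndropped:\n%s\n' "$kept" "$dropped"
rm -r "$dir"
```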

The following outputs only lines with a 3-letter value in the second field:

awk --re-interval -F, 'tolower($2) ~ /^[a-z]{3}$/' country.txt

Checking the length (length($2) == 3) is also possible, but the regex additionally ensures that the 3 characters are all letters.
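The difference between the two checks shows up only on codes that are 3 characters long but not all letters. The rows with A1B and ___ below are hypothetical and were added only to illustrate this. The regex is written in bracket form in case the installed awk lacks interval support.

```shell
# "A1B" and "___" are hypothetical 3-character codes that are not letters.
input='Chile,CHL,2017,84603249.72
Test,A1B,2017,1
Test,___,2017,2
World,OWIDWRL,1755,9361520'

# Length-only check: keeps any 3-character code.
by_length=$(printf '%s\n' "$input" | awk -F, 'length($2) == 3')
# Letters-only check: lowercase the field, then require exactly 3 letters.
by_regex=$(printf '%s\n' "$input" | awk -F, 'tolower($2) ~ /^[a-z][a-z][a-z]$/')

printf 'length:\n%s\nregex:\n%s\n' "$by_length" "$by_regex"
```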

--re-interval allows you to use interval expressions such as {3} in regular expressions. Older versions of gawk treated braces as literal characters by default. It is a gawk-specific option, and gawk 4.0 and later enable interval expressions by default.

-F, tells awk that the input field separator is a comma.
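With -F, in place, the columns are simply $1, $2, $3 and $4. This also answers the question of how to reach the 4th column. The sketch below prints the country code and the 4th column of the rows that pass the filter. It assumes plain CSV without quoted fields containing commas. Note that "Cook Islands" stays in one field because only commas split fields here, whereas the default field separator would also split on spaces.

```shell
# Sample rows from the question's data set.
input='Bangladesh,BGD,2017,88057460.51
Cook Islands,COK,2017,82045.41
Africa,,1751,0'

# $2 is the country code and $4 the last column; print joins them with a space (OFS).
result=$(printf '%s\n' "$input" | awk -F, '$2 ~ /^[A-Za-z][A-Za-z][A-Za-z]$/ { print $2, $4 }')
printf '%s\n' "$result"
```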

print is the default action in awk, so tolower($2) ~ /^[a-z]{3}$/ is shorthand for tolower($2) ~ /^[a-z]{3}$/ {print}.
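The equivalence can be checked directly by running both forms on the same input and comparing the results. This is a small sketch using two of the sample rows. The bracket form stands in for {3} in case the awk in use lacks interval support.

```shell
input='Nepal,NPL,2017,9028196.37
Europe (other),,1751,0'

# Pattern only: awk applies the default action, which prints the whole line.
implicit=$(printf '%s\n' "$input" | awk -F, 'tolower($2) ~ /^[a-z][a-z][a-z]$/')
# Same pattern with the action spelled out.
explicit=$(printf '%s\n' "$input" | awk -F, 'tolower($2) ~ /^[a-z][a-z][a-z]$/ { print }')

printf '%s\n' "$implicit"
```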

tolower($2) lowercases the value of the second field, and ~ is the regex match operator. We use it to check for the beginning of the string ^, then [a-z] repeated {3} times, and then the end of the string $.
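Lowercasing first is what lets a single [a-z] class accept codes in any letter case. In the sketch below, the lowercase and mixed-case rows are hypothetical and were added only for illustration. LC_ALL=C is set so the [a-z] range means plain ASCII lowercase letters, which is an assumption about how your locale would otherwise treat ranges.

```shell
# "abw" and "ChL" are hypothetical codes in non-uppercase form.
input='Aruba,abw,2017,927865.82
Chile,ChL,2017,84603249.72
China,CHN,1751,0'

# Without tolower, [a-z] only accepts codes that are already lowercase.
plain=$(printf '%s\n' "$input" | LC_ALL=C awk -F, '$2 ~ /^[a-z][a-z][a-z]$/')
# With tolower, the field is lowercased before matching, so any case passes.
folded=$(printf '%s\n' "$input" | LC_ALL=C awk -F, 'tolower($2) ~ /^[a-z][a-z][a-z]$/')

printf 'plain:\n%s\nfolded:\n%s\n' "$plain" "$folded"
```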
