Manipulating (excluding lines from) a specific column of a .csv file using UNIX/Linux

Question

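I would like to access and manipulate the 4th column of a csv file. In particular, I would like to exclude rows that do not meet a specific requirement (exclude rows that do not have a 3-character country code).

My dataset: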

Luxembourg,LUX,2017,9294689.12
Aruba,ABW,2017,927865.82
Nepal,NPL,2017,9028196.37
Bangladesh,BGD,2017,88057460.51
Costa Rica,CRI,2017,8695008.05
Chile,CHL,2017,84603249.72
Cook Islands,COK,2017,82045.41
World,OWIDWRL,1755,9361520
India,INDIA,1763,0
Asia and Pacific (other),,2017,5071156099
World,OWID_WRL,1752,9354192
Middle East,,1751,0
International transport,,1751,0
India,IND,1751,0
Europe (other),,1751,0
China,CHN,1751,0
Asia and Pacific (other),,1751,0
Americas (other),,1751,0
Africa,,1751,0

Thanks in advance.

I have already sorted the data file by year, but I do not know how to access the 4th column and use awk or sed.

Expected dataset:

Luxembourg,LUX,2017,9294689.12
Aruba,ABW,2017,927865.82
Nepal,NPL,2017,9028196.37
Bangladesh,BGD,2017,88057460.51
Costa Rica,CRI,2017,8695008.05
Chile,CHL,2017,84603249.72
Cook Islands,COK,2017,82045.41

Answer 1

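If I have understood your question correctly, please try the following. It checks whether the 2nd field consists of exactly 3 letters and, if it does not, the line is not printed.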

awk 'BEGIN{FS=","} $2~/^[a-zA-Z]{3}$/' Input_file

If you have an OLD awk in which the interval expression {3} does not work, try this instead.

awk 'BEGIN{FS=","} $2~/^[a-zA-Z][a-zA-Z][a-zA-Z]$/' Input_file

Explanation: an annotated version of the code above.

awk '                  ## Start of the awk program.
BEGIN{                 ## BEGIN section, executed before Input_file is read.
  FS=","               ## Set the field separator to a comma.
}                      ## End of the BEGIN section.
$2~/^[a-zA-Z]{3}$/     ## Condition: the 2nd field consists of exactly 3 letters, from start (^) to end ($).
                       ## awk works as condition-then-action: when the condition is TRUE, the action runs.
                       ## No action is given here, so the default action (printing the line) happens.
' Input_file           ## Name of the input file.
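
To see the filter in action, here is a minimal sketch that runs it on a handful of rows copied from the question. The temporary file and the variable names `sample` and `result` are assumptions made for illustration, and the repeated-class form is used so that it does not depend on interval support.

```shell
# Hypothetical sample file built from a few rows of the dataset in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Luxembourg,LUX,2017,9294689.12
World,OWID_WRL,1752,9354192
India,INDIA,1763,0
Middle East,,1751,0
China,CHN,1751,0
EOF

# Keep only rows whose 2nd field is exactly three letters.
result=$(awk 'BEGIN{FS=","} $2~/^[a-zA-Z][a-zA-Z][a-zA-Z]$/' "$sample")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$sample"
```

Only the Luxembourg and China rows are printed. The filter looks at the country code alone, so a row such as China,CHN,1751,0 passes regardless of its year.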

Answer 2

The following outputs only the lines that have a 3-letter value in the second field:

awk --re-interval -F, 'tolower($2) ~ /^[a-z]{3}$/' country.txt

You could also check the length, but this ensures that only values made of 3 letters get through.
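
The difference between the two checks can be sketched as follows. The row `Test,A1B,2017,0` is a made-up value, not part of the question's data, and the variable names are assumptions for illustration.

```shell
# Three rows: a real code, a hypothetical 3-character value with a digit,
# and a code that is too long.
rows='Nepal,NPL,2017,9028196.37
Test,A1B,2017,0
World,OWID_WRL,1752,9354192'

# Length check: accepts any 3-character value, including A1B.
by_length=$(printf '%s\n' "$rows" | awk -F, 'length($2) == 3')

# Regex check: accepts only values made of exactly three letters.
by_regex=$(printf '%s\n' "$rows" | awk -F, '$2 ~ /^[A-Za-z][A-Za-z][A-Za-z]$/')

printf 'length check:\n%s\n' "$by_length"
printf 'regex check:\n%s\n' "$by_regex"
```

The length check lets the `A1B` row through, while the regex check keeps only the Nepal row.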

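--re-interval lets you use interval expressions in the regular expression, because braces are reserved characters in awk. (Older gawk versions need this flag; gawk 4.0 and later enable interval expressions by default.)

-F, tells awk that the input field separator is a comma.

print is the default action in awk, so tolower($2) ~ /^[a-z]{3}$/ is shorthand for tolower($2) ~ /^[a-z]{3}$/ {print}

tolower($2) lowercases the value of the second field, and ~ is the regular expression match operator. We use it to check for the start of the string ^, then [a-z] repeated {3} times, then the end of the string $.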
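
As a small check of the shorthand described above, the sketch below runs the pattern with and without an explicit {print} and compares the results. The lowercase row `aruba,abw,...` is a hypothetical variation on the question's data, added to show the effect of tolower. The repeated-class form [a-z][a-z][a-z] stands in for [a-z]{3} so the example does not depend on interval support.

```shell
# Sample rows: an uppercase code, an empty code, and a hypothetical lowercase code.
rows='Chile,CHL,2017,84603249.72
Africa,,1751,0
aruba,abw,2017,927865.82'

# Pattern only: awk applies its default action, which prints the line.
implicit=$(printf '%s\n' "$rows" | awk -F, 'tolower($2) ~ /^[a-z][a-z][a-z]$/')

# Same pattern with the action written out.
explicit=$(printf '%s\n' "$rows" | awk -F, 'tolower($2) ~ /^[a-z][a-z][a-z]$/ {print}')

printf '%s\n' "$implicit"
[ "$implicit" = "$explicit" ] && echo "same output"
```

Both forms print the Chile and aruba rows. The row with the empty code is dropped.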

Answer 1
2 Accepted 2019-10-31 19:15:28

Answer 2
1 2019-10-31 19:23:03