Bash: sort a large CSV file and write the sorted output to separate files

Question

I have a large (4 GB) semicolon-delimited file (1.txt):

"3321";"<a href='/files/goods/edit/647/'><u>[ID 647]</u></a> Шорты";"2015-09-06 18:39:17";"1590";"1";"500";"";"Лейла";"878785";"Да";"80.140.1.38"
"2780";"<a href='/files/goods/edit/647/'><u>[ID 647]</u></a> Шорты";"2015-09-06 18:42:51";"1590";"1";"500";"";"Мара";"8664456";"Да";"46.00.00.2"
"3352";"<a href='/files/goods/edit/698/'><u>[ID 698]</u></a> Deck";"2015-09-06 19:05:42";"990";"1";"400";"";"Ed";"456452";"Нет";"80.26.00.00"
"3764";"<a href='/files/goods/edit/669/'><u>[ID 669]</u></a> Fish";"2015-09-06 18:36:18";"1390";"1";"530";"";"Ann";"545566";"Нет";"80.00.35.90"
"3323";"<a href='/files/goods/edit/669/'><u>[ID 669]</u></a> Fish";"2015-09-06 18:54:18";"1390";"1";"530";"";"юрий";"99393";"Да";"85.141.00.100"
"32763";"<a href='/files/goods/edit/430/'><u>[ID 430]</u></a> Radio";"2015-09-06

I need to sort 1.txt by the second column and then write the results to separate files according to the value in the second column.
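As a quick preview of how the output will be grouped, the number of records per ID can be tallied without writing any files. This is a minimal sketch, assuming the ID always appears as "[ID nnn]" inside the second field and that the second field contains no semicolons; count_by_id is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: count_by_id is a hypothetical helper. It assumes the ID appears
# as "[ID nnn]" in the second semicolon-separated field.
count_by_id() {
  local input=$1
  # sed -n ... p prints only the captured ID digits from each matching line;
  # sort groups equal IDs together and uniq -c prefixes each ID with its count.
  LC_ALL=C sed -nE 's/^[^;]*;[^;]*\[ID ([0-9]+)].*/\1/p' "$input" |
    LC_ALL=C sort | uniq -c
}
```

For the sample lines above, count_by_id 1.txt would report 2 records each for 647 and 669 and 1 record each for 430 and 698.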

This is what I'm doing:

sed -r -i -e 's#"<a href=\x27/files/goods/edit/##g' -e 's#/\x27>#;#g' 1.txt && sort --field-separator=';' --key=2 1.txt
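Piping sed -i into another command passes no data along, because sed -i writes its result back to the file rather than to standard output. The same two substitutions can instead be applied as a stream feeding sort directly, leaving 1.txt untouched. This is a sketch under the assumption that GNU sed and GNU sort are available; sort_by_id and the output file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed and GNU sort. sort_by_id is a hypothetical helper.
# The two substitutions from the question are applied as a stream, so the
# input file is never modified and sort only sees sed's finished output.
#   1st -e: drop the link prefix (this also drops field 2's opening quote)
#   2nd -e: turn the "/'>" after the ID into ";", so the bare ID becomes field 2
# sort: -t ';' splits on semicolons, -k2,2n compares field 2 numerically,
# -S 1G sets the in-memory buffer (tune it to the RAM you have).
sort_by_id() {
  local input=$1 output=$2
  LC_ALL=C sed -e "s#\"<a href='/files/goods/edit/##" -e "s#/'>#;#" "$input" |
    LC_ALL=C sort -t ';' -k2,2n -S 1G -o "$output"
}
```

Usage would be along the lines of sort_by_id 1.txt sorted.txt. GNU sort already falls back to an external merge sort with temporary files when the input exceeds its buffer, so a 4 GB file does not need to fit in memory; -T (not used here) can point those temporary files at a disk with more free space.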

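But how do I now split 1.txt so that all lines with the same ID (the second-column value) go into a separate file, and count the records in each file? I want to end up with files like 647_count.txt, 698_count.txt, 669_count.txt and 430_count.txt, where count is the number of records in that file.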
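Once the file is sorted by ID, all lines for one ID are adjacent, so a single pass can write one file at a time and knows the final count as soon as the ID changes. That keeps only one output file open, which may matter with awk implementations that limit the number of open files. A minimal sketch, assuming sorted input and that the first run of digits in field 2 is the ID; split_sorted_by_id is a hypothetical name, and the output names follow the ID_count pattern from the question.

```shell
#!/usr/bin/env bash
# Sketch only: split_sorted_by_id is a hypothetical helper. It assumes the
# input is already sorted so that equal IDs sit on adjacent lines, and that
# the first run of digits in field 2 is the ID.
split_sorted_by_id() {
  local input=$1 id count
  LC_ALL=C awk -F';' '
    match($2, /[0-9]+/) {
      key = substr($2, RSTART, RLENGTH)
      if (key != cur) {                 # a new ID starts on this line
        if (cur != "") {
          close(cur "_count.txt")       # finish the previous file
          print cur, n                  # report the ID and its count on stdout
        }
        cur = key
        n = 0
      }
      n++
      print > (cur "_count.txt")
    }
    END {
      if (cur != "") {
        close(cur "_count.txt")
        print cur, n
      }
    }
  ' "$input" |
  while read -r id count; do
    # rename ID_count.txt to ID_<number of records>.txt
    mv -- "${id}_count.txt" "${id}_${count}.txt"
  done
}
```

If the input is not sorted, an ID that reappears later would reopen and truncate its file, so this variant only makes sense after the sort step.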

Answer 1

Try the following awk script (let's call it parser.awk):

BEGIN { FS = ";" }                       # field separator
{
    if (match($2, /[0-9]+/)) {           # match the `ID` value
        m = substr($2, RSTART, RLENGTH)
        a[m]++                           # count the lines for each `ID`
        print > (m "_count.txt")         # write the line to that `ID`'s file (parentheses keep the redirection portable)
    }
}
END {
    for (i in a) {
        close(i "_count.txt")            # flush and close each file before it is renamed
        print "mv " i "_count.txt " i "_" a[i] ".txt"   # rename files with the actual counts
    }
}

Usage:

awk -f parser.awk 1.txt | sh

For the input snippet you posted in the question, I got the following list of files:

430_1.txt 
647_2.txt 
669_2.txt
698_1.txt
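To check that the number in each file name matches the file's contents, and that no input lines were lost, something like the following could be run in the output directory. This is a sketch that assumes the ID_N.txt naming shown above and that every input line carries an ID; check_counts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: check_counts is a hypothetical helper. It assumes output files
# named <ID>_<N>.txt in the current directory and returns non-zero if any N
# differs from the file's line count or if the totals do not add up.
check_counts() {
  local input=$1 file base claimed actual total=0 status=0 restore
  restore=$(shopt -p nullglob)      # remember the caller's nullglob setting
  shopt -s nullglob                 # a glob with no matches expands to nothing
  for file in [0-9]*_[0-9]*.txt; do
    base=${file%.txt}               # e.g. 647_2
    claimed=${base##*_}             # the part after the last "_", e.g. 2
    case $claimed in *[!0-9]*) continue ;; esac   # skip names like 12_3a.txt
    actual=$(grep -c '' "$file")    # counts a final line even without a newline
    total=$((total + actual))
    if [ "$actual" -ne "$claimed" ]; then
      echo "mismatch: $file has $actual lines" >&2
      status=1
    fi
  done
  eval "$restore"
  if [ "$total" -ne "$(grep -c '' "$input")" ]; then
    echo "total mismatch: $total lines in output files" >&2
    status=1
  fi
  return "$status"
}
```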

Answer 2

Bash:

err() { printf '%s\n' "$*" >&2; return 1; }

# split the lines into per-ID files; the data is read from standard input (e.g. < 1.txt)
re='^[^;]*;[^;]*ID ([0-9][0-9]*)'
n=0
while IFS= read -r line || [[ -n $line ]]    # also keep a last line that lacks a newline
do
    n=$((n + 1))
    if [[ $line =~ $re ]]
    then
        printf '%s\n' "$line" >> "${BASH_REMATCH[1]}_COUNT.csv"
    else
        err "line $n [$line] doesn't match"
    fi
done

# rename each ID_COUNT.csv so that COUNT becomes the actual number of lines
shopt -s nullglob
for file in [0-9][0-9]*_COUNT.csv
do
    mv -n "$file" "${file//_COUNT/_$(grep -c '^' "$file")}"
done


Answer 1: score 2, accepted, 2017-03-11 20:35:55

Answer 2: score 2, 2017-03-11 21:53:45