Extracting specific data from a text file - bash

Question

I have a huge text file (about 4.5 GB) containing roughly 48 million lines. Every line uses the following syntax:

    country01/city01/street01/building01
    country01/city01/street01/building02
    country01/city01/street02/building01
    country01/city01/street02/building02
    country01/city02/street01/building01
    .
    .
    etc...

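I am trying to find a fast way to extract each street name together with the number of buildings on that street. I tried various combinations of sed and awk with wc -l, but it got messy, and I am sure I am missing something.

Any help would be much appreciated!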

Answer 1

If you only need to know the number of buildings on each street, you can do the following:

    $ cut -d'/' -f-3 file | sort | uniq -c

This gives you a sorted list of streets with the count next to each one:

    2 country01/city01/street01
    2 country01/city01/street02
    1 country01/city02/street01
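To try the pipeline without touching the 4.5 GB file, here is a minimal sketch that runs it on the five sample lines from the question. The names `sample` and `counts` are only illustrative, and `LC_ALL=C` is my own addition, not part of the command above: it makes sort compare bytes instead of applying locale collation rules, which is usually faster on large inputs.

```shell
#!/usr/bin/env bash
# Build a tiny sample in the same country/city/street/building layout.
sample=$(mktemp)
printf '%s\n' \
  country01/city01/street01/building01 \
  country01/city01/street01/building02 \
  country01/city01/street02/building01 \
  country01/city01/street02/building02 \
  country01/city02/street01/building01 > "$sample"

# Keep fields 1-3 (the street), group identical streets, count each group.
# LC_ALL=C (an assumption, not in the original command) forces byte-wise sorting.
counts=$(cut -d'/' -f-3 "$sample" | LC_ALL=C sort | uniq -c)
printf '%s\n' "$counts"

rm -f "$sample"
```

Note that uniq -c pads the count with leading spaces, so strip them if another program has to parse the result.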

If your list may contain duplicate lines, you can do this instead:

    $ sort -u file | cut -d'/' -f-3 | uniq -c
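The sketch below compares the two pipelines on a sample where one building is listed twice. No second sort is needed after cut: sort -u emits whole lines in order, and under byte-wise ordering all lines that share the same country/city/street prefix end up next to each other, which is all uniq -c requires. I set `LC_ALL=C` as a precaution, since locale-aware collation may order lines differently; that setting and the variable names are my assumptions.

```shell
#!/usr/bin/env bash
# Sample with one duplicated line (building01 on street01 appears twice).
sample=$(mktemp)
printf '%s\n' \
  country01/city01/street01/building01 \
  country01/city01/street01/building01 \
  country01/city01/street01/building02 \
  country01/city02/street01/building01 > "$sample"

# Without de-duplication the repeated line is counted twice.
naive=$(cut -d'/' -f-3 "$sample" | LC_ALL=C sort | uniq -c)

# sort -u drops identical lines first, so each building counts once.
deduped=$(LC_ALL=C sort -u "$sample" | cut -d'/' -f-3 | uniq -c)

echo "without de-duplication:"; printf '%s\n' "$naive"
echo "with sort -u:";           printf '%s\n' "$deduped"

rm -f "$sample"
```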

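If the file really is huge, so that it may not fit in memory and sort takes a while, you can count in a single pass with awk, which only keeps one counter per distinct street:

    $ awk 'BEGIN{FS=SUBSEP="/"}{a[$1,$2,$3]++}END{for(i in a) print a[i],i}' file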
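Here is a spelled-out sketch of the awk one-liner, run on the sample lines from the question. Setting both FS and SUBSEP to "/" means the array key built from `$1,$2,$3` is joined with slashes again, so it prints as country/city/street. The order of `for (key in count)` is unspecified in awk, so I pipe the summary through sort; that step, like the names `sample`, `count` and `street_counts`, is my own addition. Sorting the summary is cheap because it holds one line per street, not one per building.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf '%s\n' \
  country01/city01/street01/building01 \
  country01/city01/street01/building02 \
  country01/city01/street02/building01 \
  country01/city01/street02/building02 \
  country01/city02/street01/building01 > "$sample"

# FS splits each line on "/"; SUBSEP joins the three subscripts with "/" again,
# so each key is the street path and each value is its building count.
street_counts=$(awk 'BEGIN { FS = SUBSEP = "/" }
  { count[$1, $2, $3]++ }
  END { for (key in count) print count[key], key }' "$sample" |
  LC_ALL=C sort -k2)   # order by street name; awk itself gives no ordering

printf '%s\n' "$street_counts"
rm -f "$sample"
```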

Or, if you may have duplicates:

    $ awk '($0 in a){next}{print; a[$0]}' file | awk 'BEGIN{FS=SUBSEP="/"}{a[$1,$2,$3]++}END{for(i in a) print a[i],i}'
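As a variation, the two awk processes can probably be folded into one. This is a sketch under my own naming (`seen`, `count`, `deduped_counts`), not the command from the answer: `!seen[$0]++` is true only the first time a line appears, so each building is counted once. Like the first awk in the pipeline above, it stores every distinct line as an array key, so memory use grows with the number of unique lines; if that is a concern, the sort -u approach may be the safer choice.

```shell
#!/usr/bin/env bash
# Sample with one duplicated line.
sample=$(mktemp)
printf '%s\n' \
  country01/city01/street01/building01 \
  country01/city01/street01/building01 \
  country01/city01/street01/building02 \
  country01/city02/street01/building01 > "$sample"

# seen[] remembers whole lines; count[] holds one entry per street.
deduped_counts=$(awk 'BEGIN { FS = SUBSEP = "/" }
  !seen[$0]++ { count[$1, $2, $3]++ }   # skip lines already seen
  END { for (key in count) print count[key], key }' "$sample" |
  LC_ALL=C sort -k2)   # optional: stable, readable order

printf '%s\n' "$deduped_counts"
rm -f "$sample"
```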

Solution 1
2 Accepted 2020-06-24 10:30:04
