Extracting specific folders in multiple tar.gz files recursively
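I am working with Synthea, an open synthetic patient and population health dataset.
The dataset ships as a single 21 GB tar.gz, which extracts to a set of tar.gz files that represent the data in several formats.
The extracted source folder looks like this: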
|-- output_11_20170528T113605.tar.gz
|-- output_1_20170524T232103.tar.gz
|-- output_12_20170528T195303.tar.gz
|-- output_2_20170525T073836.tar.gz
|-- output_3_20170525T161555.tar.gz
|-- output_4_20170526T004637.tar.gz
|-- output_5_20170526T091439.tar.gz
|-- output_6_20170526T173337.tar.gz
|-- output_7_20170527T015508.tar.gz
|-- output_8_20170527T102552.tar.gz
|-- output_9_20170527T185007.tar.gz
I tried the following command to extract only the CSV files; it works for a single file:
tar -zxvf output_1_20170524T232103.tar.gz "output_1*csv*" -C ../synthea_output_folder
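Before extracting, it can help to confirm how the CSV files are stored inside an archive. The sketch below only lists the members that sit under a `csv` directory. `list_csv_members` is a hypothetical helper name, and the sketch assumes member paths of the form `output_1/csv/patients.csv`, which matches the desired layout but is not confirmed for every archive.

```shell
#!/usr/bin/env bash
# list_csv_members is a hypothetical helper, not part of tar.
# Prints the files stored under a "csv" directory in one .tar.gz archive.
list_csv_members() {
    local archive=$1
    # -t lists member names without extracting, -z handles gzip, -f names the archive.
    # The pattern keeps entries such as "output_1/csv/patients.csv" and skips
    # the bare "output_1/csv/" directory entry.
    tar -tzf "$archive" | grep -E '(^|/)csv/.'
}
```

For example, `list_csv_members output_1_20170524T232103.tar.gz` would print one path per CSV file if the archive follows the assumed layout, and print nothing otherwise.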
Ideally I would like a shell script that loops over these files and extracts the CSV folder from each tar.gz file, so that they appear in synthea_output_folder like this:
|-- output_11/csv
|-- output_1/csv
|-- output_12/csv
|-- output_2/csv
|-- output_3/csv
|-- output_4/csv
|-- output_5/csv
|-- output_6/csv
|-- output_7/csv
|-- output_8/csv
|-- output_9/csv
I found a shell script that unpacks the archives recursively, but I do not know how to extract only the CSV folder from each file:
for f in *.tar.gz; do tar -xzvf "$f"; done
Possible solution
After modifying the shell code above, I managed to extract only the csv folders by adding a csv wildcard pattern to the command:
for f in *.tar.gz; do tar -xzvf "$f" "*csv*" -C ../synthea_output; done
The output now looks like this:
|-- output_1
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_10
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_11
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_12
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_2
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_3
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_4
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_5
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_6
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_7
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_8
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
`-- output_9
    `-- csv
        |-- allergies.csv
        |-- careplans.csv
        |-- conditions.csv
        |-- encounters.csv
        |-- immunizations.csv
        |-- medications.csv
        |-- observations.csv
        |-- patients.csv
        `-- procedures.csv
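Whether the wildcard loop behaves the same everywhere depends on the tar implementation. As far as I know, GNU tar treats member arguments as patterns during extraction only when `--wildcards` is given, and `-C` affects only the arguments that follow it, while bsdtar matches patterns by default. A sketch that avoids both issues lists each archive first and then extracts the exact member names. `extract_csv_dirs` is a hypothetical helper name, and the sketch assumes the CSV files sit under a `csv` directory inside each archive and that member names contain no unusual characters.

```shell
#!/usr/bin/env bash
# extract_csv_dirs is a hypothetical helper, not part of tar.
# Usage: extract_csv_dirs SRC_DIR DEST_DIR
extract_csv_dirs() {
    local src=$1 dest=$2 archive list status=0
    mkdir -p "$dest" || return 1
    list=$(mktemp) || return 1
    for archive in "$src"/*.tar.gz; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -e "$archive" ] || continue
        # Collect the exact names of the files stored under a csv directory.
        tar -tzf "$archive" | grep -E '(^|/)csv/.' > "$list"
        if [ -s "$list" ]; then
            # -C comes before -T so the listed members are written into DEST_DIR.
            tar -xzf "$archive" -C "$dest" -T "$list" || status=1
        else
            echo "no csv members found in $archive" >&2
        fi
    done
    rm -f "$list"
    return "$status"
}
```

Run from the folder that holds the archives, `extract_csv_dirs . ../synthea_output` should produce the same `output_N/csv` layout as the listing above, provided the archives follow the assumed structure.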