Delete lines in CSV file with a column condition in bash

I have a large CSV file (5 GB). Its header and first few rows are:

run number,export,downerQ,coefUpQuality,chooseMode,demandF,nbPLots,standarDevPop,nbCitys,whatWord,priceMaxWineF,marketColor,[step],giniIndexReserve,giniIndexPatch,meanQualityTotal,meanQualityMountain,meanQualityPlain,DiffExtCentral,nbcentralPlots,meanPatchByNetwork,sum_q_viti_moutain,sum_q_viti_plaine
"3","false","0.5","0.01","false","7000","10","2","10","0","70","false","0","0","0.07083333333333335","0","0","0","0","0","0","48","0"
"4","false","0.5","0.01","false","7000","10","2","10","0","70","false","0","0","0.04285714285714286","0","0","0","0","0","0","42","0"
"2","false","0.5","0.01","false","7000","10","2","10","0","70","false","0","0","0.05348837209302328","0","0","0","0","0","0","43","0"

I would like to keep only the rows that contain "500" in the [step] field (the thirteenth field).

  • I tried importing this CSV into SQLite, but the delete operation crashes.
  • R also crashes (even with fread from data.table).

Does anyone have a solution using tools like sed, awk, or any other command?

awk seems the way to go:

awk -F, 'NR == 1 || $13 == "\"500\""' filename

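Here NR == 1 preserves the first line (the header); after that, only lines whose 13th field is "500" are printed. The comparison string contains escaped double quotes because the values in the file are themselves quoted, so the field awk sees is "500" with the quotes included.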
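
As a rough sketch of how this could be used in practice, the snippet below runs the same awk filter on a tiny stand-in file and writes the result to a new file instead of the terminal. The file names sample.csv and filtered.csv, the shortened header, and the sample rows are made up for illustration; only the layout (unquoted header, quoted values, [step] in column 13) is taken from the question. It also assumes no quoted field contains a comma, since -F, splits on every comma.

```shell
#!/bin/sh
# Hypothetical file names; point "input" at the real CSV instead.
input=sample.csv
output=filtered.csv

# Tiny stand-in with the same layout as the question's file:
# unquoted header, quoted values, [step] as the 13th field.
cat > "$input" <<'EOF'
a,b,c,d,e,f,g,h,i,j,k,l,[step]
"1","x","x","x","x","x","x","x","x","x","x","x","0"
"2","x","x","x","x","x","x","x","x","x","x","x","500"
"3","x","x","x","x","x","x","x","x","x","x","x","5000"
"4","x","x","x","x","x","x","x","x","x","x","x","500"
EOF

# awk reads one line at a time, so the whole file never has to fit in memory.
# Keep the header (NR == 1) and rows whose 13th field is exactly "500".
awk -F, 'NR == 1 || $13 == "\"500\""' "$input" > "$output"

# Count the data rows that were kept (header excluded).
kept=$(( $(wc -l < "$output") - 1 ))
echo "kept $kept rows"
```

Because the comparison is an exact string match, a row with "5000" in the 13th field is not kept. Writing to a separate file leaves the original untouched, which allows a quick check of the result before the large file is replaced.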
