
Unable to format a huge CSV file and write it to a file with Python

I have a large CSV file (300 MB). I need to read it, drop every row that has only a single column, and keep a row if its fourth column contains the word "cloud". So I wrote a script that reads the file and writes the valid rows to another CSV file.

First, I wrote a generator to read the data, since the file is really large:

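import csv

def gen_csv(file_name):
  # Yield one row at a time so the whole file is never loaded into memory.
  # Python 3: text mode with newline='' (the original 'rb' mode only works on Python 2).
  with open(file_name, newline='') as csvfile:
    datareader = csv.reader(csvfile, delimiter=',')
    for row in datareader:
      yield row

It is called from the writer function:

def format_csv(r_list):
  gzip_list = []  # not used in the excerpt shown
  for report in r_list:
    # "report.csv" -> "report-output.csv"
    outputfile = report[:-4] + "-output.csv"
    with open(outputfile, 'w', newline='') as firstcsv:
      firstwriter = csv.writer(firstcsv, delimiter=',')
      for row in gen_csv(report):
        if len(row) < 4:
          # Skip single-column rows and any row too short to have a fourth column.
          continue
        elif row[3] == "Label":
          # Keep the row whose fourth column is "Label" (presumably the header).
          firstwriter.writerow(row)
        elif 'Cloud' in row[3]:
          # The original test, row[3].find('Cloud') > 0, skipped values that
          # start with "Cloud", because find() returns 0 for them.
          firstwriter.writerow(row)
    # No explicit close() is needed: the with block closes the file.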

But the new CSV file contains only one line: the first line of the original CSV.

Thanks in advance

EDIT:

I found my mistake: it was a logic error in how I picked the right rows.
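The post does not say what the logic error was, so the following is only a sketch of one way to make the row selection explicit and easy to check. It assumes the rule stated in the question (skip rows without a fourth column, keep the "Label" row and any row whose fourth column contains "Cloud"). The names `keep_row` and `filter_rows` are hypothetical helpers, not part of the original script. It also shows a pitfall of tests of the form `value.find('Cloud') > 0`: `str.find` returns the match position, which is 0 when the value starts with the search word.

```python
import csv

def keep_row(row, column=3, needle="Cloud"):
    """Return True if the row should be copied to the output (hypothetical helper)."""
    if len(row) <= column:
        # Single-column rows, and any row too short to have the target column.
        return False
    value = row[column]
    # Keep the "Label" row and every row whose target column contains the needle.
    return value == "Label" or needle in value

def filter_rows(src, dst):
    """Stream src row by row and write only the rows accepted by keep_row."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerows(row for row in csv.reader(fin) if keep_row(row))

# find() returns the position of the match, so a match at the start gives 0.
print("Cloud storage".find("Cloud") > 0)  # False
print("Cloud" in "Cloud storage")         # True
```

Keeping the selection rule in a small function of its own means it can be tried on a handful of hand-written rows before running it against the 300 MB file.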

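You can use pandas:

Code examples:

1. Keep only the listed columns and write them to a new file:

import pandas as pd
df = pd.read_csv("to_remove.csv")
keep_cols = ["Name", "Address"]
new_df = df[keep_cols]
new_df.to_csv("removed.csv", index=False)

2. Drop one column by name:

import pandas as pd
df = pd.read_csv("your.csv", index_col=[0,1], skipinitialspace=True)
# Removes the column from df in memory; call df.to_csv(...) to save the result.
df.drop('column_name', axis=1, inplace=True)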
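The examples in this answer select or drop columns, while the question is about filtering rows of a large file. As a rough sketch of how pandas could do that without loading all 300 MB at once, `read_csv` accepts a `chunksize` argument and then yields DataFrames piece by piece. The function name `filter_csv_in_chunks` is hypothetical, and the sketch assumes the first line of the file is a header whose fourth column is named "Label" (suggested by the question's code, but not confirmed there).

```python
import pandas as pd

def filter_csv_in_chunks(src, dst, column="Label", needle="Cloud", chunksize=100_000):
    """Copy the rows of src whose `column` contains `needle` into dst, chunk by chunk."""
    first = True
    # dtype=str keeps every field as text; chunksize makes read_csv return an iterator.
    for chunk in pd.read_csv(src, chunksize=chunksize, dtype=str):
        # na=False treats missing values (for example, short rows) as non-matches.
        mask = chunk[column].str.contains(needle, regex=False, na=False)
        # Write the header once, then append the later chunks below it.
        chunk[mask].to_csv(dst, mode="w" if first else "a", header=first, index=False)
        first = False
```

A call might look like `filter_csv_in_chunks("report.csv", "report-output.csv")`, where both file names are placeholders. If the file starts with stray single-column lines before the real header, the assumption above does not hold and those lines would need to be skipped first.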
