Python script to skip a specific column in CSV files
I have Python code that filters data by a specific column and creates multiple CSV files.
Here is my main CSV file:
Name, City, Email
john cty_1 a@g.com
jack cty_1 b@g.com
...
Ross cty_2 c@g.com
Rachel cty_2 d@g.com
...
My Python logic currently creates a separate CSV file for each city. The existing logic is:
from itertools import groupby
import csv

with open('filtered_final.csv') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip header

    # Group by column (city)
    lst = sorted(reader, key=lambda x: x[1])
    groups = groupby(lst, key=lambda x: x[1])

    # Write file for each city
    for k, g in groups:
        filename = k[21:] + '.csv'
        with open(filename, 'w', newline='') as fout:
            csv_output = csv.writer(fout)
            csv_output.writerow(["Name", "City", "Email"])  # header
            for line in g:
                csv_output.writerow(line)
Now I want to remove the "City" column from each new CSV file.
Then try importing only the columns you need:
import pandas as pd
df = pd.read_csv('filtered_final.csv', usecols=['Name','Email'])
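To illustrate what `usecols` does, here is a minimal sketch that reads in-memory sample data instead of the real file. The sample rows come from the question and are assumed to be comma-separated. Note that the resulting frame no longer has a City column, so it cannot be split per city on its own.

```python
import io

import pandas as pd

# Sample rows from the question, assumed here to be comma-separated.
sample = io.StringIO(
    "Name,City,Email\n"
    "john,cty_1,a@g.com\n"
    "Ross,cty_2,c@g.com\n"
)

# Only the listed columns are loaded; City is skipped at read time.
df = pd.read_csv(sample, usecols=["Name", "Email"])
columns = list(df.columns)
```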
If your data is small enough to fit in RAM, you can just read the whole thing in and do a groupby:
import pandas as pd
df = pd.read_csv('filtered_final.csv')
for city, data in df[['Name','Email']].groupby(df['City']):
    data.to_csv(f'{city}_data.csv', index=False)
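If you would rather stay with the `csv` module from the question, the same result might be sketched as below. This is only an illustration under some assumptions: every row has exactly three fields (Name, City, Email), the full city value is used as the file name (the question slices it with `k[21:]`), and the helper name `split_by_city` and its `out_dir` parameter are made up for this example.

```python
import csv
from itertools import groupby
from operator import itemgetter
from pathlib import Path


def split_by_city(src_path, out_dir="."):
    """Write one CSV per city, keeping only the Name and Email columns.

    Hypothetical helper: assumes each row is (Name, City, Email).
    """
    with open(src_path, newline="") as csv_file:
        reader = csv.reader(csv_file, skipinitialspace=True)
        next(reader)  # skip header
        # groupby only groups adjacent rows, so sort by city first;
        # blank lines (empty rows) are dropped.
        rows = sorted((r for r in reader if r), key=itemgetter(1))

    written = []
    for city, group in groupby(rows, key=itemgetter(1)):
        # Assumption: the whole city value is a valid file name.
        out_path = Path(out_dir) / f"{city}.csv"
        with open(out_path, "w", newline="") as fout:
            writer = csv.writer(fout)
            writer.writerow(["Name", "Email"])  # header without City
            for name, _city, email in group:
                writer.writerow([name, email])  # drop the City field
        written.append(out_path)
    return written
```

Calling `split_by_city('filtered_final.csv')` would then write one file per city into the current directory, each containing only the Name and Email columns.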