Python script to skip a specific column in CSV files

I have Python code that filters data by a specific column and creates multiple CSV files.

Here is my main CSV file:

Name,    City,      Email
john     cty_1      a@g.com
jack     cty_1      b@g.com
...
Ross     cty_2      c@g.com
Rachel   cty_2      d@g.com
...

My Python logic currently creates a separate CSV file for each city. The existing logic is:

from itertools import groupby
import csv

with open('filtered_final.csv') as csv_file:
    reader = csv.reader(csv_file)
    next(reader) #skip header
    
    #Group by column (city)
    lst = sorted(reader, key=lambda x : x[1])
    groups = groupby(lst, key=lambda x : x[1])

    #Write file for each city
    for k,g in groups:
        filename = k[21:] + '.csv'
        with open(filename, 'w', newline='') as fout:
            csv_output = csv.writer(fout)

            csv_output.writerow(["Name","City","Email"])  #header
            for line in g:
                csv_output.writerow(line)

Now I want to remove the "City" column from each of the new CSV files.

Then try importing:

df = pd.read_csv('filtered_final.csv', usecols=['Name','Email'])

If your data is small enough to fit in RAM, you can just read the whole thing in and do a groupby:

import pandas as pd

df = pd.read_csv('filtered_final.csv')

for city, data in df[['Name','Email']].groupby(df['City']):
    data.to_csv(f'{city}_data.csv', index=False)
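
If you would rather stay with the csv module and the groupby approach from the question, one option is to write only the Name and Email fields of each row. This is a minimal sketch, not a tested drop-in replacement: it assumes the columns are in the order Name, City, Email, and that the city value is usable as a file name as-is. The function name `split_by_city` and the `out_dir` parameter are made up for illustration.

```python
import csv
import os
from itertools import groupby


def split_by_city(src_path, out_dir="."):
    """Write one CSV per city containing only the Name and Email columns.

    Assumes the column order Name, City, Email (as in the question).
    Returns the list of file paths written.
    """
    with open(src_path, newline="") as csv_file:
        reader = csv.reader(csv_file, skipinitialspace=True)
        next(reader)  # skip header
        # groupby only groups adjacent rows, so sort by city first;
        # blank lines come back as empty lists and are dropped here
        rows = sorted((row for row in reader if row), key=lambda row: row[1])

    written = []
    for city, group in groupby(rows, key=lambda row: row[1]):
        filename = os.path.join(out_dir, f"{city}.csv")
        with open(filename, "w", newline="") as fout:
            writer = csv.writer(fout)
            writer.writerow(["Name", "Email"])  # header without City
            for row in group:
                writer.writerow([row[0], row[2]])  # skip index 1 (City)
        written.append(filename)
    return written
```

Calling `split_by_city('filtered_final.csv')` would then write one file per city into the current directory, each with just the Name and Email columns.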
