
Python - Combining Columns in a CSV file

I'm trying to create code that will take data from certain columns in a CSV file and combine them into a new CSV file. I was directed to use Pandas, but I'm not sure if I'm even on the right track. I'm fairly new to Python, so prepare yourselves for potentially awful code.

I want to use data.csv:

Customer_ID,Date,Time,OtherColumns,A,B,C,Cost
1003,January,2:00,Stuff,1,5,2,519
1003,January,2:00,Stuff,1,3,2,530
1003,January,2:00,Stuff,1,3,2,530
1004,Feb,2:00,Stuff,1,1,0,699

and create a new CSV that looks like this:

Customer_ID,ABC
1003,152
1003,132
1003,132
1004,110

What I have so far is:

import csv
import pandas as pd

df = pd.read_csv('test.csv', delimiter = ',')
custID = df.customer_ID
choiceA = df.A
choiceB = df.B
choiceC = df.C

ofile  = open('answer.csv', "wb")
writer = csv.writer(ofile, delimiter = ',')
writer.writerow(custID + choiceA + choiceB + choiceC)

Unfortunately, all that does is add each row together and then write the per-row sums out as a single CSV row. My true end goal is to find the most frequently occurring values in columns A-C and combine each customer into a single row using those values. I'm awful at explaining. I'd want something that takes data.csv and makes this:

Customer_ID,ABC
1003,132
1004,110

You can sum the columns you're interested in (if their type is string, summing concatenates them):

In [11]: df = pd.read_csv('data.csv', index_col='Customer_ID')

In [12]: df
Out[12]:
                Date  Time OtherColumns  A  B  C  Cost
Customer_ID
1003         January  2:00        Stuff  1  5  2   519
1003         January  2:00        Stuff  1  3  2   530
1003         January  2:00        Stuff  1  3  2   530
1004             Feb  2:00        Stuff  1  1  0   699

In [13]: res = df[list('ABC')].astype(str).sum(1)  # cols = list('ABC')

In [14]: res
Out[14]:
Customer_ID
1003           152
1003           132
1003           132
1004           110
dtype: float64
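
For anyone who wants to try this outside an interactive session, here is a minimal self-contained sketch of the same idea. It assumes the sample rows from the question, loaded from an in-memory string instead of data.csv, and it uses `+` on string columns rather than `sum(1)` to make the concatenation explicit. The names `CSV_TEXT`, `numeric_sum` and `abc` are only for illustration. The dtype reported for the result may vary with the pandas version.

```python
import io

import pandas as pd

# Sample rows copied from the question's data.csv (held in memory here).
CSV_TEXT = """Customer_ID,Date,Time,OtherColumns,A,B,C,Cost
1003,January,2:00,Stuff,1,5,2,519
1003,January,2:00,Stuff,1,3,2,530
1003,January,2:00,Stuff,1,3,2,530
1004,Feb,2:00,Stuff,1,1,0,699
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), index_col="Customer_ID")

# Adding integer columns does arithmetic: 1 + 5 + 2 == 8 for the first row.
numeric_sum = df["A"] + df["B"] + df["C"]

# Adding string columns concatenates: "1" + "5" + "2" == "152".
abc = df["A"].astype(str) + df["B"].astype(str) + df["C"].astype(str)

print(numeric_sum.tolist())
print(abc.tolist())
```

The first print shows why adding the raw columns produced sums in the question's attempt; the second shows the digit-by-digit result the question asks for.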

To get the csv, you can first use to_frame (to add the desired column name):

In [15]: res.to_frame(name='ABC')  # ''.join(cols)
Out[15]:
             ABC
Customer_ID
1003         152
1003         132
1003         132
1004         110

In [16]: res.to_frame(name='ABC').to_csv('new.csv')
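
The question's end goal (one row per customer, holding the most frequent value) is not covered above. A possible sketch follows, under one loudly stated assumption: "most occurring" is taken to mean the most frequent combined ABC string per customer, not the most frequent value of each column separately. The helper `most_common` is a hypothetical name, and the data is again the question's sample held in memory.

```python
import io

import pandas as pd

# Sample rows copied from the question's data.csv (held in memory here).
CSV_TEXT = """Customer_ID,Date,Time,OtherColumns,A,B,C,Cost
1003,January,2:00,Stuff,1,5,2,519
1003,January,2:00,Stuff,1,3,2,530
1003,January,2:00,Stuff,1,3,2,530
1004,Feb,2:00,Stuff,1,1,0,699
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Build the combined ABC string for every row.
df["ABC"] = df["A"].astype(str) + df["B"].astype(str) + df["C"].astype(str)


def most_common(values: pd.Series) -> str:
    """Return the most frequent value in the group.

    Series.mode() returns its result sorted, so ties resolve to the
    smallest value (an arbitrary choice made for this sketch).
    """
    return values.mode().iloc[0]


# One row per customer, keeping the most frequent ABC string.
result = df.groupby("Customer_ID")["ABC"].agg(most_common).reset_index()

# With no path, to_csv returns the CSV text; pass a filename to write a file.
csv_text = result.to_csv(index=False)
print(csv_text)
```

If the mode should instead be taken per column before combining, the same pattern applies with the grouping done on A, B and C individually; on this sample both readings give the same rows.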
