
Python - Combining Columns in a CSV file

I'm trying to write code that takes data from certain columns in a CSV file and combines them into a new CSV file. I was directed to use pandas, but I'm not sure I'm even on the right track. I'm fairly new to Python, so prepare yourselves for potentially awful code.

I want to use data.csv:

Customer_ID,Date,Time,OtherColumns,A,B,C,Cost
1003,January,2:00,Stuff,1,5,2,519
1003,January,2:00,Stuff,1,3,2,530
1003,January,2:00,Stuff,1,3,2,530
1004,Feb,2:00,Stuff,1,1,0,699

and create a new CSV that looks like this:

Customer_ID,ABC
1003,152
1003,132
1003,132
1004,110

What I have so far is:

import csv
import pandas as pd

df = pd.read_csv('test.csv', delimiter = ',')
custID = df.Customer_ID
choiceA = df.A
choiceB = df.B
choiceC = df.C

ofile  = open('answer.csv', "wb")
writer = csv.writer(ofile, delimiter = ',')
writer.writerow(custID + choiceA + choiceB + choiceC)

Unfortunately, all that does is add the values in each row together numerically, then write those sums to the CSV as a single row. My true end goal is to find the most frequently occurring values in columns A through C and collapse each customer into a single row that uses those values. I'm awful at explaining, so here is what I'd want produced from data.csv:

Customer_ID,ABC
1003,132
1004,110

You can sum the columns you're interested in (if their type is string, summing concatenates them):

In [11]: df = pd.read_csv('data.csv', index_col='Customer_ID')

In [12]: df
Out[12]:
                Date  Time OtherColumns  A  B  C  Cost
Customer_ID
1003         January  2:00        Stuff  1  5  2   519
1003         January  2:00        Stuff  1  3  2   530
1003         January  2:00        Stuff  1  3  2   530
1004             Feb  2:00        Stuff  1  1  0   699

In [13]: res = df[list('ABC')].astype(str).sum(axis=1)  # cols = list('ABC')

In [14]: res
Out[14]:
Customer_ID
1003           152
1003           132
1003           132
1004           110
dtype: float64
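
If the combined value needs to stay a string (for example, so that a leading 0 in column A is not dropped by a numeric conversion), one option is to join the columns explicitly instead of summing them. This is only a sketch: the small frame below is made-up illustration data, not the data.csv from the question.

```python
import pandas as pd

# Hypothetical rows for illustration; the first one starts with a 0 in column A.
df = pd.DataFrame(
    {"Customer_ID": [1003, 1004], "A": [0, 1], "B": [3, 1], "C": [2, 0]}
).set_index("Customer_ID")

cols = list("ABC")

# Convert each value to text, then join the pieces of every row into one string.
res = df[cols].astype(str).agg("".join, axis=1)

print(res.tolist())
```

Because every row is built with str.join, the result holds strings such as "032" rather than numbers.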

To get the CSV, first use to_frame to give the column the desired name, then call to_csv:

In [15]: res.to_frame(name='ABC')  # ''.join(cols)
Out[15]:
             ABC
Customer_ID
1003         152
1003         132
1003         132
1004         110

In [16]: res.to_frame(name='ABC').to_csv('new.csv')
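
For the end goal stated in the question (one row per customer, using the most frequent values), a possible approach is to group the combined strings by Customer_ID and take the mode of each group. The sketch below assumes that "most occurring" means the most frequent combined ABC string per customer, not the most frequent value of each column taken separately. The helper name most_common_abc is hypothetical, and ties are resolved by taking the smallest string, since mode() returns its values sorted.

```python
import io
import pandas as pd

# The rows from data.csv in the question, inlined so the sketch is self-contained.
CSV_TEXT = """Customer_ID,Date,Time,OtherColumns,A,B,C,Cost
1003,January,2:00,Stuff,1,5,2,519
1003,January,2:00,Stuff,1,3,2,530
1003,January,2:00,Stuff,1,3,2,530
1004,Feb,2:00,Stuff,1,1,0,699
"""


def most_common_abc(df, cols=("A", "B", "C")):
    """Hypothetical helper: one row per customer with its most frequent combined value."""
    # Join the chosen columns into one string per row, e.g. 1, 3, 2 -> "132".
    combined = df[list(cols)].astype(str).agg("".join, axis=1)
    # Within each customer, keep the most frequent string (first mode on ties).
    result = combined.groupby(df["Customer_ID"]).agg(lambda s: s.mode().iloc[0])
    # Name the column after the joined column labels, e.g. "ABC".
    return result.rename("".join(cols)).reset_index()


df = pd.read_csv(io.StringIO(CSV_TEXT))
summary = most_common_abc(df)

# Passing a file path to to_csv instead would write the result to disk.
print(summary.to_csv(index=False))
```

With the question's data, customer 1003 has the values 152, 132, and 132, so 132 is kept; customer 1004 has only 110.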

