Add to row value in CSV based on two different columns in other CSV

Thank you in advance for your help.

I am trying to take one CSV file that lists relative frequencies for specific IDs on specific dates and consolidate the data by date, so that a second CSV file has one row per unique date with the summed relative frequencies for each ID on that date.

The first CSV file (which has duplicate dates) looks like this:

ID,Date,Relfreq
CR,10061,9.01E-07
CR,10061,9.01E-07
TPN,10062,5.42782E-06
TPN,10062,8.14173E-06
TPN,10062,5.42782E-06
TPN,10062,8.14173E-06
TPN,10062,0.000179118
CR,10062,7.02E-07
CR,10062,1.05307E-06
CR,10062,7.02E-07
CR,10062,1.75512E-06
CR,10062,1.05307E-06
TPN,10070,1.99831E-05
TPN,10070,9.99156E-06

The second CSV file (which just has unique dates) looks like this:

Date,TPN,CR
10050,0,0
10051,0,0
10052,0,0
10060,0,0
10061,0,0
10062,0,0
10070,0,0
10071,0,0
10072,0,0

I need the script to look at the first file and add up all of the relative frequencies for each ID on each date. So, for example, it should add all of the values under Relfreq that have the ID "CR" and the date "10062", and separately it should add all of the values under Relfreq that have the ID "TPN" and the date "10062". Then I want it to look at the second file, find "10062", and put the sum of the TPN Relfreqs in the 2nd column (labeled "TPN") and the sum of the CR Relfreqs in the 3rd column (labeled "CR").

I've written the following script, but I'm not sure it's actually doing what I want, and it gives me the error printed below it:

import unicodecsv
import csv
import io
import math 
from decimal import *

alist, blist = [], []

with open("wholetopic.csv", "rU") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        alist.append(row)
with open("date.csv", "rU") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist.append(row)

TPNlist, CRlist = [],[]

c = csv.writer(open("finaltopic.csv", "a"))
for brow in blist:
    dateB = brow[0]
    for arow in alist:
        dateA = arow[1]
        ID = arow[0]
        RF = arow[2]
        if dateB == dateA:
            if ID == "TPN":
                TPNlist.append(RF)
            else:
                if ID == "CR":
                    CRlist.append(RF)
                    continue
        TPNsum = sum(TPNlist)
        CRsum = sum(CRlist)
        values = dateB,TPNsum,CRsum
        c.writerow(values)                                   

print "Done!"

Here is the error:

  File "consolidatedates.py", line 34, in <module>
    TPNsum = sum(TPNlist)
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Convert RF to a float before appending it, so that sum() receives numbers rather than strings:

TPNlist.append(float(RF))
CRlist.append(float(RF))

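The error says that you are trying to add an int to a string, which Python doesn't support. csv.reader returns every field as a string, and sum() starts from the integer 0, so the first addition it attempts is 0 plus a string.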
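To see where the message comes from, here is a minimal sketch in Python 3 syntax (the question's script is Python 2). The sample values are copied from the CR rows in the question's first file; the variable names are only illustrative.

```python
# Fields as csv.reader returns them: always strings, even when they look numeric.
rows = ["7.02E-07", "1.05307E-06", "7.02E-07"]

try:
    sum(rows)  # sum() starts from 0, so it first evaluates 0 + "7.02E-07"
except TypeError as exc:
    error_message = str(exc)

# Converting each field with float() gives sum() numbers to work with.
total = sum(float(value) for value in rows)

print(error_message)
print(total)
```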

You can try converting the value of RF to a number as you append. The Relfreq values in your file are written in scientific notation (9.01E-07, for example), so use float() rather than int(); int() cannot parse strings like that:

TPNlist.append(float(RF))

If RF is a string such as "9.01E-07" or "10", that fixes your problem. However, if RF contains non-numeric text (like '', for example, if that column of the row had no value in the source file), you'll get an error like this (the exact wording varies by Python version):

ValueError: could not convert string to float:

In which case you need to make sure your source files are correctly formatted, or that you are referencing the correct column of the row.
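Note that converting to float only removes the TypeError. As posted, the loop never resets TPNlist and CRlist between dates and calls writerow inside the inner loop, so the sums would keep accumulating across dates and each date would be written many times. One possible way to do the consolidation described in the question is sketched below in Python 3. It assumes the column headers shown in the question (ID, Date, Relfreq in the first file; Date followed by one column per ID in the second); the function name consolidate and the choice to skip blank Relfreq cells are assumptions, not part of the original script.

```python
import csv
from collections import defaultdict


def consolidate(detail_path, dates_path, out_path):
    """Sum Relfreq per (Date, ID) and write one output row per date in dates_path."""
    totals = defaultdict(float)  # (date, id) -> running sum of Relfreq
    with open(detail_path, newline="") as detail_file:
        for row in csv.DictReader(detail_file):
            value = row["Relfreq"].strip()
            if not value:
                continue  # assumption: ignore blank cells rather than fail
            totals[(row["Date"], row["ID"])] += float(value)

    with open(dates_path, newline="") as dates_file, \
            open(out_path, "w", newline="") as out_file:
        reader = csv.DictReader(dates_file)
        writer = csv.writer(out_file)
        id_columns = reader.fieldnames[1:]  # e.g. ["TPN", "CR"]
        writer.writerow(["Date"] + id_columns)
        for row in reader:
            date = row["Date"]
            # Dates with no matching rows in the first file keep a 0.
            sums = [totals.get((date, col), 0) for col in id_columns]
            writer.writerow([date] + sums)
```

With the file names from the question, it would be called as consolidate("wholetopic.csv", "date.csv", "finaltopic.csv"). Reading the first file once into a dictionary keyed by (date, ID) avoids the nested loop over both files.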
