
Sum list of values within a pandas df

I'm sure this has been answered somewhere on SO, but I can't seem to find it. I am trying to sum values stored as lists in a pandas df, position by position across the rows.

I can achieve this with the following for a plain nested list:

array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

array = [sum(a) for a in zip(*array)]

But when the values are housed within a df I can't get it working. Here is my attempt:

import pandas as pd

d = {
    'Val': [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
}

df = pd.DataFrame(data=d)

df = [sum(a) for a in zip(df['Val'])]

print(df)

df = [sum(a) for a in zip(df['Val'])]

TypeError: unsupported operand type(s) for +: 'int' and 'list'

What you are missing is converting the column to a list inside the list comprehension. Do the following:

d = {
    'Val': [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
}
df = pd.DataFrame(data=d)

df = [sum(a) for a in df['Val'].tolist()]

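Note that this sums each row's list on its own, giving [6, 15, 24] rather than the position-wise [12, 15, 18]. Also, Python-level list comprehensions over a data frame column are slow on large data compared with vectorized operations.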
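As a rough illustration (a sketch, not part of the original answer), the snippet below compares what the .tolist() comprehension computes with the position-wise totals from the question's array example. The names row_sums and col_sums are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'Val': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

# One total per row: each list is summed on its own.
row_sums = [sum(a) for a in df['Val'].tolist()]

# One total per position: zip(*...) groups the first items, the second items, and so on.
col_sums = [sum(a) for a in zip(*df['Val'].tolist())]

print(row_sums)  # [6, 15, 24]
print(col_sums)  # [12, 15, 18]
```

Which of the two is wanted depends on whether the totals should run across each list or across the rows.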

You are missing the * (star) operator, which unpacks the column so that each list is passed to zip as a separate argument:

df = [sum(a) for a in zip(*df['Val'])]

print(df)
[12, 15, 18]

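Without the star, zip receives the whole column as a single iterable and yields 1-tuples that each wrap one list. sum then tries to add 0 (an int) to a list, and that's why you get this error: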

for a in zip(df['Val']):
    print(a)

print('\n')

for a in zip(*df['Val']):  # <--- notice the *
    print(a)

# Output

([1, 2, 3],)
([4, 5, 6],)
([7, 8, 9],)


(1, 4, 7)
(2, 5, 8)
(3, 6, 9)
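The TypeError itself can be reproduced without pandas. The following is a minimal sketch on a plain nested list; the names rows, first, message and flattened are only illustrative. It relies on the documented behaviour that sum starts from 0 unless a start value is given.

```python
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Without the star, zip sees one iterable and wraps each list in a 1-tuple.
first = next(zip(rows))  # ([1, 2, 3],)

try:
    sum(first)  # evaluates 0 + [1, 2, 3]
except TypeError as exc:
    message = str(exc)

print(message)

# With an empty list as the start value, sum concatenates instead of failing,
# which shows that the error comes from the default start value of 0.
flattened = sum(first, [])
print(flattened)  # [1, 2, 3]
```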

Convert each list to a NumPy array first:

import numpy as np

df['Val'] = df['Val'].apply(np.array)

Then you can use .sum(), which adds the arrays element-wise:

df.Val.sum()

array([12, 15, 18])
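If the column should stay untouched, a possible variant is to stack the lists into one 2-D array and sum along the first axis. This is only a sketch under the assumption that every list has the same length; stacked and totals are names chosen for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Val': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

# Assumption: all lists have equal length, so they stack into a 3x3 array.
stacked = np.array(df['Val'].tolist())

# axis=0 sums down the rows, giving one total per list position.
totals = stacked.sum(axis=0)

print(totals.tolist())  # [12, 15, 18]
```

Lists of unequal length would not stack into a regular 2-D array, so this approach would need padding or a different method in that case.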
