Sum list of values within a pandas df

Question

I'm sure this is somewhere on SO, but I can't seem to find it. I am trying to sum, position by position, values held in lists in a pandas df.

I can achieve this with the following for a plain nested list:

array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

array = [sum(a) for a in zip(*array)]

But when the values are housed within a df I can't get it working. Here is my attempt:

import pandas as pd

d = {
    'Val': [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
}

df = pd.DataFrame(data=d)

df = [sum(a) for a in zip(df['Val'])]

print(df)

df = [sum(a) for a in zip(df['Val'])]

TypeError: unsupported operand type(s) for +: 'int' and 'list'

Answer 1

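You need to convert the column to a list before iterating over it in the list comprehension. Note that this sums each row's list, giving [6, 15, 24], rather than summing position by position across rows: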

import pandas as pd

d = {
    'Val': [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
}
df = pd.DataFrame(data=d)

# One sum per row: [6, 15, 24]
row_sums = [sum(a) for a in df['Val'].tolist()]

Looping over a DataFrame column in a Python list comprehension tends to be slow on large frames compared with vectorized operations.

Answer 2

You are missing the * (star) operator, which unpacks the column so that each list is passed to zip as a separate argument:

# Use a new name so that df still refers to the DataFrame afterwards
result = [sum(a) for a in zip(*df['Val'])]

print(result)
[12, 15, 18]

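Without the *, zip receives the whole column as a single iterable, so each tuple it yields wraps one complete list. sum then starts from 0 and tries to add a list to it, which is why you get the TypeError. Compare the two: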

for a in zip(df['Val']):
    print(a)

print('\n')

for a in zip(*df['Val']):  # <--- notice the *
    print(a)

# Output

([1, 2, 3],)
([4, 5, 6],)
([7, 8, 9],)


(1, 4, 7)
(2, 5, 8)
(3, 6, 9)
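To see where the TypeError comes from, it may help to reproduce it without pandas. The sketch below assumes a plain nested list named `rows` (a hypothetical stand-in for `df['Val']`) and captures the error message rather than letting it stop the script.

```python
# `rows` is a hypothetical stand-in for df['Val']; no pandas is needed here.
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Without *, zip sees a single iterable, so every tuple wraps a whole row.
wrapped = list(zip(rows))

# sum() starts from 0, so its first step is 0 + [1, 2, 3], which fails.
try:
    sum(wrapped[0])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With *, each row is a separate argument and zip groups values by position.
column_sums = [sum(col) for col in zip(*rows)]

print(wrapped[0])
print(error_message)
print(column_sums)
```

The same reasoning applies to the DataFrame column, since iterating over `df['Val']` yields the lists one by one.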

Answer 3

Convert each list to a NumPy array first (this needs import numpy as np):

df['Val'] = df['Val'].apply(np.array)

Then you can call .sum() on the column, which adds the arrays element-wise:

df['Val'].sum()

array([12, 15, 18])
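If every list in the column has the same length, a possible variation is to stack the lists into one 2-D array and sum along the first axis. This is only a sketch under that equal-length assumption and is not taken from the answer above; the names `stacked` and `column_sums` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Val': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

# Stack the lists into a 3x3 array; this assumes all lists have equal length.
stacked = np.array(df['Val'].tolist())

# axis=0 sums down the rows, i.e. position by position.
column_sums = stacked.sum(axis=0)

print(column_sums.tolist())
```

This leaves the original column untouched, which may be preferable when the lists are still needed later.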

3 answers

solution1
1 2019-04-16 00:17:38

solution2
1 ACCEPTED 2019-04-16 00:19:04

solution3
1 2019-04-16 00:31:43