I'm sure this is answered somewhere on SO, but I can't seem to find it. I am trying to sum values stored as lists in a pandas df, position by position across the rows. For example, I can achieve this with a plain nested list as follows:
array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
array = [sum(a) for a in zip(*array)]
But when the values are housed within a df, I can't get it working. Here is my attempt:
import pandas as pd

d = ({
    'Val' : [[1,2,3],[4,5,6],[7,8,9]],
})
df = pd.DataFrame(data = d)
df = [sum(a) for a in zip(df['Val'])]
print(df)
df = [sum(a) for a in zip(df['Val'])]
TypeError: unsupported operand type(s) for +: 'int' and 'list'
What you are missing is to convert the column to a list inside the list comprehension. Do the following:
d = ({
'Val' : [[1,2,3],[4,5,6],[7,8,9]],
})
df = pd.DataFrame(data = d)
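# zip(*...) is still needed to sum position by position; without it this gives per-row sums [6, 15, 24]
df = [sum(a) for a in zip(*df['Val'].tolist())]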
Note that looping over a data frame column with a list comprehension can be slow on large data.
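One way to avoid the Python-level loop is to expand the lists into their own columns and let pandas do the summing. This is only a sketch: it assumes every list in 'Val' has the same length, and the names `expanded` and `column_sums` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Val': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

# Assumption: all lists have equal length, so each position becomes a column.
expanded = pd.DataFrame(df['Val'].tolist())

# DataFrame.sum() adds down each column by default (axis=0).
column_sums = expanded.sum().tolist()
print(column_sums)
```

Whether this is actually faster depends on the size of the data, so it is worth timing on your own frame.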
You are missing the * (star) operator to unpack:
df = [sum(a) for a in zip(*df['Val'])]
print(df)
[12, 15, 18]
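Without the *, zip receives the whole column as a single iterable and yields one-element tuples that each contain a list. sum then tries to compute 0 + [1, 2, 3], and that's why you get this error: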
for a in zip(df['Val']):
    print(a)
print('\n')
for a in zip(*df['Val']): # <--- notice the *
    print(a)
# Output
([1, 2, 3],)
([4, 5, 6],)
([7, 8, 9],)
(1, 4, 7)
(2, 5, 8)
(3, 6, 9)
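To make the difference concrete, here is a small self-contained sketch that reproduces the failure on the first tuple and then runs the working version. The variable names (`first`, `error_message`, `column_sums`) are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Val': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

# Without *, the first tuple wraps a single list: ([1, 2, 3],)
first = next(iter(zip(df['Val'])))

# sum() starts from 0, so it attempts 0 + [1, 2, 3] and raises TypeError.
try:
    sum(first)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With *, each tuple holds the i-th element of every row, so sum() works.
column_sums = [sum(col) for col in zip(*df['Val'])]
print(error_message)
print(column_sums)
```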
Convert to np.array first.
df.Val = df.Val.apply(lambda x: np.array(x))
Then you can use .sum
df.Val.sum()
array([12, 15, 18])
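A related variant, sketched here under the assumption that all lists have the same length, builds one 2-D array from the column and sums along axis 0 instead of converting each row separately. The names `matrix` and `column_sums` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Val': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

# Assumption: equal-length lists, so this stacks into a regular 2-D array.
matrix = np.array(df['Val'].tolist())

# axis=0 sums down the rows, giving one total per list position.
column_sums = matrix.sum(axis=0)
print(column_sums)
```

This leaves df['Val'] unchanged, which may be preferable if the original lists are needed later.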