Summing elements in each cell of a column in a pandas DataFrame, when arrays are

There's a very similar question here

However, in my case I want to iterate over 'int' objects, hence it's not applicable.

I have a pandas DataFrame:

import pandas as pd

d = {"squares":[[1, 1, 1], [1], [1, 1], [1, 1, 1, 1]]}
df = pd.DataFrame(d)
df["squares"]

and I want the "squares" column to become:

[3]
[1]
[2]
[4]

That is, I want to sum the values inside each array in the 'squares' column. I tried:

import numpy as np
area_array = np.array(df['squares'].to_list()).sum(axis=1)

But this doesn't work because the arrays have different sizes, and I get an AxisError. Any tips on how to do it differently?

You can try apply:

df["squares"] = df["squares"].apply(lambda lst: [sum(lst)])
print(df)

  squares
0     [3]
1     [1]
2     [2]
3     [4]
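
For anyone who wants to check this end to end, here is a minimal self-contained sketch of the same apply approach on the question's data. The column name squares_sum is hypothetical and is only used so the original lists stay available for comparison.

```python
import pandas as pd

# Same sample data as in the question: lists of different lengths.
df = pd.DataFrame({"squares": [[1, 1, 1], [1], [1, 1], [1, 1, 1, 1]]})

# Series.apply calls the function once per cell, so the differing
# list lengths are not a problem.
# "squares_sum" is a hypothetical column name, not from the question.
df["squares_sum"] = df["squares"].apply(lambda lst: [sum(lst)])

print(df["squares_sum"].tolist())  # [[3], [1], [2], [4]]
```

Assigning back to df["squares"] instead, as in the answer, overwrites the original lists.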

You need to use a loop, for example a list comprehension:

df['squares'] = [[sum(x)] for x in df['squares']]

Output (shown as "squares2" for clarity):

        squares squares2
0     [1, 1, 1]      [3]
1           [1]      [1]
2        [1, 1]      [2]
3  [1, 1, 1, 1]      [4]
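
If a plain number per row is enough, rather than a one-element list, apply the built-in sum directly:

df["squares2"] = df["squares"].apply(sum)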
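
As a possible alternative to the per-cell loop, the lists can be flattened with explode and summed per original row. This is only a sketch: it assumes every list is non-empty and holds integers (an empty list would become NaN and fail the integer cast), and the column name squares_total is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"squares": [[1, 1, 1], [1], [1, 1], [1, 1, 1, 1]]})

# explode() puts each list element on its own row and repeats the
# original index label, so the row of origin is not lost.
# Assumption: all lists are non-empty and contain integers.
exploded = df["squares"].explode().astype(int)

# Group by the original index label and sum within each group.
# "squares_total" is a hypothetical column name.
df["squares_total"] = exploded.groupby(level=0).sum()

print(df["squares_total"].tolist())  # [3, 1, 2, 4]
```

This yields plain integers rather than one-element lists. For small frames the list comprehension is usually simpler.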
