
How to find the average of a list of elements embedded in a Pandas data frame column

I'm in the process of cleaning a data frame, and one particular column contains values that are lists. I'm trying to find the average of each list and update the existing column with an int while preserving the indices. I can successfully and efficiently convert those values to a list, but I lose the index values in the process. The code I've written below is too memory-intensive to execute. Is there simpler code that would work?

data: https://docs.google.com/spreadsheets/d/1Od7AhXn9OwLO-SryT--erqOQl_NNAGNuY4QPSJBbI18/edit?usp=sharing

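def Average(lst):
    # Return the mean of a list of numeric strings (0 for an empty list).
    sum1 = 0
    average = 0
    if len(lst) == 1:
        # A single element is its own average.
        for obj in lst:
            average = int(obj)

    if len(lst) > 1:
        for year in lst:
            sum1 += int(year)
        average = sum1 / len(lst)

    return average

# Apply the function to each list in the Series; the index is preserved.
hello = hello.apply(Average)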

Here's the loop I used to convert the values into a list:

df_list1 = []

for x in hello:
    sum1 = 0
    average = 0
    if len(x) == 1:
        for obj in x:
            df_list1.append(int(obj))

    if len(x) > 1:
        for year in x:
            sum1 += int(year)
        # compute the average once, after summing all elements
        average = sum1 / len(x)
        df_list1.append(int(average))
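The loop loses the index only because its results go into a plain Python list. A minimal sketch of one way to keep the row labels, assuming `hello` is a pandas Series whose values are non-empty lists of numeric strings (the sample data below is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data standing in for the `hello` column.
hello = pd.Series(
    [["2001"], ["1999", "2001"], ["2010", "2011", "2013"]],
    index=["a", "b", "c"],
)

# Truncated mean of each list, mirroring what the loop computes.
averages = [int(sum(int(v) for v in lst) / len(lst)) for lst in hello]

# Reattach the original index so the row labels are preserved.
result = pd.Series(averages, index=hello.index)
print(result)
```

Note that an empty list would raise a ZeroDivisionError here, so such rows would need separate handling.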

Use apply and np.mean.

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'listcol': [np.random.randint(1, 10, 5) for _ in range(3)]}, index=['a', 'b', 'c'])

# fillna does not accept a list as the fill value, so replace missing cells with an empty list via apply
# (np.mean returns NaN, with a RuntimeWarning, on an empty list)
df['listcol'] = df['listcol'].apply(lambda x: x if isinstance(x, (list, np.ndarray)) else [])

# can use this if all elements in lists are numeric
df['listcol'] = df['listcol'].apply(lambda x: np.mean(x))

# use this instead (not in addition) if list has numbers stored as strings
# df['listcol'] = df['listcol'].apply(lambda x: np.mean([int(i) for i in x]))

Output (values vary between runs because the input is random):

>>> df
   listcol
a      5.0
b      5.2
c      4.4
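As an alternative to apply, the same per-row averages can be sketched with explode and groupby. This is only an illustration under assumptions: the sample data is invented, the index labels are unique, and pandas is version 0.25 or later (where `Series.explode` is available).

```python
import pandas as pd

# Hypothetical data: lists of numbers stored as strings.
df = pd.DataFrame(
    {"listcol": [["1", "2", "3"], ["10"], ["4", "6"]]},
    index=["a", "b", "c"],
)

# explode puts one list element per row, repeating the index label.
exploded = df["listcol"].explode().astype(float)

# Average the elements that share an index label; assignment aligns on the index.
df["listcol"] = exploded.groupby(level=0).mean()
print(df)
```

If integers are needed, as in the question, the result can be truncated afterwards with `.astype(int)`, provided no row is NaN.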
