
Pandas append float to column in for loop

I have a pandas DataFrame where I want to do some calculations using elements in the df, and then append the calculated number to a separate column in the same df.

Here is my code as of now.

def percentfunction(df):
    for i in range(100): 
        if df['month_number'][i] == 10:
            df = df['percent_october'][i].add([df['cellsum'][i]/octobersum])
        elif df['month_number'][i] == 11:
            df = df['percent_november'][i].add([df['cellsum'][i]/novembersum])
        elif df['month_number'][i] == 12:
            df = df['percent_december'][i].add([df['cellsum'][i]/decembersum])

AttributeError: 'numpy.float64' object has no attribute 'add'

I have tried various implementations of this code, but I always get an error message. Either it gets to the last element and then writes columns containing only the last number calculated, or it adds values in rows where it is not supposed to add anything.

Critiques welcome!

EDIT: Tried to edit the code.

    def percentfunction(df):
        for i in range(100): 
            if df['month_number'][i] == 10:
                df['percent_october'][i] = df['cellsum'][i]/octobersum
            elif df['month_number'][i] == 11:
                df['percent_november'][i] = df['cellsum'][i]/novembersum
            elif df['month_number'][i] == 12:
                df['percent_december'][i] = df['cellsum'][i]/decembersum

This at least runs, but it also fills in values in rows where it shouldn't...

EDIT2: Here is a sample of my dataframe

>>> df.head()
      Index          month_number        month_text  \
0     Name1                    10           October     
1     Name1                    11           November    
2     Name1                    12           December    
3     Name2                    10           October     
4     Name2                    11           November    

  2000 Unnamed: 4 2001 Unnamed: 6     2002 Unnamed: 8 2003    ...     \
0  NaN        NaN  NaN        NaN      NaN        NaN  NaN    ...      
1  NaN        NaN  NaN        NaN      NaN        NaN  NaN    ...      
2  NaN        NaN  NaN        NaN      NaN        NaN  NaN    ...      
3  NaN        NaN  NaN        NaN  2898.68       3120  NaN    ...      
4  NaN        NaN  NaN        NaN      NaN        NaN  NaN    ...      

  Unnamed: 28 2013 Unnamed: 30  2014 Unnamed: 32 2015 Unnamed: 34 2016  \
0         NaN  NaN         NaN   NaN         NaN  NaN         NaN  NaN   
1         NaN  NaN         NaN   NaN         NaN  NaN         NaN  NaN   
2         NaN  NaN         NaN   NaN         NaN  NaN         NaN  NaN   
3         NaN  NaN         NaN   NaN         NaN  NaN         NaN  NaN   
4         NaN  NaN         NaN  1.26         127  NaN         NaN  NaN   

  Unnamed: 36   cellsum  
0         NaN      3899  
1         NaN      7922  
2         NaN      2181  
3         NaN      3121  
4         NaN       127

This is my DataFrame; 'cellsum' is the sum of all the "Unnamed" cells across that row. I have calculated each month's total by summing, for example, all the October cellsums (octobersum) in the DataFrame. I then want to add a new column showing what percentage of the month's total each cellsum is. I hope that makes sense.

You should avoid loops with pandas. You need something like this, which you can then reshape into any format you want:

df["percent_month"] = df.groupby("month_number")["cellsum"].transform(lambda x: x / x.sum())
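As a minimal sketch of the grouped approach, the example below divides each row's cellsum by the total for its month. The sample frame is hypothetical: it reuses the `Index`, `month_number` and `cellsum` values from the question's `df.head()` output and leaves out the year columns. The name `month_total` is introduced here for illustration only.

```python
import pandas as pd

# Hypothetical sample mirroring the first five rows shown in the question.
df = pd.DataFrame({
    "Index": ["Name1", "Name1", "Name1", "Name2", "Name2"],
    "month_number": [10, 11, 12, 10, 11],
    "cellsum": [3899, 7922, 2181, 3121, 127],
})

# transform("sum") returns one value per original row: the total cellsum
# of the month that row belongs to, aligned with df's index.
month_total = df.groupby("month_number")["cellsum"].transform("sum")

# Share of each row's cellsum within its month (a fraction between 0 and 1).
df["percent_month"] = df["cellsum"] / month_total

print(df)
```

Because `transform` keeps the original index, the result can be assigned straight back as a column without any row loop. Multiply by 100 if a percentage rather than a fraction is wanted.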

A simple fix would be to use df.ix[] (deprecated since pandas 0.20 and removed in 1.0; df.loc[] is the replacement on current versions):

df.loc[i, 'percent_october'] = df.loc[i, 'cellsum'] / octobersum

If you show us what df looks like, we might be able to give you a smarter solution than looping over a dataframe, which is not recommended.
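If the three separate columns from the question are really needed, one possible way to avoid the row loop is to assign through a boolean mask with `.loc`. This is only a sketch under assumptions: the sample data is hypothetical, the `month_columns` mapping is introduced here for illustration, and rows belonging to other months are left as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the first five rows shown in the question.
df = pd.DataFrame({
    "Index": ["Name1", "Name1", "Name1", "Name2", "Name2"],
    "month_number": [10, 11, 12, 10, 11],
    "cellsum": [3899, 7922, 2181, 3121, 127],
})

# Illustrative mapping from month number to the question's column names.
month_columns = {
    10: "percent_october",
    11: "percent_november",
    12: "percent_december",
}

for month, column in month_columns.items():
    mask = df["month_number"] == month            # rows belonging to this month
    month_sum = df.loc[mask, "cellsum"].sum()     # e.g. octobersum for month 10
    df[column] = np.nan                           # other months stay NaN
    df.loc[mask, column] = df.loc[mask, "cellsum"] / month_sum

print(df)
```

The loop here runs once per month rather than once per row, and a single `.loc[mask, column]` assignment writes to the frame itself, whereas chained indexing such as `df[column][i] = ...` may write to a temporary copy.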
