Transform slow pandas iterrows into apply

I have the following dataframe:

   VALUE      COUNT  RECL_2007  RECL_2008  RECL_2009  A_A  A_B  A_C  B_A  B_B  B_C  C_A  C_B  C_C
0    189   149.5872        503        503        500    0    0    0    0    0    0    0    0    0
1    209  1939.6160        503        503        503    0    0    0    0    0    0    0    0    0
2    499   617.4784        503        500        503    0    0    0    0    0    0    0    0    0
3    585    73.0688        503        503        503    0    0    0    0    0    0    0    0    0
4    611   133.9072        503        500        503    0    0    0    0    0    0    0    0    0
5    645   278.7904        503        503        503    0    0    0    0    0    0    0    0    0
6    659   138.2976        500        503        503    0    0    0    0    0    0    0    0    0
7    719   769.5744        503        503        502    0    0    0    0    0    0    0    0    0

The values in columns RECL_2007, RECL_2008 and RECL_2009 correspond to the variables A, B and C as follows:

A = 500, B = 502, C = 503.

I want to fill the columns A_A ... C_C with the values in the COUNT column, such that the RECL_2007 value gives the first part of the column name and the RECL_2009 value gives the second part.

I.e., if RECL_2007 == 503 and RECL_2009 == 500, then the column is C_A, and its value should be set to the COUNT value of that row.
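The naming rule amounts to a dictionary lookup on two columns. A minimal sketch, where `CODE_TO_LETTER` and `target_column` are hypothetical names; RECL_2008 is ignored because the rule only uses the 2007 and 2009 values:

```python
# Hypothetical lookup: RECL code -> letter, per A = 500, B = 502, C = 503
CODE_TO_LETTER = {500: 'A', 502: 'B', 503: 'C'}


def target_column(recl_2007, recl_2009):
    """Return the name of the column that should receive COUNT for one row."""
    return CODE_TO_LETTER[recl_2007] + '_' + CODE_TO_LETTER[recl_2009]


print(target_column(503, 500))  # C_A
```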

Currently I am iterating through the pandas dataframe using iterrows:

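# Assumed definitions (not shown in the question): code -> letter lookup and the two source columns
OPP_LU = {500: 'A', 502: 'B', 503: 'C'}
name_init, name_finl = 'RECL_2007', 'RECL_2009'

for index, row in df.iterrows():
    init = OPP_LU[row[name_init]]  # Get first part of column name
    finl = OPP_LU[row[name_finl]]  # Get second part of column name
    col_name = init + '_' + finl
    df.loc[index, col_name] = row['COUNT']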

This is slow, but I am not sure how to translate it into something using apply. Any hints?

There are two ways to do that.

  • You can use the apply function, but it needs a little extra setup first: a lookup dictionary and a function that builds the column name.

A dictionary to help build the column names:

d={'500':'A','502':'B','503':'C'}

A function that builds the column name for a row:

name= lambda x: "{0}_{1}".format(d[str(int(x['RECL_2007']))],d[str(int(x['RECL_2009']))])

Then go through the rows and copy the COUNT value wherever the generated name matches the target column (here C_A):

df["C_A"] = df.apply(lambda x: x['COUNT'] if name(x)=='C_A' else 0, axis=1)
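The apply call above fills only C_A. A sketch of one way to extend it to all nine columns, assuming the question's dataframe (rebuilt by hand here) and reusing `d` and `name`: call apply once to get each row's column name, then fill every column with `numpy.where`. `names` and `all_cols` are illustrative names.

```python
import numpy as np
import pandas as pd

# Dataframe from the question; the target columns are created below
df = pd.DataFrame({
    'VALUE': [189, 209, 499, 585, 611, 645, 659, 719],
    'COUNT': [149.5872, 1939.6160, 617.4784, 73.0688, 133.9072, 278.7904, 138.2976, 769.5744],
    'RECL_2007': [503, 503, 503, 503, 503, 503, 500, 503],
    'RECL_2008': [503, 503, 500, 503, 500, 503, 503, 503],
    'RECL_2009': [500, 503, 503, 503, 503, 503, 503, 502],
})

d = {'500': 'A', '502': 'B', '503': 'C'}
name = lambda x: "{0}_{1}".format(d[str(int(x['RECL_2007']))], d[str(int(x['RECL_2009']))])

# A single apply call gives every row's target column name, e.g. 'C_A' for row 0
names = df.apply(name, axis=1)

# Fill all nine columns: COUNT where the row's name matches, 0.0 elsewhere
all_cols = [a + '_' + b for a in 'ABC' for b in 'ABC']
for col in all_cols:
    df[col] = np.where(names == col, df['COUNT'], 0.0)
```

The row-wise Python function still runs once per row, but only once in total rather than once per target column.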

  • The other, simpler solution is to filter the rows with a boolean mask and then copy the COUNT values:

df.loc[(df['RECL_2007']==503) & (df['RECL_2009']==503), 'C_C']= df['COUNT']
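The same mask can be generated for every pair of codes instead of being written out by hand. A sketch, assuming the question's dataframe; `codes` is a hypothetical name, and `itertools.product` produces the nine (RECL_2007, RECL_2009) combinations:

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({
    'VALUE': [189, 209, 499, 585, 611, 645, 659, 719],
    'COUNT': [149.5872, 1939.6160, 617.4784, 73.0688, 133.9072, 278.7904, 138.2976, 769.5744],
    'RECL_2007': [503, 503, 503, 503, 503, 503, 500, 503],
    'RECL_2008': [503, 503, 500, 503, 500, 503, 503, 503],
    'RECL_2009': [500, 503, 503, 503, 503, 503, 503, 502],
})

# Letter -> RECL code (A = 500, B = 502, C = 503)
codes = {'A': 500, 'B': 502, 'C': 503}

# Each iteration pairs a (letter, code) for RECL_2007 with one for RECL_2009
for (first, code_2007), (second, code_2009) in product(codes.items(), repeat=2):
    col = first + '_' + second
    mask = (df['RECL_2007'] == code_2007) & (df['RECL_2009'] == code_2009)
    df[col] = 0.0  # float column, so COUNT can be copied in without a dtype change
    df.loc[mask, col] = df.loc[mask, 'COUNT']
```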

Here is the complete code. It is only a quick example that fills C_A and C_C; the other combinations follow the same pattern.

import pandas as pd
from io import StringIO

data = """VALUE,COUNT,RECL_2007,RECL_2008,RECL_2009
189,149.5872,503,503,500
209,939.6160,503,503,503
499,617.4784,503,500,503
585,73.0688,503,503,503
611,133.9072,503,500,503
645,278.7904,503,503,503
659,138.2976,500,503,503
719,769.5744,503,503,502"""

# In Python 3, data is already a str, so no .decode() is needed
df = pd.read_csv(StringIO(data))

# First approach: build each row's column name and copy COUNT with apply
d = {'500': 'A', '502': 'B', '503': 'C'}
name = lambda x: "{0}_{1}".format(d[str(int(x['RECL_2007']))], d[str(int(x['RECL_2009']))])
df['C_C'] = 0.0  # float, so COUNT values can be assigned later without a dtype change

df["C_A"] = df.apply(lambda x: x['COUNT'] if name(x) == 'C_A' else 0, axis=1)

# Second approach: a boolean mask selects the matching rows directly
df.loc[(df['RECL_2007'] == 503) & (df['RECL_2009'] == 503), 'C_C'] = df['COUNT']

print(df)

Output:

   VALUE     COUNT  RECL_2007  RECL_2008  RECL_2009       C_C       C_A
0    189  149.5872        503        503        500    0.0000  149.5872
1    209  939.6160        503        503        503  939.6160    0.0000
2    499  617.4784        503        500        503  617.4784    0.0000
3    585   73.0688        503        503        503   73.0688    0.0000
4    611  133.9072        503        500        503  133.9072    0.0000
5    645  278.7904        503        503        503  278.7904    0.0000
6    659  138.2976        500        503        503    0.0000    0.0000
7    719  769.5744        503        503        502    0.0000    0.0000
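Both approaches above still handle one target column at a time. A different technique, not from the original answer, is to build all the column names with `Series.map` and spread COUNT into indicator columns with `pd.get_dummies`. A minimal sketch, assuming the question's dataframe; `letters`, `all_cols`, `names` and `filled` are hypothetical names, and any existing A_A ... C_C columns are dropped and rebuilt:

```python
import pandas as pd

df = pd.DataFrame({
    'VALUE': [189, 209, 499, 585, 611, 645, 659, 719],
    'COUNT': [149.5872, 1939.6160, 617.4784, 73.0688, 133.9072, 278.7904, 138.2976, 769.5744],
    'RECL_2007': [503, 503, 503, 503, 503, 503, 500, 503],
    'RECL_2008': [503, 503, 500, 503, 500, 503, 503, 503],
    'RECL_2009': [500, 503, 503, 503, 503, 503, 503, 502],
})

letters = {500: 'A', 502: 'B', 503: 'C'}
all_cols = [a + '_' + b for a in 'ABC' for b in 'ABC']

# Every row's target column name, built without a Python-level row loop
names = df['RECL_2007'].map(letters) + '_' + df['RECL_2009'].map(letters)

# One 1.0/0.0 indicator column per name that occurs, scaled row-wise by COUNT;
# reindex adds the combinations that never occur, filled with 0.0
filled = (pd.get_dummies(names, dtype=float)
          .mul(df['COUNT'], axis=0)
          .reindex(columns=all_cols, fill_value=0.0))

# Replace any existing target columns with the freshly computed ones
df = df.drop(columns=all_cols, errors='ignore').join(filled)
```

A code that is missing from `letters` maps to NaN, and `get_dummies` ignores NaN by default, so such a row would end up with zeros in all nine columns.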
