Transform slow pandas iterrows into apply
I have the following dataframe:
   VALUE      COUNT  RECL_2007  RECL_2008  RECL_2009  A_A  A_B  A_C  B_A  B_B  B_C  C_A  C_B  C_C
0    189   149.5872        503        503        500    0    0    0    0    0    0    0    0    0
1    209  1939.6160        503        503        503    0    0    0    0    0    0    0    0    0
2    499   617.4784        503        500        503    0    0    0    0    0    0    0    0    0
3    585    73.0688        503        503        503    0    0    0    0    0    0    0    0    0
4    611   133.9072        503        500        503    0    0    0    0    0    0    0    0    0
5    645   278.7904        503        503        503    0    0    0    0    0    0    0    0    0
6    659   138.2976        500        503        503    0    0    0    0    0    0    0    0    0
7    719   769.5744        503        503        502    0    0    0    0    0    0    0    0    0
The values in the columns RECL_2007, RECL_2008 and RECL_2009 correspond to the variables A, B and C as follows:
A = 500, B = 502, C = 503.
I want to fill the columns A_A...C_C using the values in the COUNT column, such that the RECL_2007 value gives the first part of the column name and the RECL_2009 value gives the second part.
I.e., if RECL_2007 == 503 and RECL_2009 == 500, then the column is C_A and its value should be set to whatever is in the COUNT column of that row.
Currently I am iterating through the pandas dataframe using iterrows:
for index, row in df.iterrows():
    init = OPP_LU[row[name_init]]  # Get first part of column name
    finl = OPP_LU[row[name_finl]]  # Get second part of column name
    col_name = init + '_' + finl
    df.loc[index, col_name] = row['COUNT']
This is slow, but I am not sure how to translate it into something using apply. Any hints?
There are two ways to do that.
The first uses the apply function, which needs a little extra setup to keep the rest simple. Start with a dictionary that maps each code to its letter:
d={'500':'A','502':'B','503':'C'}
A function that builds the column name for each row:
name= lambda x: "{0}_{1}".format(d[str(int(x['RECL_2007']))],d[str(int(x['RECL_2009']))])
Then go through the rows and copy COUNT wherever the row's name matches the target column:
df["C_A"] = df.apply(lambda x: x['COUNT'] if name(x)=='C_A' else 0, axis=1)
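The line above fills only C_A. One way to cover all nine columns is to compute each row's name once with apply and reuse it. This is a minimal sketch: the three-row frame and the `row_names` and `targets` names are illustrative assumptions, not part of the question.

```python
import pandas as pd

d = {'500': 'A', '502': 'B', '503': 'C'}
name = lambda x: "{0}_{1}".format(d[str(int(x['RECL_2007']))], d[str(int(x['RECL_2009']))])

# Illustrative three-row frame built from values in the example data.
df = pd.DataFrame({
    'COUNT': [149.5872, 939.6160, 138.2976],
    'RECL_2007': [503, 503, 500],
    'RECL_2009': [500, 503, 503],
})

# Run apply once to get the target column name of every row.
row_names = df.apply(name, axis=1)

# Hypothetical list of the nine target columns, A_A through C_C.
targets = [a + '_' + b for a in 'ABC' for b in 'ABC']
for col in targets:
    # Keep COUNT where the row's name matches this column, 0.0 elsewhere.
    df[col] = df['COUNT'].where(row_names == col, 0.0)
```

Calling apply once rather than once per column avoids repeating the row-wise Python work nine times.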
The other solution, which is simpler, is to select the matching rows with a boolean mask and copy COUNT into the target column:
df.loc[(df['RECL_2007']==503) & (df['RECL_2009']==503), 'C_C']= df['COUNT']
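The same mask can be built in a loop over every pair of codes, so that each of the nine columns is filled without writing nine statements. This is a rough sketch: the `codes` dictionary and the three-row frame are assumptions made for illustration.

```python
from itertools import product

import pandas as pd

# Assumed mapping from code to letter, as stated in the question.
codes = {500: 'A', 502: 'B', 503: 'C'}

# Illustrative three-row frame built from values in the example data.
df = pd.DataFrame({
    'COUNT': [149.5872, 939.6160, 138.2976],
    'RECL_2007': [503, 503, 500],
    'RECL_2009': [500, 503, 503],
})

# Visit every (first code, second code) pair: 3 x 3 = 9 columns.
for (code_2007, first), (code_2009, second) in product(codes.items(), repeat=2):
    col = first + '_' + second
    df[col] = 0.0  # start from zeros, as in the original frame
    mask = (df['RECL_2007'] == code_2007) & (df['RECL_2009'] == code_2009)
    df.loc[mask, col] = df.loc[mask, 'COUNT']
```

Each pass does a vectorized comparison and assignment, so the Python-level loop runs nine times regardless of the number of rows.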
The complete code looks like this. It is only a quick example; you still need to handle the other column combinations.
import pandas as pd
from io import StringIO

data = """VALUE,COUNT,RECL_2007,RECL_2008,RECL_2009\n189,149.5872,503,503,500\n209,939.6160,503,503,503\n499,617.4784,503,500,503\n585,73.0688,503,503,503\n611,133.9072,503,500,503\n645,278.7904,503,503,503\n659,138.2976,500,503,503\n719,769.5744,503,503,502"""
# In Python 3 the literal is already a str, so no decode() call is needed
df = pd.read_csv(StringIO(data), sep=',')

# First approach: apply
d = {'500': 'A', '502': 'B', '503': 'C'}
name = lambda x: "{0}_{1}".format(d[str(int(x['RECL_2007']))], d[str(int(x['RECL_2009']))])
df['C_C'] = 0.0  # create the column as float so COUNT values fit
df["C_A"] = df.apply(lambda x: x['COUNT'] if name(x) == 'C_A' else 0, axis=1)

# Second approach: boolean mask
df.loc[(df['RECL_2007'] == 503) & (df['RECL_2009'] == 503), 'C_C'] = df['COUNT']
print(df)
Output:
   VALUE     COUNT  RECL_2007  RECL_2008  RECL_2009       C_C       C_A
0    189  149.5872        503        503        500    0.0000  149.5872
1    209  939.6160        503        503        503  939.6160    0.0000
2    499  617.4784        503        500        503  617.4784    0.0000
3    585   73.0688        503        503        503   73.0688    0.0000
4    611  133.9072        503        500        503  133.9072    0.0000
5    645  278.7904        503        503        503  278.7904    0.0000
6    659  138.2976        500        503        503    0.0000    0.0000
7    719  769.5744        503        503        502    0.0000    0.0000
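A different technique, not used in the answer above, avoids both apply and per-column masks: map the codes to letters, one-hot encode the resulting names, and scale each row by COUNT. The sketch below assumes the `codes` mapping from the question. The names `wide`, `targets` and `out` are hypothetical.

```python
import pandas as pd

# Assumed mapping from code to letter, as stated in the question.
codes = {500: 'A', 502: 'B', 503: 'C'}

# Illustrative three-row frame built from values in the example data.
df = pd.DataFrame({
    'COUNT': [149.5872, 939.6160, 138.2976],
    'RECL_2007': [503, 503, 500],
    'RECL_2009': [500, 503, 503],
})

# Build the target column name of every row with vectorized string concatenation.
names = df['RECL_2007'].map(codes) + '_' + df['RECL_2009'].map(codes)

# One-hot encode the names as floats; add the combinations that never occur.
targets = [a + '_' + b for a in 'ABC' for b in 'ABC']
wide = pd.get_dummies(names, dtype=float).reindex(columns=targets, fill_value=0.0)

# Multiply each row's indicator by that row's COUNT.
wide = wide.mul(df['COUNT'], axis=0)

out = df.join(wide)
```

Here `out` holds the original columns plus A_A through C_C, with COUNT in exactly one of them per row.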