More efficient way to write this for loop?
import pandas as pd
sim = [['Matthew Stafford', 15, 13, 12], ['Dalvin Cook', 18, 16, 17], ['Daniel Jones', 17, 17, 15], ['Joe Mixon', 16, 15, 15]]
col = ['Player', 1 , 2, 3]
NFL_Sim = pd.DataFrame(sim, columns=col)
lines = [['Matthew Stafford', 'Dalvin Cook'], ['Daniel Jones', 'Joe Mixon']]
col = ['QB', 'RB']
output_lines = pd.DataFrame(lines, columns=col)
for x in range(1, 4):
    output_lines[x] = output_lines.QB.map(NFL_Sim.set_index('Player')[x].to_dict()) + output_lines.RB.map(NFL_Sim.set_index('Player')[x].to_dict())
print(output_lines)
                 QB           RB   1   2   3
0  Matthew Stafford  Dalvin Cook  33  29  29
1      Daniel Jones    Joe Mixon  33  32  30
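The output is what I want, but when I scale this up I have thousands of columns in the NFL_Sim dataframe, and the mapping becomes very slow. Is there a more efficient way to write this for loop? Or should I convert output_lines to a list first? I'm really not sure what the best approach is.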
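First, I suggest setting the index once, when you create NFL_Sim. That way you don't have to call set_index twice on every iteration of the loop.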
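A minimal sketch of the question's loop with the index set only once, reusing the question's sample data. It relies on `Series.map` accepting a Series (values are looked up by that Series' index), so no `set_index` or `to_dict` call happens inside the loop; the variable name `scores` is illustrative.

```python
import pandas as pd

sim = [['Matthew Stafford', 15, 13, 12], ['Dalvin Cook', 18, 16, 17],
       ['Daniel Jones', 17, 17, 15], ['Joe Mixon', 16, 15, 15]]
# Set the 'Player' index once, when the frame is created
NFL_Sim = pd.DataFrame(sim, columns=['Player', 1, 2, 3]).set_index('Player')

lines = [['Matthew Stafford', 'Dalvin Cook'], ['Daniel Jones', 'Joe Mixon']]
output_lines = pd.DataFrame(lines, columns=['QB', 'RB'])

for x in NFL_Sim.columns:
    scores = NFL_Sim[x]  # Series indexed by player name
    # Series.map with a Series argument looks each name up in its index
    output_lines[x] = output_lines['QB'].map(scores) + output_lines['RB'].map(scores)

print(output_lines)
```

This still runs one iteration per simulation column; it only removes the repeated `set_index`/`to_dict` work.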
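Second, if you have a list of quarterbacks and a list of running backs, I suggest creating two matrices, one for the quarterbacks and one for the running backs. Then you can add the two together.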
import pandas as pd
sim = [['Matthew Stafford', 15, 13, 12], ['Dalvin Cook', 18, 16, 17], ['Daniel Jones', 17, 17, 15], ['Joe Mixon', 16, 15, 15]]
col = ['Player', 1 , 2, 3]
# Set the index once, up front
NFL_Sim = pd.DataFrame(sim, columns=col).set_index('Player')
qbs = ['Matthew Stafford', 'Daniel Jones']
rbs = ['Dalvin Cook', 'Joe Mixon']
qb_scores = NFL_Sim.loc[qbs, :]
rb_scores = NFL_Sim.loc[rbs, :]
# We need to reset the index because qb_scores and rb_scores are
# indexed by different player names; without the reset, pandas would
# align on those names and every cell of the sum would be NaN
output = qb_scores.reset_index(drop=True) + rb_scores.reset_index(drop=True)
# Add the player names and put them in front of the score columns
output = output.assign(QB=qbs, RB=rbs)[['QB', 'RB', *NFL_Sim.columns]]
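A variant of the same two-matrix idea, sketched under the assumption that the pairings come straight from `output_lines` rather than from hand-written `qbs`/`rbs` lists. Converting both lookups to NumPy arrays with `.to_numpy()` makes the addition positional, so no `reset_index` is needed, and one element-wise addition covers every simulation column at once. The names `totals` and `result` are illustrative.

```python
import pandas as pd

sim = [['Matthew Stafford', 15, 13, 12], ['Dalvin Cook', 18, 16, 17],
       ['Daniel Jones', 17, 17, 15], ['Joe Mixon', 16, 15, 15]]
NFL_Sim = pd.DataFrame(sim, columns=['Player', 1, 2, 3]).set_index('Player')
output_lines = pd.DataFrame(
    [['Matthew Stafford', 'Dalvin Cook'], ['Daniel Jones', 'Joe Mixon']],
    columns=['QB', 'RB'],
)

# One row of scores per line, in the same order as output_lines
qb_scores = NFL_Sim.loc[output_lines['QB'].tolist()].to_numpy()
rb_scores = NFL_Sim.loc[output_lines['RB'].tolist()].to_numpy()

# A single element-wise addition over all simulation columns
totals = pd.DataFrame(qb_scores + rb_scores,
                      index=output_lines.index, columns=NFL_Sim.columns)
result = output_lines.join(totals)
print(result)
```

If a name in `output_lines` is missing from `NFL_Sim`, `.loc` raises a `KeyError`; `reindex` would instead produce NaN rows.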
A more dynamic way, using melt:
>>> x = output_lines.melt(value_name='Player', ignore_index=False).join(NFL_Sim.set_index('Player'), on='Player')
>>> output_lines = output_lines.join(x.drop(columns=['variable', 'Player']).groupby(level=0).sum())
>>> output_lines
                 QB           RB   1   2   3
0  Matthew Stafford  Dalvin Cook  33  29  29
1      Daniel Jones    Joe Mixon  33  32  30
>>>
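The melt idea can also be wrapped in a small function that handles any number of lines and any number of position columns. This is a sketch, not the answer's exact code: it swaps the lookup for `reindex` and then restores the row labels that `melt(..., ignore_index=False)` produced, so `groupby(level=0)` sums each line's players together. The function name `line_totals` is hypothetical, and the third line in the example (Stafford with Mixon) is an illustrative pairing built from the question's players.

```python
import pandas as pd


def line_totals(lines: pd.DataFrame, sim: pd.DataFrame) -> pd.DataFrame:
    """Append per-column score totals for each row of `lines`.

    Assumes `sim` is indexed by player name and every column of `lines`
    holds player names.
    """
    # Long format: one row per (line, slot); the index keeps the line label
    long = lines.melt(value_name='Player', ignore_index=False)
    # Look up every player's scores (a missing name would give a NaN row)
    scores = sim.reindex(index=long['Player'])
    # Put the original line labels back so rows can be grouped per line
    scores.index = long.index
    return lines.join(scores.groupby(level=0).sum())


sim = [['Matthew Stafford', 15, 13, 12], ['Dalvin Cook', 18, 16, 17],
       ['Daniel Jones', 17, 17, 15], ['Joe Mixon', 16, 15, 15]]
NFL_Sim = pd.DataFrame(sim, columns=['Player', 1, 2, 3]).set_index('Player')
output_lines = pd.DataFrame(
    [['Matthew Stafford', 'Dalvin Cook'],
     ['Daniel Jones', 'Joe Mixon'],
     ['Matthew Stafford', 'Joe Mixon']],  # illustrative third line
    columns=['QB', 'RB'],
)
result = line_totals(output_lines, NFL_Sim)
print(result)
```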
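Create a mapping with a Series (the QB/RB pairing is already handled in output_lines, and we want to link each player to NFL_Sim through its row position in output_lines):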
mapping = output_lines.T.stack()
mapping = pd.Series(mapping.index.droplevel(0), mapping)
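To see what this mapping holds, a short sketch that rebuilds the question's `output_lines` so it runs on its own: each player name ends up pointing to the row position of the line it belongs to.

```python
import pandas as pd

output_lines = pd.DataFrame(
    [['Matthew Stafford', 'Dalvin Cook'], ['Daniel Jones', 'Joe Mixon']],
    columns=['QB', 'RB'],
)

# Transpose so positions become rows, then stack into one long Series:
# index (position, line row) -> player name
mapping = output_lines.T.stack()
# Flip it around: player name -> line row (the position level is dropped)
mapping = pd.Series(mapping.index.droplevel(0), mapping)
print(mapping.to_dict())
```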
Get the sum for each row position (that is, for each line):
mapping = (NFL_Sim.assign(positions=lambda df: df.Player.map(mapping))
           # we no longer need the Player column now that we have
           # the mapping; select_dtypes keeps only numeric columns
           .select_dtypes('number')
           .groupby('positions')
           .sum()
          )
Join the mapping back to output_lines:
output_lines.join(mapping)
                 QB           RB   1   2   3
0  Matthew Stafford  Dalvin Cook  33  29  29
1      Daniel Jones    Joe Mixon  33  32  30
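Assembled into one script, the steps of this answer can be checked against the question's original loop. This is a verification sketch on the sample data only; `expected`, `lookup`, `sums` and `result` are illustrative names.

```python
import pandas as pd

sim = [['Matthew Stafford', 15, 13, 12], ['Dalvin Cook', 18, 16, 17],
       ['Daniel Jones', 17, 17, 15], ['Joe Mixon', 16, 15, 15]]
NFL_Sim = pd.DataFrame(sim, columns=['Player', 1, 2, 3])
output_lines = pd.DataFrame(
    [['Matthew Stafford', 'Dalvin Cook'], ['Daniel Jones', 'Joe Mixon']],
    columns=['QB', 'RB'],
)

# Reference result: the loop from the question
expected = output_lines.copy()
for x in range(1, 4):
    lookup = NFL_Sim.set_index('Player')[x].to_dict()
    expected[x] = expected.QB.map(lookup) + expected.RB.map(lookup)

# Player name -> row position in output_lines
mapping = output_lines.T.stack()
mapping = pd.Series(mapping.index.droplevel(0), mapping)

# Sum the numeric score columns per row position
sums = (NFL_Sim.assign(positions=lambda df: df.Player.map(mapping))
        .select_dtypes('number')
        .groupby('positions')
        .sum())

result = output_lines.join(sums)
# Raises AssertionError if the two approaches disagree
pd.testing.assert_frame_equal(result, expected, check_names=False)
print(result)
```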