Reassigning pandas Series values using a nested defaultdict

Question

I'm working on an NFL dataset and want to apply the following mapping to every play in the df:

I'm trying to populate a column (DistToRusher) with each player's distance to the rusher on that play.
The DistToRusher column is currently populated with player ids.
I want to look up these player ids among the inner dictionary keys and replace them with the corresponding inner dictionary values.
I have a defaultdict-of-dictionaries, dist_dict, that looks like this:

    dist_dict = {play_id1: {player_id1: distance, player_id2: distance ...}, 
                 play_id2: {player_id1: distance, player_id2: distance ...}...}

Here is my code:

def populate_DistToRusher_column(df):
    for play_id, players_dict in dist_dict.items():
        df[df.PlayId == play_id].replace({'DistToRusher': players_dict}, inplace=True)
    return df

This code runs, but it is slow (20-30 s) and doesn't change the DistToRusher column; when I inspect the df, DistToRusher still contains the player id numbers rather than the distances.

Here is a toy version of the actual data:

from collections import defaultdict 
import pandas as pd
df = pd.DataFrame.from_dict({'PlayId': {
  0: 20170907000118, 1: 20170907000118, 2: 20170907000118,
  22: 20170907000139, 23: 20170907000139, 24: 20170907000139},
 'NflId': {0: 496723, 1: 2495116, 2: 2495493,
  22: 496723, 23: 2495116, 24: 2495493},
 'NflIdRusher': {0: 2543773, 1: 2543773, 2: 2543773,
  22: 2543773, 23: 2543773, 24: 2543773},
 'DistToRusher': {0: 496723, 1: 2495116, 2: 2495493,
  22: 496723, 23: 2495116, 24: 2495493}})

dist_dict = {20170907000118: defaultdict(float,
             {496723: 6.480871854928166,
              2495116: 4.593310353111358,
              2495493: 5.44898155621764}),
 20170907000139: defaultdict(float,
             {496723: 8.583355987025117,
              2495116: 5.821151088917024,
              2495493: 6.658686056573021})}

Answer 1

I think this is right, IIUC:

temp = pd.DataFrame(dist_dict)
df['DistToRusher2'] = df.apply(lambda x: temp[x.PlayId][x.NflId], axis=1)

or

df['DistToRusher2'] = df.apply(lambda x: dist_dict[x.PlayId][x.NflId], axis=1)

Output:

            PlayId    NflId  NflIdRusher  DistToRusher  DistToRusher2
0   20170907000118   496723      2543773        496723       6.480872
1   20170907000118  2495116      2543773       2495116       4.593310
2   20170907000118  2495493      2543773       2495493       5.448982
22  20170907000139   496723      2543773        496723       8.583356
23  20170907000139  2495116      2543773       2495116       5.821151
24  20170907000139  2495493      2543773       2495493       6.658686
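
If the row-wise apply becomes slow on the full dataset, one possible alternative is to flatten the nested dict into a long lookup table and join it on both keys. This is only a sketch under assumptions: the toy values are shortened, and the names lookup and out are made up for illustration. Note that merge returns a new frame with a fresh RangeIndex, so the original row labels are not preserved.

```python
import pandas as pd

# Shortened toy data in the same shape as the question's df and dist_dict.
df = pd.DataFrame({
    'PlayId': [20170907000118, 20170907000118, 20170907000139, 20170907000139],
    'NflId': [496723, 2495116, 496723, 2495116],
})
dist_dict = {
    20170907000118: {496723: 6.48, 2495116: 4.59},
    20170907000139: {496723: 8.58, 2495116: 5.82},
}

# Flatten {play: {player: dist}} into one row per (play, player) pair.
# "lookup" is a hypothetical helper name, not from the original post.
lookup = pd.DataFrame(
    [(play, player, dist)
     for play, players in dist_dict.items()
     for player, dist in players.items()],
    columns=['PlayId', 'NflId', 'DistToRusher2'],
)

# A left join keeps every row of df; pairs missing from dist_dict get NaN.
out = df.merge(lookup, on=['PlayId', 'NflId'], how='left')
print(out)
```

The join does the matching in one pass instead of calling a Python function per row, which is usually where the speedup comes from.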

Answer 2

Thanks @oppressionslayer! This worked like a charm:

df['DistToRusher2'] = df.apply(lambda x: dist_dict[x.PlayId][x.NflId], axis=1)
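
For context on why the loop in the question had no visible effect: df[df.PlayId == play_id] builds a new DataFrame, so replace(..., inplace=True) changes that temporary object and never touches df. A per-play loop can still work if it writes back through .loc, roughly as sketched below. The toy values are shortened, and the column name Dist is hypothetical; a separate float column is used here so that distances are not written into an integer column.

```python
import pandas as pd

# Shortened toy data in the same shape as the question's df and dist_dict.
df = pd.DataFrame({
    'PlayId': [20170907000118, 20170907000118, 20170907000139, 20170907000139],
    'NflId': [496723, 2495116, 496723, 2495116],
})
dist_dict = {
    20170907000118: {496723: 6.48, 2495116: 4.59},
    20170907000139: {496723: 8.58, 2495116: 5.82},
}

# Start with a float column of NaN so assigned distances keep their dtype.
df['Dist'] = float('nan')

for play_id, players in dist_dict.items():
    mask = df['PlayId'] == play_id
    # Assigning via .loc on df itself modifies df, unlike df[mask].replace(...).
    df.loc[mask, 'Dist'] = df.loc[mask, 'NflId'].map(players)

print(df)
```

This still scans the whole frame once per play, so it is likely to remain slow when there are many plays; it is shown only to illustrate the copy-versus-original issue.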

Answer 1: score 2, accepted, 2019-12-08 07:42:36

Answer 2: score 1, 2019-12-08 09:35:11