
Pandas - populate new column based on existing column values

I have the following dataframe, df_shots:

              TableIndex  MatchID  GameWeek           Player  ...      ShotPosition    ShotSide      Close             Position
ShotsDetailID                                                 ...                                                              
6                      5    46605         1  Roberto Firmino  ...  very close range         N/A      close  very close rangeN/A
8                      7    46605         1  Roberto Firmino  ...           the box  the centre  not close    the boxthe centre
10                     9    46605         1  Roberto Firmino  ...           the box    the left  not close      the boxthe left
17                    16    46605         1  Roberto Firmino  ...           the box  the centre      close    the boxthe centre
447                  446    46623         2  Roberto Firmino  ...           the box  the centre      close    the boxthe centre
...                  ...      ...       ...              ...  ...               ...         ...        ...                  ...
6656                6662    46870        27  Roberto Firmino  ...  very close range         N/A      close  very close rangeN/A
6666                6672    46870        27  Roberto Firmino  ...           the box   the right  not close     the boxthe right
6674                6680    46870        27  Roberto Firmino  ...           the box  the centre  not close    the boxthe centre
6676                6682    46870        27  Roberto Firmino  ...           the box    the left  not close      the boxthe left
6679                6685    46870        27  Roberto Firmino  ...   outside the box         N/A  not close   outside the boxN/A

For clarity, all possible 'Position' values are:

positions = ['a difficult anglethe left',
             'a difficult anglethe right',
             'long rangeN/A',
             'long rangethe centre',
             'long rangethe left',
             'long rangethe right',
             'outside the boxN/A',
             'penaltyN/A',
             'the boxthe centre',
             'the boxthe left',
             'the boxthe right',
             'the six yard boxthe left',
             'the six yard boxthe right',
             'very close rangeN/A']

Now I would like to map the following x/y values to each 'Position' name and store the value under a new 'Position XY' column:

    the_boxthe_center = {'y':random.randrange(25,45), 'x':random.randrange(0,6)}
    the_boxthe_left = {'y':random.randrange(41,54), 'x':random.randrange(0,16)}
    the_boxthe_right = {'y':random.randrange(14,22), 'x':random.randrange(0,16)}
    very_close_rangeNA = {'y':random.randrange(25,43), 'x':random.randrange(0,4)}
    six_yard_boxthe_left = {'y':random.randrange(33,43), 'x':random.randrange(4,6)}
    six_yard_boxthe_right = {'y':random.randrange(25,33), 'x':random.randrange(4,6)}
    a_diffcult_anglethe_left = {'y':random.randrange(43,54), 'x':random.randrange(0,6)}
    a_diffcult_anglethe_right = {'y':random.randrange(14,25), 'x':random.randrange(0,6)}
    penaltyNA = {'y':random.randrange(36), 'x':random.randrange(8)}
    outside_the_boxNA = {'y':random.randrange(14,54), 'x':random.randrange(16,28)}
    long_rangeNA = {'y':random.randrange(0,68), 'x':random.randrange(40,52)}
    long_rangethe_centre = {'y':random.randrange(0,68), 'x':random.randrange(28,40)}
    long_rangethe_right = {'y':random.randrange(0,14), 'x':random.randrange(0,24)}
    long_rangethe_left = {'y':random.randrange(54,68), 'x':random.randrange(0,24)}

I have tried:

if df_shots['Position']=='very close rangeN/A':
        df_shots['Position X/Y']==very_close_rangeNA
...# and so on

But I get:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How do I do this?

Storing so many related variables outside a container is bad form. Let's use a dictionary instead, which we will map onto your dataframe.

data_dict = {'the boxthe centre': {'y': random.randrange(25, 45), ...},
             ...}


df['Position'] = df['Position'].map(data_dict)

print(df['Position'])
6        {'y': 35, 'x': 2}
8        {'y': 32, 'x': 1}
10      {'y': 44, 'x': 11}
17       {'y': 32, 'x': 1}
447      {'y': 32, 'x': 1}
...                    NaN
6656     {'y': 35, 'x': 2}
6666    {'y': 15, 'x': 11}
6674     {'y': 32, 'x': 1}
6676    {'y': 44, 'x': 11}
6679    {'y': 37, 'x': 16}
Name: Position, dtype: object
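
The snippet above overwrites 'Position' and elides most of the dictionary. As a minimal sketch of the same `Series.map` idea, the version below writes to a new 'Position XY' column instead. The three-row `df_shots` is a hypothetical stand-in, and only two of the question's fourteen entries are filled in. Positions without a dictionary entry come back as NaN, which is a convenient way to spot missing or misspelled keys.

```python
import random

import pandas as pd

# Hypothetical miniature of df_shots; only the 'Position' column matters here.
df_shots = pd.DataFrame(
    {"Position": ["very close rangeN/A", "the boxthe centre", "penaltyN/A"]},
    index=[6, 8, 10],
)

# Two entries taken from the question; the rest would follow the same pattern.
data_dict = {
    "very close rangeN/A": {"y": random.randrange(25, 43), "x": random.randrange(0, 4)},
    "the boxthe centre": {"y": random.randrange(25, 45), "x": random.randrange(0, 6)},
}

# Series.map looks each value up in the dict; values with no key become NaN.
df_shots["Position XY"] = df_shots["Position"].map(data_dict)

# Rows whose position has no entry in data_dict ('penaltyN/A' in this sample).
unmapped = df_shots[df_shots["Position XY"].isna()]

print(df_shots)
print(unmapped)
```

Keeping 'Position' intact means the lookup can be re-run after the dictionary is corrected or extended.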

Here is some code that may achieve what you want.

First, create a list of all the 'Position XY' values, for example:

position_xy = [the_boxthe_center,the_boxthe_left,....,long_rangethe_left] #and so on...

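and the corresponding positions list (which you already have). Then I suggest building a dictionary so that each position is paired with its corresponding position xy value: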

dict_positionxy = dict(zip(positions, position_xy))

Then create a new column in the dataframe to hold the x, y values derived from the position:

df_shots['Position X/Y'] = None  # object column, so each cell can hold a dict

Now loop over all the rows one by one:

for index, row in df_shots.iterrows():
    for key, value in dict_positionxy.items():
        if row['Position'] == key:
            # Write through df_shots.at; assigning to row would only change a copy
            df_shots.at[index, 'Position X/Y'] = value

print(df_shots)

This should do the trick :)
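
For context on the original error: `df_shots['Position'] == 'very close rangeN/A'` compares every row at once and yields a boolean Series, which `if` cannot reduce to a single True/False. The loop above avoids this by comparing one row at a time. A boolean mask with `.loc` is another way to express the same condition. The following is only a sketch: the sample frame, the fixed x/y values, and the separate 'X' and 'Y' columns are assumptions made for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical sample with the 'Position' strings from the question.
df_shots = pd.DataFrame(
    {"Position": ["very close rangeN/A", "the boxthe centre", "very close rangeN/A"]}
)

# Fixed values for illustration; the question draws these with random.randrange.
very_close_rangeNA = {"y": 30, "x": 2}

# The comparison produces one True/False per row, not a single boolean.
mask = df_shots["Position"] == "very close rangeN/A"

# .loc with the mask assigns only to the matching rows; other rows stay NaN.
df_shots.loc[mask, "X"] = very_close_rangeNA["x"]
df_shots.loc[mask, "Y"] = very_close_rangeNA["y"]

print(df_shots)
```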

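Here is some sample code that does what you want. I created a basic mock-up of df_shots, but it should work the same way on your larger DataFrame. I also stored some of those free variables in a dict to make filtering simpler.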

Note that because the random values in positions_xy are computed up front, every shot with the same position gets the same x/y values. This may or may not be what you want.

import pandas as pd
import random

# Sample df_shots
df_shots = pd.DataFrame({'Position': ['the_boxthe_center', 'the_boxthe_left']})

# Store position/xy pairs in dict
positions_xy = {'the_boxthe_center': {'y': random.randrange(25, 45), 'x': random.randrange(0, 6)},
                'the_boxthe_left': {'y': random.randrange(41, 54), 'x': random.randrange(0, 16)}}

# Create new column
df_shots['Position XY'] = ''

# Iterate over all position/xy pairs
for position, xy in positions_xy.items():
    # Determine indices of all players that match
    matches = df_shots['Position'] == position
    matches_indices = matches[matches].index
    # Update matching rows in df_shots with xy
    for idx in matches_indices:
        df_shots.at[idx, 'Position XY'] = xy

print(df_shots)

Output:

            Position        Position XY
0  the_boxthe_center  {'y': 36, 'x': 2}
1    the_boxthe_left  {'y': 44, 'x': 0}
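
If a separate random draw per shot is preferred over the shared values noted above, one option is to store the `randrange` bounds rather than pre-drawn numbers and draw once per row. This is a sketch under assumptions: `position_ranges` and `draw_xy` are hypothetical names, the keys use the 'Position' strings from the question, and only two positions are filled in.

```python
import random

import pandas as pd

# Hypothetical sample; two rows share a position to show independent draws.
df_shots = pd.DataFrame(
    {"Position": ["the boxthe centre", "the boxthe left", "the boxthe centre"]}
)

# Store the randrange bounds (start, stop) instead of pre-drawn values.
position_ranges = {
    "the boxthe centre": {"y": (25, 45), "x": (0, 6)},
    "the boxthe left": {"y": (41, 54), "x": (0, 16)},
}


def draw_xy(position):
    """Draw a fresh x/y pair for one shot, or return None for an unknown position."""
    bounds = position_ranges.get(position)
    if bounds is None:
        return None
    return {axis: random.randrange(*limits) for axis, limits in bounds.items()}


# Passing a function to map calls draw_xy once per row, so each shot gets its own draw.
df_shots["Position XY"] = df_shots["Position"].map(draw_xy)

print(df_shots)
```

Two shots from the same position may still land on the same coordinates by chance, since each range is small.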

