Populate a pandas DataFrame using conditional groups from another DataFrame

Question

I have two dataframes: df1, a list of teams and scores sorted by date, and df2, individual players with dates. The two share 60 columns of matching stats, and I am trying to write code that replaces the value in each of those columns in df2 with the opponent's average from df1 over all earlier dates:

df1:                         df2:
   date        team  scr        name           team opp  date        scr
0  2016-04-03  KCR   5.70    0  Erasmo Ramirez TBR  TOR  2016-04-06  7.90
1  2016-04-03  NYM   4.70    1  Erasmo Ramirez TBR  BAL  2016-04-10  1.30
2  2016-04-03  PIT   6.30    2  Erasmo Ramirez TBR  CLE  2016-04-13  9.30
3  2016-04-03  STL   3.40    etc...
4  2016-04-03  TBR   4.80
5  2016-04-03  TOR   6.20*
6  2016-04-04  ARI   7.40
7  2016-04-04  ATL   5.30
8  2016-04-04  BAL   7.00
9  2016-04-04  CHC   9.60
10 2016-04-04  TOR   7.50*
etc...

So in this example, the first entry under 'scr' in df2 would change from 7.90 to 6.85, since that is TOR's average scr over the dates leading up to 4-6 (4-3 and 4-4).
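As a quick check of that expected value, here is a minimal sketch that rebuilds a few of the df1 rows shown above (only four rows, chosen for illustration) and averages TOR's scr over the dates before 2016-04-06. The names `prior` and `tor_avg` are illustrative, not from the original post.

```python
import pandas as pd

# A small subset of the df1 rows from the question (illustrative only)
df1 = pd.DataFrame({
    'date': pd.to_datetime(['2016-04-03', '2016-04-03', '2016-04-04', '2016-04-04']),
    'team': ['TBR', 'TOR', 'BAL', 'TOR'],
    'scr': [4.80, 6.20, 7.00, 7.50],
})

# TOR's games dated strictly before the df2 row's date
prior = df1[(df1['team'] == 'TOR') & (df1['date'] < pd.Timestamp('2016-04-06'))]

# (6.20 + 7.50) / 2
tor_avg = prior['scr'].mean()
print(round(tor_avg, 2))  # 6.85
```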

I tried the following (and other similar options) with no luck:

jf = df1.groupby('team')
df2['scr'] = jf.apply(lambda x: x[(df1['date']<x['date'])&(df1['team']==x['opp'])]['scr'].sum())

ValueError: Series lengths must match to compare

Any solutions? Also, is there a way to iterate over all the columns with a single block of code, or do I need separate code for each column?
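On the ValueError in the attempt above: it most likely arises because the lambda receives one team's group of df1 as `x`, so `df1['date'] < x['date']` compares the full df1 column against a shorter one. The groups also come from df1, which has no 'opp' column. One way around both issues, sketched here under the assumption that the goal is one value per df2 row, is to apply a function over the rows of df2 instead. The helper name `prior_opp_mean` and the sample rows are illustrative.

```python
import pandas as pd

# Illustrative subsets of the frames from the question
df1 = pd.DataFrame({
    'date': pd.to_datetime(['2016-04-03', '2016-04-03', '2016-04-04', '2016-04-04']),
    'team': ['TBR', 'TOR', 'BAL', 'TOR'],
    'scr': [4.80, 6.20, 7.00, 7.50],
})
df2 = pd.DataFrame({
    'name': ['Erasmo Ramirez'] * 2,
    'team': ['TBR', 'TBR'],
    'opp': ['TOR', 'BAL'],
    'date': pd.to_datetime(['2016-04-06', '2016-04-10']),
    'scr': [7.90, 1.30],
})

def prior_opp_mean(row, col='scr'):
    """Mean of `col` in df1 for the row's opponent over strictly earlier dates."""
    # Both comparisons are a full df1 column against a scalar, so lengths always match
    mask = (df1['date'] < row['date']) & (df1['team'] == row['opp'])
    return df1.loc[mask, col].mean()

# axis=1 passes each df2 row to the helper in turn
df2['scr'] = df2.apply(prior_opp_mean, axis=1)
```

This handles a single column; the same helper could be called once per column name by passing a different `col`.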

Answer 1

I found one solution, perhaps not very elegant; I'll check back in case anyone sees this and has a better one. I created a list of the column headers, then looped over each column and used iterrows to walk each row of df2:

    # Stat columns shared by both frames (the slice bounds depend on the column layout)
    batcol = df1.columns[6:-1]
    for col in batcol:
        for i, rows in df2.iterrows():
            # The opponent's df1 rows dated strictly before this df2 row
            mask = (df1['date'] < rows['date']) & (df1['team'] == rows['opp'])
            # Replace the df2 value with the opponent's prior average for this column
            df2.loc[i, col] = df1.loc[mask, col].mean()
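For 60 columns and many rows, the nested loops can get slow. A possible alternative, shown here only as a sketch and not as the method from the original answer, is to merge each df2 row with its opponent's df1 rows, keep the earlier dates, and average all stat columns in one groupby. The function name `fill_with_opponent_means`, the helper column `row`, and the sample data are assumptions made for illustration. The sketch also assumes the stat column names do not collide with 'team', 'date', or 'opp'.

```python
import pandas as pd

def fill_with_opponent_means(df1, df2, stat_cols):
    """Return a copy of df2 whose stat_cols hold the opponent's mean over earlier df1 dates."""
    stat_cols = list(stat_cols)
    # Tag each df2 row with its position so results can be mapped back
    left = df2[['opp', 'date']].copy()
    left['row'] = range(len(df2))
    right = df1[['team', 'date'] + stat_cols]
    # Pair every df2 row with all df1 rows of its opponent
    pairs = left.merge(right, left_on='opp', right_on='team', suffixes=('', '_prev'))
    # Keep only opponent games dated strictly before the df2 row
    pairs = pairs[pairs['date_prev'] < pairs['date']]
    means = pairs.groupby('row')[stat_cols].mean()
    out = df2.copy()
    # Rows with no earlier opponent games end up as NaN
    out[stat_cols] = means.reindex(range(len(df2))).to_numpy()
    return out

# Illustrative subsets of the frames from the question
df1 = pd.DataFrame({
    'date': pd.to_datetime(['2016-04-03', '2016-04-03', '2016-04-04', '2016-04-04']),
    'team': ['TBR', 'TOR', 'BAL', 'TOR'],
    'scr': [4.80, 6.20, 7.00, 7.50],
})
df2 = pd.DataFrame({
    'name': ['Erasmo Ramirez'] * 3,
    'team': ['TBR'] * 3,
    'opp': ['TOR', 'BAL', 'CLE'],
    'date': pd.to_datetime(['2016-04-06', '2016-04-10', '2016-04-13']),
    'scr': [7.90, 1.30, 9.30],
})

result = fill_with_opponent_means(df1, df2, ['scr'])
```

The merge builds one row per (df2 row, opponent game) pair, so memory use grows with the number of games per team; for very large frames the row-wise loop may still be the more practical choice. In this sample, CLE has no rows in the df1 subset, so the third result is NaN.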

Answer 1: score 0, accepted, 2016-07-01 09:03:36
