
How to apply a function only on selected rows and columns of a pandas data frame?

I have the following data frame:

       id        subid        a
    1  1         1            2 
    2  1         1            10 
    3  1         1            20
    4  1         2            30
    5  1         2            35 
    6  1         2            36 
    7  1         2            40
    8  2         2            20
    9  2         2            29
    10 2         2            30

I want to apply a function, for example pandas diff(), to column "a", but the calculation should restart whenever either "id" or "subid" changes, and I want to store the result in a new column.

Below is the df I expect.

        id        subid        a      difference
    1  1         1            2       NaN
    2  1         1            10      8
    3  1         1            20      10
    4  1         2            30      NaN
    5  1         2            35      5
    6  1         2            36      1
    7  1         2            40      4
    8  2         2            20      NaN
    9  2         2            29      9
    10 2         2            30      1

As can be seen at rows 4 and 8, either "id" or "subid" changes there, so those rows hold NaN, and the diff is calculated as usual in the rows that follow.

I have used

    df["difference"] = df["a"].diff()

which is obviously applied to the whole column and does not behave as expected. I have also tried using groupby, but it somehow gives extra rows.

Thanks in advance for any suggestions.

Try this:

In [97]: df['difference'] = df.groupby(['id','subid'])['a'].diff()

In [98]: df
Out[98]:
    id  subid   a  difference
1    1      1   2         NaN
2    1      1  10         8.0
3    1      1  20        10.0
4    1      2  30         NaN
5    1      2  35         5.0
6    1      2  36         1.0
7    1      2  40         4.0
8    2      1  20         NaN
9    2      1  29         9.0
10   2      1  30         1.0
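
For anyone who wants to run the groupby approach outside an IPython session, here is a minimal self-contained sketch. It assumes the sample data shown in the output above (index 1 to 10); the frame construction is my own and not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the output above (index 1..10).
df = pd.DataFrame(
    {
        "id":    [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
        "subid": [1, 1, 1, 2, 2, 2, 2, 1, 1, 1],
        "a":     [2, 10, 20, 30, 35, 36, 40, 20, 29, 30],
    },
    index=range(1, 11),
)

# diff() runs separately inside each (id, subid) group; the result keeps
# the original index, so it can be assigned straight back as a column
# without adding or dropping rows.
df["difference"] = df.groupby(["id", "subid"])["a"].diff()

print(df)
```

The first row of every group comes out as NaN, which is why the column is shown as float.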

This is a tricky one. According to your exact wording, you want to reset at every point at which either 'id' or 'subid' changes. That means even if they change back and forth.

Also, computing the diff inside a groupby gives the same values for rows that stay within a block, so I'll calculate it once up front and mask the rows where the keys change.

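import numpy as np

i = df.id.values
s = df.subid.values
# True at each row whose id (or subid) differs from the row above; the first row is never flagged
i_chg = np.append(False, i[:-1] != i[1:])
s_chg = np.append(False, s[:-1] != s[1:])

# plain diff, blanked to NaN wherever either key changed
df.assign(difference=df.a.diff().mask(i_chg | s_chg))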

    id  subid   a  difference
1    1      1   2         NaN
2    1      1  10         8.0
3    1      1  20        10.0
4    1      2  30         NaN
5    1      2  35         5.0
6    1      2  36         1.0
7    1      2  40         4.0
8    2      1  20         NaN
9    2      1  29         9.0
10   2      1  30         1.0
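
On the sample data the masking approach and a groupby give the same column, because each (id, subid) pair sits in one contiguous block. The two only diverge when a pair comes back after being interrupted. The sketch below uses a small hypothetical frame (df2 and its values are invented for illustration, not taken from the question) to show that case.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the key (1, 1) appears, is interrupted by (1, 2), then returns.
df2 = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1],
    "subid": [1, 1, 2, 1, 1],
    "a":     [2, 10, 30, 50, 55],
})

# Group-based diff: rows 0, 1, 3, 4 share one group, so row 3 is
# compared with row 1 (50 - 10 = 40) across the interruption.
by_group = df2.groupby(["id", "subid"])["a"].diff()

# Change-mask diff: blank out every row whose key differs from the row
# directly above it, so row 3 restarts with NaN.
i = df2["id"].to_numpy()
s = df2["subid"].to_numpy()
changed = np.append(False, (i[:-1] != i[1:]) | (s[:-1] != s[1:]))
by_change = df2["a"].diff().mask(changed)

print(pd.DataFrame({"by_group": by_group, "by_change": by_change}))
```

Which behaviour is wanted depends on whether a returning key should continue its earlier run or start a new one.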

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': {1: 2, 2: 10, 3: 20, 4: 30, 5: 35, 6: 36, 7: 40, 8: 20, 9: 29, 10: 30},
 'id': {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2, 10: 2},
 'subid': {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 1, 9: 1, 10: 1}})

Solution

# For each row, check whether the (id, subid) pair matches the previous row and compute the diff accordingly.
# Uses .loc for the label lookup because .ix was removed in pandas 1.0.
df['difference'] = df.apply(lambda x: x.a - df.loc[x.name-1].a
  if (x.name>1 and x[['id','subid']].equals(df.loc[x.name-1][['id','subid']]))
  else np.nan, axis=1)

df
Out[368]: 
     a  id  subid  difference
1    2   1      1         NaN
2   10   1      1         8.0
3   20   1      1        10.0
4   30   1      2         NaN
5   35   1      2         5.0
6   36   1      2         1.0
7   40   1      2         4.0
8   20   2      1         NaN
9   29   2      1         9.0
10  30   2      1         1.0
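
The row-wise apply above looks up the previous row by label (x.name-1), so it relies on the index being consecutive integers starting at 1. A possible vectorized variant of the same comparison, sketched here with shift() rather than taken from the original answer, works by position and should not depend on the index labels. The list-based frame construction is my own restatement of the Setup data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a":     [2, 10, 20, 30, 35, 36, 40, 20, 29, 30],
        "id":    [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
        "subid": [1, 1, 1, 2, 2, 2, 2, 1, 1, 1],
    },
    index=range(1, 11),
)

keys = df[["id", "subid"]]
# True where the (id, subid) pair equals the pair on the previous row.
# The first row is compared with NaN and therefore comes out False.
same_as_prev = keys.eq(keys.shift()).all(axis=1)

# Keep the plain diff only where the pair is unchanged; NaN elsewhere.
df["difference"] = df["a"].diff().where(same_as_prev)

print(df)
```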


 