
Change values from a slice in pandas dataframe depending on the sorting of a column

I am having some trouble modifying values in a pandas DataFrame, depending on some specific sorting.

My DataFrame looks like this:

import numpy as np
import pandas as pd

df_test = pd.DataFrame({'month': [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3],
                        'day': [1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2],
                        'period': [np.random.choice(['a','b']) for i in range(18)],
                        'to_mark': ['n']*18,
                        'value': np.random.randn(18)})

It has some months, days within those months, and a period column that is a categorical variable. It also has a to_mark column and a value column. I want to slice by the month, day and period columns, sort each slice by the value column, and change the value of the 'to_mark' column from 'n' to 'y'.

What I tried was:

for m in df_test.month.unique():
    for d in df_test[df_test.month==m].day.unique():
        for p in df_test[(df_test.month==m) & (df_test.day==d)].period.unique():
            df_test[(df_test.month==m) & (df_test.day==d) & (df_test.period == p)].sort_values(
                by='value', ascending=False)['to_mark'] = 'y'

But it doesn't work properly: the values of the 'to_mark' column are not changed.

One output example:

Index month day period to_mark value
0       1    1      a       n  0.840179
1       1    1      a       n -1.349777
2       1    1      b       n  0.122197
3       1    2      a       n  0.276325
4       1    2      a       n  0.257014
5       1    2      b       n  0.351326
6       2    1      b       n -0.552867
7       2    1      a       n -0.614468
8       2    1      a       n -0.474198
9       2    2      b       n -0.439990
10      2    2      b       n  0.046202
11      2    2      b       n  1.601673
12      3    1      a       n -1.609012
13      3    1      a       n  0.382347
14      3    1      b       n  0.164228
15      3    2      a       n  0.176435
16      3    2      a       n -0.627590
17      3    2      a       n -1.834927

The desired output would be:

Index month day period to_mark value
0       1    1      a       y  0.840179
1       1    1      a       n -1.349777
2       1    1      b       y  0.122197
3       1    2      a       y  0.276325
4       1    2      a       n  0.257014
5       1    2      b       y  0.351326
6       2    1      b       n -0.552867
7       2    1      a       n -0.614468
8       2    1      a       y -0.474198
9       2    2      b       n -0.439990
10      2    2      b       n  0.046202
11      2    2      b       y  1.601673
12      3    1      a       n -1.609012
13      3    1      a       y  0.382347
14      3    1      b       y  0.164228
15      3    2      a       y  0.176435
16      3    2      a       n -0.627590
17      3    2      a       n -1.834927

Thank you in advance.

IIUC, you just want to change the maximum for each group to 'y'?
Use idxmax with loc to do this:

import pandas as pd

df_test = pd.DataFrame({
    'Index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
    'month': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
    'day': [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    'period': ['a', 'a', 'b', 'a', 'a', 'b', 'b', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'b', 'a', 'a', 'a'],
    'to_mark': ['n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n'],
    'value': [0.8401790000000001, -1.349777, 0.122197, 0.276325, 0.25701399999999996, 0.351326,
              -0.552867, -0.614468, -0.47419799999999995, -0.43998999999999994, 0.046202, 1.601673,
              -1.609012, 0.382347, 0.16422799999999999, 0.17643499999999998, -0.62759, -1.834927]})

df_test.loc[df_test.groupby(['month', 'day', 'period'])['value'].idxmax(), 'to_mark'] = 'y'

[out]

    Index  month  day period to_mark     value
0       0      1    1      a       y  0.840179
1       1      1    1      a       n -1.349777
2       2      1    1      b       y  0.122197
3       3      1    2      a       y  0.276325
4       4      1    2      a       n  0.257014
5       5      1    2      b       y  0.351326
6       6      2    1      b       y -0.552867
7       7      2    1      a       n -0.614468
8       8      2    1      a       y -0.474198
9       9      2    2      b       n -0.439990
10     10      2    2      b       n  0.046202
11     11      2    2      b       y  1.601673
12     12      3    1      a       n -1.609012
13     13      3    1      a       y  0.382347
14     14      3    1      b       y  0.164228
15     15      3    2      a       y  0.176435
16     16      3    2      a       n -0.627590
17     17      3    2      a       n -1.834927
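
As for why the loop in the question has no effect: as far as I can tell, df_test[mask] and .sort_values() each return a new DataFrame, so the assignment lands on a temporary object and never reaches df_test. Writing through .loc on the original frame avoids that. Below is a minimal sketch of a loop-style version, assuming the goal is to mark the largest value in each (month, day, period) group; the small frame and the names keys and top_labels are made up for illustration.

```python
import pandas as pd

# Small hypothetical frame with the same columns as in the question.
df = pd.DataFrame({
    'month':   [1, 1, 1, 2, 2],
    'day':     [1, 1, 1, 1, 1],
    'period':  ['a', 'a', 'b', 'a', 'a'],
    'to_mark': ['n'] * 5,
    'value':   [0.8, -1.3, 0.1, -0.6, -0.4],
})

keys = ['month', 'day', 'period']

# Collect the index label of the largest 'value' in each group.
top_labels = [group['value'].idxmax() for _, group in df.groupby(keys)]

# Assign through .loc on the original frame, so the change is written to df
# itself and not to a temporary copy.
df.loc[top_labels, 'to_mark'] = 'y'

print(df['to_mark'].tolist())  # ['y', 'n', 'y', 'n', 'y']
```

This does the same thing as the one-liner above, just spelled out group by group.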

Update

To update the n largest values per group, you could use groupby with nlargest, then get the indices to update with merge:

df_test = pd.DataFrame({
    'month': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
    'day': [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    'period': ['a', 'a', 'b', 'a', 'a', 'b', 'b', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'b', 'a', 'a', 'a'],
    'to_mark': ['n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n'],
    'value': [0.8401790000000001, -1.349777, 0.122197, 0.276325, 0.25701399999999996, 0.351326,
              -0.552867, -0.614468, -0.47419799999999995, -0.43998999999999994, 0.046202, 1.601673,
              -1.609012, 0.382347, 0.16422799999999999, 0.17643499999999998, -0.62759, -1.834927]})

n_largest = (df_test.groupby(['month', 'day', 'period'])['value'].nlargest(2).reset_index())

idx = (df_test.reset_index()
       .merge(n_largest, on=['month', 'day', 'period', 'value'],
              how='inner')['index'])

df_test.loc[idx, 'to_mark'] = 'y'

[out]

    month  day period to_mark     value
0       1    1      a       y  0.840179
1       1    1      a       y -1.349777
2       1    1      b       y  0.122197
3       1    2      a       y  0.276325
4       1    2      a       y  0.257014
5       1    2      b       y  0.351326
6       2    1      b       y -0.552867
7       2    1      a       y -0.614468
8       2    1      a       y -0.474198
9       2    2      b       n -0.439990
10      2    2      b       y  0.046202
11      2    2      b       y  1.601673
12      3    1      a       y -1.609012
13      3    1      a       y  0.382347
14      3    1      b       y  0.164228
15      3    2      a       y  0.176435
16      3    2      a       y -0.627590
17      3    2      a       n -1.834927
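
A possible alternative that skips the merge step, swapping nlargest for a per-group rank: rank the values within each group in descending order and mark every row whose rank is at most n. This is only a sketch on a made-up frame; the names n and rank_in_group are assumptions for illustration. Using method='first' breaks ties by order of appearance, so at most n rows per group are marked.

```python
import pandas as pd

# Small hypothetical frame with the same columns as in the question.
df = pd.DataFrame({
    'month':   [1, 1, 1, 1, 2, 2],
    'day':     [1, 1, 1, 1, 1, 1],
    'period':  ['a', 'a', 'a', 'b', 'a', 'a'],
    'to_mark': ['n'] * 6,
    'value':   [0.8, -1.3, 0.3, 0.1, -0.6, -0.4],
})

n = 2  # how many of the largest values to mark in each group

# Rank 'value' within each (month, day, period) group; rank 1 is the largest.
# The result is aligned to df's index, so it can be used directly as a mask.
rank_in_group = (df.groupby(['month', 'day', 'period'])['value']
                   .rank(method='first', ascending=False))

df.loc[rank_in_group <= n, 'to_mark'] = 'y'

print(df['to_mark'].tolist())  # ['y', 'n', 'y', 'y', 'y', 'y']
```

Because the mask is built from the original index, this should also behave sensibly when the same value appears more than once in a group, which is a case where merging on 'value' can match extra rows.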
