

pandas: how to keep or change a column's value based on a condition from the last element

I have a dataset that looks exactly like the one below. What I need to do is transform the column 'status_final' based on what is in the column 'status'. If 'status' isn't equal to Realized, 'status_final' must keep the value from the previous row; only a Realized row can change the value of 'status_final'. One problem is that if the value is 1 after a Realized 0, I can't change it to 0, as the logic would suggest. The other problem is that I'm trying this with a loop, and since I have more than 10k rows, it takes too much time.

     status  status_final
0   Nothing             1
1   Nothing             0
2  Realized             0
3     Doing             0
4  Realized             1
5     Doing             0
6   Nothing             0
7  Realized             0
8   Nothing             1

And I need to transform it into:

     status  status_final
0   Nothing             1
1   Nothing             1
2  Realized             0
3     Doing             0
4  Realized             1
5     Doing             1
6   Nothing             1
7  Realized             0
8   Nothing             1
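To pin the rule down, here is a slow, row-by-row sketch under one possible reading that reproduces the expected output above, including row 8 staying 1: a Realized row keeps its own value, any other row takes the previous row's result, and an original 1 is never lowered to 0. The function name `reference_status_final` is hypothetical, and this is meant only as a checker for faster versions, not as a solution for 10k+ rows.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "status": ["Nothing", "Nothing", "Realized", "Doing", "Realized",
               "Doing", "Nothing", "Realized", "Nothing"],
    "status_final": [1, 0, 0, 0, 1, 0, 0, 0, 1],
})

def reference_status_final(status, values):
    """Hypothetical row-by-row statement of the rule (assumed reading):
    a 'Realized' row (or the first row) keeps its own value; any other row
    takes the previous row's result, but an original 1 is never lowered to 0.
    """
    out = []
    for i, (st, val) in enumerate(zip(status, values)):
        if i == 0 or st == "Realized":
            out.append(val)
        else:
            out.append(max(val, out[-1]))
    return out

df["expected"] = reference_status_final(df["status"], df["status_final"])
print(df["expected"].tolist())  # [1, 1, 0, 0, 1, 1, 1, 0, 1]
```

On this sample the reading agrees with the `np.maximum` answer below, but the two can differ when a non-Realized 1 is followed by a non-Realized 0: this loop carries the 1 forward, while comparing against the last Realized value does not.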

Let's try:

import numpy as np

# mask `status_final` where `status` is not Realized
s = df['status_final'].where(df.status.eq('Realized'))

# seed the first row so the forward fill has a starting value
s.iloc[0] = df['status_final'].iloc[0]

# forward-fill, then take the element-wise maximum with the original values
df['status_final'] = np.maximum(df['status_final'], s.ffill())

Output:

     status  status_final
0   Nothing           1.0
1   Nothing           1.0
2  Realized           0.0
3     Doing           0.0
4  Realized           1.0
5     Doing           1.0
6   Nothing           1.0
7  Realized           0.0
8   Nothing           1.0
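To see why the `np.maximum` step matters, here is a sketch of the same steps on the full nine-row sample that keeps the forward-filled series separate. The names `carried` and `result` are illustrative only. The forward fill alone gives 0 in row 8; taking the element-wise maximum with the original column restores the 1 that the expected output keeps.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "status": ["Nothing", "Nothing", "Realized", "Doing", "Realized",
               "Doing", "Nothing", "Realized", "Nothing"],
    "status_final": [1, 0, 0, 0, 1, 0, 0, 0, 1],
})

# Keep status_final only on Realized rows; every other row becomes NaN.
s = df["status_final"].where(df["status"].eq("Realized"))
# Seed the first row so the forward fill has a starting value.
s.iloc[0] = df["status_final"].iloc[0]
# Last Realized value (or the first row's value) carried down each row.
carried = s.ffill()

# Element-wise maximum: an original 1 is never lowered to 0.
result = np.maximum(df["status_final"], carried)

print(pd.DataFrame({"carried": carried, "result": result}))
```

Row 8 is the only row where `carried` (0.0) and `result` (1.0) differ on this sample.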

This operates on the assumption that, after the first Realized row, every value equals the value of the preceding Realized row. Everything before that can be grouped by status, with the first status_final of each group passed forward.

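import numpy as np

r_row = df[df['status'] == 'Realized'].index.min()
df['status_final'] = df['status_final'].astype(float)  # float, so NaN can be stored
df.loc[(df.index >= r_row) & (df['status'] != 'Realized'), 'status_final'] = np.nan
df.loc[df.index < r_row, 'status_final'] = df.loc[df.index < r_row].groupby('status')['status_final'].transform('first')
df['status_final'] = df['status_final'].ffill()  # ffill returns a copy; assign it back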

Output:

    status  status_final
0   Nothing          1.0
1   Nothing          1.0
2   Realized         0.0
3   Doing            0.0
4   Realized         1.0
5   Doing            1.0
6   Nothing          1.0
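The output above covers only the first seven rows. A sketch that applies the same steps to all nine rows of the question's sample (with the column cast to float up front so NaN can be stored) suggests that row 8 ends up as 0 rather than the 1 the question expects, because this approach does not compare against the original value the way the `np.maximum` answer does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "status": ["Nothing", "Nothing", "Realized", "Doing", "Realized",
               "Doing", "Nothing", "Realized", "Nothing"],
    "status_final": [1, 0, 0, 0, 1, 0, 0, 0, 1],
})

r_row = df[df["status"] == "Realized"].index.min()  # first Realized row: 2
df["status_final"] = df["status_final"].astype(float)  # float, so NaN fits

# From the first Realized row on, blank out every non-Realized value.
df.loc[(df.index >= r_row) & (df["status"] != "Realized"), "status_final"] = np.nan

# Before it, give each row the first value of its status group.
before = df.index < r_row
df.loc[before, "status_final"] = (
    df.loc[before].groupby("status")["status_final"].transform("first")
)

# Forward-fill the blanks and keep the result.
df["status_final"] = df["status_final"].ffill()
print(df)  # row 8 comes out as 0.0, while the question expects 1
```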

If time is an issue, it may be faster to work directly with a NumPy array instead of the DataFrame:

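import pandas as pd

df = pd.DataFrame([['Nothing',  1],
                   ['Nothing',  0],
                   ['Realized', 0],
                   ['Doing',    0],
                   ['Realized', 1],
                   ['Doing',    0],
                   ['Nothing',  0]],
                  columns=['status', 'status_final'])

arr = df.to_numpy()  # object array with columns [status, status_final]
for i in range(1, arr.shape[0]):
    # only a Realized row keeps its own value; otherwise copy the previous row's value
    if arr[i, 0] != 'Realized':
        arr[i, 1] = arr[i - 1, 1]

# rebuild the DataFrame and restore an integer dtype for status_final
df = pd.DataFrame(data=arr, columns=['status', 'status_final']).astype({'status_final': int})
print(df)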
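The loop above can also be written without any Python-level iteration, using the common NumPy forward-fill idiom: record the position of every row that keeps its own value, take a running maximum of those positions with `np.maximum.accumulate`, and index the values with the result. This is a sketch under the same rule as the loop (the first row and Realized rows keep their value); the function name `carry_forward_numpy` is hypothetical, and no timing figures are claimed here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "status": ["Nothing", "Nothing", "Realized", "Doing", "Realized", "Doing", "Nothing"],
    "status_final": [1, 0, 0, 0, 1, 0, 0],
})

def carry_forward_numpy(status, values):
    """Vectorized version of the loop (status and values are NumPy arrays):
    keep the value on 'Realized' rows and on the first row, otherwise
    repeat the most recent kept value."""
    keep = status == "Realized"
    keep[0] = True  # the loop never overwrites row 0
    # Each kept row contributes its own position, every other row 0; the
    # running maximum is then the position of the most recent kept row.
    idx = np.where(keep, np.arange(len(values)), 0)
    idx = np.maximum.accumulate(idx)
    return values[idx]

status = df["status"].to_numpy()
values = df["status_final"].to_numpy()
df["status_final"] = carry_forward_numpy(status, values)
print(df)
```

To also apply the question's rule that a 1 is never lowered to 0, the result can be combined with the original values via `np.maximum(values, carry_forward_numpy(status, values))`, which is the same idea as the first answer.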
