Pandas: how to keep or change a column's value based on a condition and the previous row

Question

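I have a dataset exactly like the one below, and I need to transform the status_final column based on what is in the status column. If status is not Realized, status_final must keep the value from the previous row. Only a Realized row may change the value of status_final; otherwise I need to keep the previous value. One complication: if a row's value is already 1 and it comes after a Realized row with 0, I must not change it to 0, even though the logic would suggest that. The other problem is that I am currently doing this in a loop, and with more than 10k rows it takes too long.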

     status       status_final
0    Nothing           1
1    Nothing           0
2    Realized          0
3     Doing            0
4    Realized          1
5    Doing             0
6    Nothing           0
7    Realized          0
8    Nothing           1

I need to transform it into:

     status       status_final
0    Nothing           1
1    Nothing           1
2    Realized          0
3     Doing            0
4    Realized          1
5    Doing             1
6    Nothing           1
7    Realized          0
8    Nothing           1

Answer 1

Let's try:

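import numpy as np

# mask `status_final` where `status` is not Realized (those rows become NaN)
s = df['status_final'].where(df.status.eq('Realized'))

# override the first `nan` row so the forward fill has a starting value
s.iloc[0] = df['status_final'].iloc[0]

# then ffill, and take the row-wise maximum so an existing 1 is never lowered to 0
df['status_final'] = np.maximum(df['status_final'], s.ffill())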

Output:

     status  status_final
0   Nothing           1.0
1   Nothing           1.0
2  Realized           0.0
3     Doing           0.0
4  Realized           1.0
5     Doing           1.0
6   Nothing           1.0
7  Realized           0.0
8   Nothing           1.0
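
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the question's nine-row sample. The function name `carry_realized` is made up for illustration, and the final cast back to integers is an addition that assumes the column contains no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "status": ["Nothing", "Nothing", "Realized", "Doing", "Realized",
               "Doing", "Nothing", "Realized", "Nothing"],
    "status_final": [1, 0, 0, 0, 1, 0, 0, 0, 1],
})


def carry_realized(frame: pd.DataFrame) -> pd.Series:
    """Forward-fill status_final from 'Realized' rows without lowering any value."""
    # Keep status_final only on 'Realized' rows; all other rows become NaN.
    realized_only = frame["status_final"].where(frame["status"].eq("Realized"))
    # Seed the first row so the forward fill has something to start from.
    realized_only.iloc[0] = frame["status_final"].iloc[0]
    # Carry the most recent kept value forward.
    carried = realized_only.ffill()
    # Row-wise maximum: a row that is already 1 stays 1.
    # Assumption: no missing values, so the cast back to int is safe.
    return np.maximum(frame["status_final"], carried).astype(int)


result = carry_realized(df)
print(df.assign(status_final=result))
```

On this sample the values come back as integers and match the table requested in the question.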

Answer 2

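This is based on the assumption that, after the first Realized row, every value should equal the value of the most recent Realized row. Everything before that row can be grouped by status, with the first status_final of each group carried forward.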

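# index of the first Realized row
r_row = df[df['status']=='Realized'].index.min()
# from that row on, blank out every non-Realized row so it can be forward-filled
df.loc[(df.index >= r_row) & (df['status']!='Realized'), 'status_final'] = np.nan
# before that row, use the first status_final of each status group
df.loc[df.index < r_row, 'status_final'] = df.loc[df.index < r_row].groupby('status')['status_final'].transform('first')
# ffill returns a new frame, so assign it back
df = df.ffill()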

Output:

    status  status_final
0   Nothing          1.0
1   Nothing          1.0
2   Realized         0.0
3   Doing            0.0
4   Realized         1.0
5   Doing            1.0
6   Nothing          1.0
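
The output above covers seven rows. As a hedged check, the sketch below applies the same steps to the nine-row sample from the question. It works on a copy named `out` and casts the column to float first so that NaN can be stored without relying on implicit upcasting; both choices are assumptions of this sketch, not part of the answer. It also assumes at least one Realized row exists.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "status": ["Nothing", "Nothing", "Realized", "Doing", "Realized",
               "Doing", "Nothing", "Realized", "Nothing"],
    "status_final": [1, 0, 0, 0, 1, 0, 0, 0, 1],
})

out = df.copy()
# Cast up front so NaN placeholders fit in the column.
out["status_final"] = out["status_final"].astype(float)

# Index label of the first 'Realized' row.
first_realized = out.index[out["status"].eq("Realized")].min()
before = out.index < first_realized
after_not_realized = out["status"].ne("Realized") & (out.index >= first_realized)

# From the first 'Realized' row on, blank out the other rows for forward filling.
out.loc[after_not_realized, "status_final"] = np.nan
# Before it, use the first status_final seen for each status.
out.loc[before, "status_final"] = (
    out.loc[before].groupby("status")["status_final"].transform("first")
)
out["status_final"] = out["status_final"].ffill().astype(int)

# One possible adjustment, borrowing np.maximum from Answer 1:
# never lower a value that was already 1 in the original column.
adjusted = np.maximum(df["status_final"], out["status_final"])

print(out["status_final"].tolist())  # [1, 1, 0, 0, 1, 1, 1, 0, 0]
print(adjusted.tolist())             # [1, 1, 0, 0, 1, 1, 1, 0, 1]
```

Without the adjustment, the last row becomes 0, whereas the question's expected table keeps it at 1. Taking the row-wise maximum with the original column restores that row.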

Answer 3

If time is an issue, perhaps the fastest option is to use a numpy array:

import pandas as pd
import numpy as np

df = pd.DataFrame([['Nothing', 1],
                   ['Nothing', 0],
                   ['Realized', 0],
                   ['Doing', 0],
                   ['Realized', 1],
                   ['Doing', 0],
                   ['Nothing', 0]],
                  columns=['status', 'status_final'])

# copy the frame into a plain numpy array (object dtype: strings and ints mixed)
arr = np.array(df.values)
for i in range(1, arr.shape[0]):
    # every row that is not Realized takes the value of the row before it
    if arr[i][0] != 'Realized':
        arr[i][1] = arr[i - 1][1]

df = pd.DataFrame(data=arr, columns=['status', 'status_final'])
print(df)
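
The loop above still visits the rows one at a time in Python. As a rough sketch of a loop-free numpy variant, the code below forward-fills from Realized rows with `np.maximum.accumulate` over row positions. The helper name `ffill_realized` is hypothetical. Unlike the loop, it also takes the row-wise maximum so that an existing 1 is kept, following the expected table in the question. No timings are claimed here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "status": ["Nothing", "Nothing", "Realized", "Doing", "Realized",
               "Doing", "Nothing", "Realized", "Nothing"],
    "status_final": [1, 0, 0, 0, 1, 0, 0, 0, 1],
})


def ffill_realized(is_realized: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Carry the latest 'Realized' value forward without lowering any value."""
    # Position of each 'Realized' row, 0 elsewhere, so row 0 seeds the fill.
    positions = np.where(is_realized, np.arange(len(values)), 0)
    # Running maximum gives, per row, the position of the most recent 'Realized' row.
    last_realized = np.maximum.accumulate(positions)
    carried = values[last_realized]
    # Keep a row's own value when it is larger than the carried one.
    return np.maximum(values, carried)


result = ffill_realized(
    df["status"].eq("Realized").to_numpy(),
    df["status_final"].to_numpy(),
)
df["status_final"] = result
print(df)
```

The running maximum works because row positions only increase, so the largest position seen so far is always the most recent Realized row (or row 0 if none has occurred yet).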
