Fill DataFrame rows with a value based on a condition

Question

Suppose I have this DataFrame:

A | B | C
---------
n | b | c
n | b | c
n | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c
n | b | c
n | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c

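I want to replace values in column A with "x". The rows to fill are those before an "s" and after an "e", not the rows in between. So the result would look like this: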

A | B | C
---------
x | b | c
x | b | c
x | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c
x | b | c
x | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c

This is what I tried:

def applyFunc(s):
    if 's' in str(s):
        return 'x'
    return ''

df['A'] = df['A'].apply(applyFunc)

But this only replaces the rows that contain "s".

Answer 1

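First, label each row with the most recent marker ('e' or 's') at or before it; rows before the first marker are labelled 'e':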

A = df['A']  # shorter reference to df['A']
A.where(A.isin(['e', 's'])).ffill().fillna('e')

['e', 'e', 'e', 's', 's', 's', 's', 'e', 'e', 'e', 's', 's', 's', 's', 'e']

Then find the positions where an 'n' follows an 'e' (that is, where the label above is 'e') and replace them with 'x':

df['new_A'] = A.mask((A.where(A.isin(['e', 's'])).ffill().fillna('e').eq('e')&A.eq('n')), 'x')

Output:

    A  B  C new_A
0   n  b  c     x
1   n  b  c     x
2   n  b  c     x
3   s  b  c     s
4   n  b  c     n
5   n  b  c     n
6   n  b  c     n
7   e  b  c     e
8   n  b  c     x
9   n  b  c     x
10  s  b  c     s
11  n  b  c     n
12  n  b  c     n
13  n  b  c     n
14  e  b  c     e

Note: for clarity I saved the output in a new column, but the real code should be df['A'] = …
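For readers who want to run this answer end to end, here is a minimal self-contained sketch of the same two steps. The variable names `last_marker` and `result` are illustrative and not part of the original answer, and columns B and C are left out because they play no role in the logic.

```python
import pandas as pd

# Sample data from the question (only column A matters here)
df = pd.DataFrame({'A': list('nnnsnnnennsnnne')})

A = df['A']
# Step 1: label every row with the most recent marker. Rows before the
# first marker are treated as if they followed an 'e'.
last_marker = A.where(A.isin(['s', 'e'])).ffill().fillna('e')
# Step 2: replace only the 'n' rows whose most recent marker is 'e'.
result = A.mask(last_marker.eq('e') & A.eq('n'), 'x')

print(''.join(result))  # xxxsnnnexxsnnne
```

Splitting the expression into a named intermediate makes it easier to inspect the labels before applying the mask.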

Answer 2

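Assuming there are no repeated s or e values within a group, we can use Series.mask to replace the n values that do not lie between an s and an e. To track whether we are between an s and an e, compare the Series.cumsum of the s flags with that of the e flags; the two counts are equal exactly when we are outside an s…e block: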

df['A'] = df['A'].mask(
    df['A'].eq('s').cumsum().eq(df['A'].eq('e').cumsum()) & df['A'].eq('n'),
    'x'
)

df:

    A  B  C
0   x  b  c
1   x  b  c
2   x  b  c
3   s  b  c
4   n  b  c
5   n  b  c
6   n  b  c
7   e  b  c
8   x  b  c
9   x  b  c
10  s  b  c
11  n  b  c
12  n  b  c
13  n  b  c
14  e  b  c

The steps broken down into columns:

# Running count of the s markers seen so far
df['S cumsum'] = df['A'].eq('s').cumsum()
# Running count of the e markers seen so far
df['E cumsum'] = df['A'].eq('e').cumsum()
# Equal counts mean every s seen so far has been closed by an e
# (or no marker has been seen yet), so the row is outside an s..e block
df['S == E cumsum'] = df['S cumsum'].eq(df['E cumsum'])
# Keep only the rows outside a block where A is n
df['S == E cumsum AND A == n'] = df['S == E cumsum'] & df['A'].eq('n')

    A  B  C  S cumsum  E cumsum  S == E cumsum  S == E cumsum AND A == n
0   n  b  c         0         0           True                      True
1   n  b  c         0         0           True                      True
2   n  b  c         0         0           True                      True
3   s  b  c         1         0          False                     False
4   n  b  c         1         0          False                     False
5   n  b  c         1         0          False                     False
6   n  b  c         1         0          False                     False
7   e  b  c         1         1           True                     False
8   n  b  c         1         1           True                      True
9   n  b  c         1         1           True                      True
10  s  b  c         2         1          False                     False
11  n  b  c         2         1          False                     False
12  n  b  c         2         1          False                     False
13  n  b  c         2         1          False                     False
14  e  b  c         2         2           True                     False

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'A': ['n', 'n', 'n', 's', 'n', 'n', 'n', 'e', 'n', 'n', 's', 'n', 'n', 'n',
          'e'],
    'B': ['b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b',
          'b'],
    'C': ['c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c',
          'c']
})
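To see why the no-duplicates assumption matters, the following sketch applies the same cumsum comparison to a column with a repeated s and a repeated e. The names `outside` and `naive` are illustrative only.

```python
import pandas as pd

# Repeated s at positions 3-4 and repeated e at positions 13-14
A = pd.Series(list('nnnssnnennsnnee'))

# Same comparison as above: equal running counts of s and e
outside = A.eq('s').cumsum().eq(A.eq('e').cumsum())
naive = A.mask(outside & A.eq('n'), 'x')

print(''.join(naive))  # xxxssnnennsnnee
```

The extra s pushes the s count ahead of the e count, so the two n rows after the first e (positions 8 and 9) are never replaced.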

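If there are duplicates, we can select just the start and end markers (s and e), drop consecutive repeats, and keep only the even-numbered groups. With this df: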

df = pd.DataFrame({
    'A': ['n', 'n', 'n', 's', 's', 'n', 'n', 'e', 'n', 'n', 's', 'n', 'n', 'e',
          'e'],
    'B': ['b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b',
          'b'],
    'C': ['c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c',
          'c']
})

    A  B  C
0   n  b  c
1   n  b  c
2   n  b  c
3   s  b  c
4   s  b  c  # Duplicate S
5   n  b  c
6   n  b  c
7   e  b  c
8   n  b  c
9   n  b  c
10  s  b  c
11  n  b  c
12  n  b  c
13  e  b  c
14  e  b  c  # Duplicate E

Find the s and e rows and filter to keep only the even-numbered groups:

s = df.loc[df['A'].isin(['s', 'e']), 'A']
df['A'] = df['A'].mask(
    ((df.index.isin(s[s.ne(s.shift())].index).cumsum() % 2) == 0)
    & df['A'].eq('n'),
    'x'
)

df:

    A  B  C
0   x  b  c
1   x  b  c
2   x  b  c
3   s  b  c
4   s  b  c
5   n  b  c
6   n  b  c
7   e  b  c
8   x  b  c
9   x  b  c
10  s  b  c
11  n  b  c
12  n  b  c
13  e  b  c
14  e  b  c
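As with the first variant, the one-liner can be unpacked into helper columns. This is a sketch only; the column names 'is boundary', 'group', 'outside' and 'A filled' are made up for illustration. It assumes the data starts outside a block and that markers alternate s, e once repeats are dropped.

```python
import pandas as pd

df = pd.DataFrame({'A': list('nnnssnnennsnnee')})

# All marker rows, in order
markers = df.loc[df['A'].isin(['s', 'e']), 'A']
# Keep only the first marker of each consecutive run (drops repeated s / e)
boundaries = markers[markers.ne(markers.shift())]

# Flag boundary rows, then count how many boundaries have been passed
df['is boundary'] = df.index.isin(boundaries.index)
df['group'] = df['is boundary'].cumsum()
# An even count means we are before the first s or after a closing e
df['outside'] = df['group'] % 2 == 0
df['A filled'] = df['A'].mask(df['outside'] & df['A'].eq('n'), 'x')

print(df)
```

The 'group' column increases by one at each kept boundary, so odd values mark rows inside an s…e block and even values mark rows outside it.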

Answer 3

Solution 1:

# df1 is the question's DataFrame
df1.assign(col1=(df1.A=='s').cumsum())\
    .assign(col2=(df1.A=='e').cumsum().shift().fillna(0))\
    .assign(A=lambda dd:dd.A.mask(dd.col1==dd.col2,'x'))

    A  B  C  col1  col2
0   x  b  c     0   0.0
1   x  b  c     0   0.0
2   x  b  c     0   0.0
3   s  b  c     1   0.0
4   n  b  c     1   0.0
5   n  b  c     1   0.0
6   n  b  c     1   0.0
7   e  b  c     1   0.0
8   x  b  c     1   1.0
9   x  b  c     1   1.0
10  s  b  c     2   1.0
11  n  b  c     2   1.0
12  n  b  c     2   1.0
13  n  b  c     2   1.0
14  e  b  c     2   1.0
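The chained expression leaves its two helper columns in the result. A possible variant, sketched here on the assumption that df1 holds the question's data, drops them at the end of the chain; the name `out` is illustrative.

```python
import pandas as pd

# Assumed to match the question's DataFrame
df1 = pd.DataFrame({'A': list('nnnsnnnennsnnne'), 'B': 'b', 'C': 'c'})

out = (
    df1.assign(col1=df1['A'].eq('s').cumsum())
       # Shift the e count by one row so the e row itself still counts as inside
       .assign(col2=df1['A'].eq('e').cumsum().shift().fillna(0))
       .assign(A=lambda dd: dd['A'].mask(dd['col1'].eq(dd['col2']), 'x'))
       .drop(columns=['col1', 'col2'])
)

print(out)
```

Because assign returns a new DataFrame at each step, df1 itself is left unchanged.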

Solution 2:

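def function1(dd: pd.DataFrame) -> pd.DataFrame:
    dd = dd.copy()  # do not mutate the group handed over by groupby
    # Overwrite A with 'x' up to, but not including, the last 's' in the group
    dd.loc[:dd.index[dd['A'].eq('s')][-1] - 1, 'A'] = 'x'
    return dd

# Each group starts on the row after an 'e'; fillna(0) keeps row 0 in the
# first group (a NaN key would be dropped by groupby)
groups = (df1.A=='e').cumsum().shift().fillna(0)
df1.groupby(groups, group_keys=False).apply(function1)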

3 answers

Answer 1
3 2021-08-25 17:52:31

Answer 2
2 (accepted) 2021-08-25 18:31:00

Answer 3
0 2022-11-23 06:55:50
