How to select a range of rows in Pandas?

Question

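I created a DataFrame with many features. I want to create a new column that marks all rows between two specific rows (which will be given as input).

Suppose the DataFrame is as follows: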

import pandas as pd

data = {'currency': ['Euro', 'Euro', 'Euro', 'Dollar', 'Dollar', 'Yen',
                     'Yen', 'Yen', 'Pound', 'Pound', 'Pound', 'Pesos',
                     'Pesos'],
        'cost': [34, 67, 32, 29, 48, 123, 23, 45, 78, 86, 23, 45, 67]}
df = pd.DataFrame(data, columns=['currency', 'cost'])
df

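The df table:

   currency  cost
0      Euro    34
1      Euro    67
2      Euro    32
3    Dollar    29
4    Dollar    48
5       Yen   123
6       Yen    23
7       Yen    45
8     Pound    78
9     Pound    86
10    Pound    23
11    Pesos    45
12    Pesos    67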

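I want to add a new column that is assigned 1 when a condition is met. In my case, the condition is all rows between two specific currencies. For example, suppose I want all currencies between 'Dollar' and 'Pound'. My guess is that I have to create a mask and use it as the condition, i.e. select all rows between the first 'Dollar' row and the last 'Pound' row (rows 3-10).

I am having trouble creating that mask, because the following selects currencies alphabetically: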

mask = (df['currency'] >= 'Dollar') & (df['currency'] <= 'Pound')

The above creates a mask that is True for every currency except 'Yen'. I can see why it fails, but I cannot think of a way to do what I want.
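To make the failure concrete, here is a minimal sketch on a shortened, made-up Series (one row per currency, not the original data). It shows that `>=` and `<=` on strings compare alphabetically, so the result depends on spelling rather than on row position:

```python
import pandas as pd

# Illustrative Series: one row per currency, in the same order as the question.
currency = pd.Series(['Euro', 'Dollar', 'Yen', 'Pound', 'Pesos'])

# String comparison is lexicographic, so this tests alphabetical order,
# not where the rows sit in the frame.
mask = (currency >= 'Dollar') & (currency <= 'Pound')
alphabetical = dict(zip(currency, mask.tolist()))
print(alphabetical)
```

'Euro' and 'Pesos' pass the test because they sort between 'Dollar' and 'Pound', even though their rows lie outside the wanted range.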

Note: rows with the same currency name are always grouped together, e.g. 'Pound' cannot appear in rows 4-5 and again in rows 8-10.

Answer 1

A general solution that also works with duplicated indices:

a = df['currency'].eq('Dollar').cumsum()
b = df['currency'].eq('Pound').iloc[::-1].cumsum()
# clip_upper was removed in pandas 1.0; clip(upper=1) is the equivalent
df['new'] = a.mul(b).clip(upper=1)

An alternative that works for unique indices:

a = df['currency'].eq('Dollar').idxmax()
b = df['currency'].eq('Pound').iloc[::-1].idxmax()
df['new'] = 0
df.loc[a:b, 'new'] = 1

print (df)
   currency  cost  new
0      Euro    34    0
1      Euro    67    0
2      Euro    32    0
3    Dollar    29    1
4    Dollar    48    1
5       Yen   123    1
6       Yen    23    1
7       Yen    45    1
8     Pound    78    1
9     Pound    86    1
10    Pound    23    1
11    Pesos    45    0
12    Pesos    67    0

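Explanation:

Compare with Series.eq, which is the same as ==
Take the cumulative sum with cumsum
For the second condition, reverse the mask with [::-1] first
Multiply the two results with mul, then cap every nonzero value at 1 (clip_upper in pandas before 1.0, clip(upper=1) in later versions)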
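The steps above can be traced on a small example. This is a sketch on a made-up six-row Series, not the original data, and it uses `clip(upper=1)` on the assumption of a current pandas version:

```python
import pandas as pd

# Illustrative data: 'Pound' appears twice to show the reversed cumsum.
s = pd.Series(['Euro', 'Dollar', 'Yen', 'Pound', 'Pound', 'Pesos'])

a = s.eq('Dollar').cumsum()            # 0 before the first 'Dollar', positive from it onward
b = s.eq('Pound').iloc[::-1].cumsum()  # summed bottom-up: 0 only after the last 'Pound'
flag = a.mul(b).clip(upper=1)          # mul aligns on index labels, then cap at 1

# index=s.index puts every column back into the original row order.
steps = pd.DataFrame({'currency': s, 'a': a, 'b': b, 'flag': flag}, index=s.index)
print(steps)
```

The product is nonzero only where both running sums are positive, that is, at or after the first 'Dollar' and at or before the last 'Pound'.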

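The second solution uses idxmax to get the index label of the first matching row (on the reversed mask, the last matching row) and sets 1 with loc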
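The idxmax approach can be wrapped in a small helper. The function name `flag_between` and its parameters are hypothetical, introduced only for illustration, and the sketch assumes a unique, ordered index. The guard is there because idxmax on an all-False mask still returns the first label:

```python
import pandas as pd

def flag_between(df, column, start, end):
    """Return a 0/1 Series marking rows from the first `start` to the last `end`."""
    is_start = df[column].eq(start)
    is_end = df[column].eq(end)
    flag = pd.Series(0, index=df.index)
    # idxmax on an all-False mask points at the first row, so only
    # proceed when both values actually occur.
    if is_start.any() and is_end.any():
        first = is_start.idxmax()           # label of the first `start` row
        last = is_end.iloc[::-1].idxmax()   # label of the last `end` row
        flag.loc[first:last] = 1            # .loc slicing includes both endpoints
    return flag

# Illustrative data, not the original frame.
df = pd.DataFrame({'currency': ['Euro', 'Dollar', 'Yen', 'Pound', 'Pesos']})
result = flag_between(df, 'currency', 'Dollar', 'Pound')
print(result.tolist())
```

Without the guard, a misspelled currency would silently mark rows starting from the top of the frame.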

Answer 2

Use NumPy's logical_or with accumulate

import numpy as np

cumor = np.logical_or.accumulate

c = df.currency.values
d = c == 'Dollar'
p = c == 'Pound'

df.assign(new=(cumor(d) & cumor(p[::-1])[::-1]).astype(np.uint))

   currency  cost  new
0      Euro    34    0
1      Euro    67    0
2      Euro    32    0
3    Dollar    29    1
4    Dollar    48    1
5       Yen   123    1
6       Yen    23    1
7       Yen    45    1
8     Pound    78    1
9     Pound    86    1
10    Pound    23    1
11    Pesos    45    0
12    Pesos    67    0
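For readers unfamiliar with `accumulate`, the following sketch shows what each piece of the expression produces. The boolean arrays are made up for illustration, and the names `seen_start` and `end_ahead` are not part of the original answer:

```python
import numpy as np

d = np.array([False, True, False, False, False])   # where the start value occurs
p = np.array([False, False, False, True, False])   # where the end value occurs

# Running OR from the top: True from the first start onward.
seen_start = np.logical_or.accumulate(d)
# Running OR from the bottom, flipped back: True up to the last end.
end_ahead = np.logical_or.accumulate(p[::-1])[::-1]

between = (seen_start & end_ahead).astype(int)
print(between.tolist())
```

Because a running OR stays True once it has seen a True, the intersection of the forward and backward passes is exactly the span from the first start to the last end.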

Answer 1: 3, 2018-04-17 06:23:14
Answer 2: 2, 2018-04-17 06:34:40