

Remove rows in a pandas DataFrame after a certain value at its max index

I have a pandas DataFrame with a rate column that looks like this:

import numpy as np
import pandas as pd

# Consecutive runs of the rate column as (value, length) pairs
runs = [(12, 3), (11, 3), (7, 2), (10, 2), (7, 3), (9, 5), (3, 4), (7, 4)]
df = pd.DataFrame({'rate': np.concatenate([np.repeat(v, k) for v, k in runs])})
values = (7, 9)

There can be more runs of 7s or 9s. I would like to delete the 2 rows that follow the end point (max index) of each run of 7 or 9. The expected result looks like this:

# Expected runs after dropping 2 rows past each run end of 7 or 9
runs = [(12, 3), (11, 3), (7, 2), (7, 3), (9, 3), (3, 2), (7, 4)]
dd = pd.DataFrame({'rate': np.concatenate([np.repeat(v, k) for v, k in runs])})

Any suggestions on how I can do that? Thank you for your time and effort!
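The "end point (max index) of each run" described above can be made concrete with a short sketch. It is not part of the original question: it rebuilds the same data from (value, length) pairs, labels consecutive runs with a shift/cumsum run id, and takes the largest index in each run whose value is 7 or 9. The names runs, run_id, ends and target_ends are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data from (value, length) pairs
runs = [(12, 3), (11, 3), (7, 2), (10, 2), (7, 3), (9, 5), (3, 4), (7, 4)]
df = pd.DataFrame({"rate": np.concatenate([np.repeat(v, k) for v, k in runs])})
values = (7, 9)

# A new run starts wherever the value differs from the previous row
run_id = df["rate"].ne(df["rate"].shift()).cumsum()

# Max index of every run, and the value each run holds
ends = df.index.to_series().groupby(run_id).max()
run_values = df["rate"].groupby(run_id).first()

# Keep only the end points of runs of 7 or 9
target_ends = ends[run_values.isin(values)].to_list()
print(target_ends)  # [7, 12, 17, 25]
```

Note that a run of 7s directly followed by a run of 9s counts as two separate runs, so index 12 (the last 7 before the 9s) is an end point too.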

Here is one way to do it using the pandas shift method:

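# Setup: the last row of each run of 7 or 9 is where the next row differs
max_indices = df[(df["rate"] != df["rate"].shift(-1)) & (df["rate"].isin(values))].index
index = df.index.to_list()
new_index = []
start = 0

# Build new index: keep rows up to each run end, then skip the next 2 rows
for idx in max_indices:
    new_index = new_index + index[start: idx + 1]
    start = idx + 3

# Keep any rows that come after the last skipped pair
new_index = new_index + index[start:]

dd = df.loc[new_index, :].reset_index(drop=True)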
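A loop-free variant is sketched below, under the assumption that "delete the next n rows" should simply ignore positions past the end of the frame. It marks run ends with the same shift comparison, uses NumPy broadcasting to compute the positions of the n rows after each end, and drops them with a boolean mask. drop_after_run_ends is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd


def drop_after_run_ends(df, col, values, n=2):
    """Drop the n rows after the last row of each run whose value is in `values`."""
    s = df[col]
    # Last row of a run: the next row differs (the final row always qualifies)
    is_end = s.ne(s.shift(-1)) & s.isin(values)
    end_pos = np.flatnonzero(is_end.to_numpy())
    # Positions 1..n after every end, discarding those past the last row
    drop_pos = (end_pos[:, None] + np.arange(1, n + 1)).ravel()
    drop_pos = drop_pos[drop_pos < len(df)]
    keep = np.ones(len(df), dtype=bool)
    keep[drop_pos] = False
    return df[keep].reset_index(drop=True)


runs = [(12, 3), (11, 3), (7, 2), (10, 2), (7, 3), (9, 5), (3, 4), (7, 4)]
df = pd.DataFrame({"rate": np.concatenate([np.repeat(v, k) for v, k in runs])})
dd = drop_after_run_ends(df, "rate", (7, 9), n=2)
print(dd["rate"].to_list())
# [12, 12, 12, 11, 11, 11, 7, 7, 7, 7, 7, 9, 9, 9, 3, 3, 7, 7, 7, 7]
```

Because the mask works on positions rather than index labels, it does not depend on a RangeIndex, and rows after the last dropped pair (for example trailing 5s after a final run of 7s) are kept.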

Then:

print(dd)
# Output
    rate
0     12
1     12
2     12
3     11
4     11
5     11
6      7
7      7
8      7
9      7
10     7
11     9
12     9
13     9
14     3
15     3
16     7
17     7
18     7
19     7
