Remove rows in a pandas DataFrame after a certain value at max index
I have a pandas DataFrame with a rate column that looks like this:
import numpy as np
import pandas as pd
num = np.repeat(12, 3)
num1 = np.repeat(11, 3)
num2 = np.repeat(7, 2)
num3 = np.repeat(10, 2)
num4 = np.repeat(7, 3)
num5 = np.repeat(9, 5)
num6 = np.repeat(3, 4)
num7 = np.repeat(7, 4)
df = pd.DataFrame(columns= ['rate'])
df['rate'] = num
df = pd.concat([df, pd.DataFrame(num1, columns=['rate'])])
df = pd.concat([df, pd.DataFrame(num2, columns=['rate'])])
df = pd.concat([df, pd.DataFrame(num3, columns=['rate'])])
df = pd.concat([df, pd.DataFrame(num4, columns=['rate'])])
df = pd.concat([df, pd.DataFrame(num5, columns=['rate'])])
df = pd.concat([df, pd.DataFrame(num6, columns=['rate'])])
df = pd.concat([df, pd.DataFrame(num7, columns=['rate'])])
df = df.reset_index(drop = True)
values = (7,9)
There can be more 7s or 9s. I would like to delete the 2 rows that follow the end point (max index) of each run of 7s or 9s. The expected result looks like this:
num = np.repeat(12, 3)
num1 = np.repeat(11, 3)
num2 = np.repeat(7, 2)
num3 = np.repeat(7, 3)
num4 = np.repeat(9, 3)
num5 = np.repeat(3, 2)
num6 = np.repeat(7, 4)
dd = pd.DataFrame(columns= ['rate'])
dd['rate'] = num
dd = pd.concat([dd, pd.DataFrame(num1, columns=['rate'])])
dd = pd.concat([dd, pd.DataFrame(num2, columns=['rate'])])
dd = pd.concat([dd, pd.DataFrame(num3, columns=['rate'])])
dd = pd.concat([dd, pd.DataFrame(num4, columns=['rate'])])
dd = pd.concat([dd, pd.DataFrame(num5, columns=['rate'])])
dd = pd.concat([dd, pd.DataFrame(num6, columns=['rate'])])
dd = dd.reset_index(drop = True)
Any suggestions on how I can do that? Thank you for your time and effort!
Here is one way to do it using the pandas shift method:
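# Setup: find the last index of each run of 7s or 9s
max_indices = df[(df["rate"] != df["rate"].shift(-1)) & (df["rate"].isin([7, 9]))].index
index = df.index.to_list()
new_index = []
start = 0
# Build new index, skipping the 2 rows after each run end
for idx in max_indices:
    new_index = new_index + index[start: idx + 1]
    start = idx + 3
# Keep any rows left after the last run end (empty for the sample data)
new_index = new_index + index[start:]
dd = df.loc[new_index, :].reset_index(drop=True)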
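To see why the comparison with shift(-1) in the setup line picks out run ends, here is a minimal sketch on a toy Series. The data and the names `s`, `is_run_end`, `targets`, and `run_end_idx` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Toy series (illustrative only): two runs of 7 and one run of 9.
s = pd.Series([7, 7, 5, 9, 9, 9, 7])

# True where the next value differs, i.e. on the last row of each run.
# The final row is compared against NaN, so it always counts as a run end.
is_run_end = s != s.shift(-1)

# Keep only the run ends whose value is one of the targets.
targets = [7, 9]
run_end_idx = s.index[is_run_end & s.isin(targets)].to_list()
print(run_end_idx)  # [1, 5, 6]
```

Index 2 is also the end of a run (the single 5), but the isin filter excludes it because 5 is not a target value.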
Then:
print(dd)
# Output
rate
0 12
1 12
2 12
3 11
4 11
5 11
6 7
7 7
8 7
9 7
10 7
11 9
12 9
13 9
14 3
15 3
16 7
17 7
18 7
19 7
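As a possible alternative to the index-building loop, the same rows can be selected with a boolean mask. This is only a sketch under a few assumptions: the sample data is rebuilt compactly with a single np.repeat call, and the names `n_drop`, `run_end`, `drop`, and `dd_alt` are introduced here for illustration. It has been checked against the sample data only, not against every possible arrangement of runs.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample data in one step.
df = pd.DataFrame(
    {"rate": np.repeat([12, 11, 7, 10, 7, 9, 3, 7], [3, 3, 2, 2, 3, 5, 4, 4])}
)
values = (7, 9)
n_drop = 2  # number of rows to delete after each run end (assumed parameter)

rate = df["rate"]
# Last row of each run of a target value.
run_end = (rate != rate.shift(-1)) & rate.isin(values)

# Mark a row for deletion if one of the n_drop rows before it is a run end.
drop = pd.Series(False, index=df.index)
for k in range(1, n_drop + 1):
    drop |= run_end.shift(k, fill_value=False)

dd_alt = df[~drop].reset_index(drop=True)
print(dd_alt["rate"].tolist())
```

On the sample data this keeps the same 20 rows as the output shown above. Rows after the last run end are kept automatically, since the mask only marks rows for deletion.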