I have a Python pandas DataFrame like this:
  name        date     value
0  XYZ  01-01-2018  No Value
1  XYZ  02-01-2018  No Value
2  XYZ  03-01-2018         A
3  XYZ  04-01-2018         A
4  XYZ  05-01-2018         B
5  XYZ  06-01-2018         B
6  XYZ  07-01-2018         A
I want to get only the rows where the value column changes to A or B, i.e. the first row of each run of A or B, skipping the consecutive repeats.
For example, in this case A first occurs at index 2, then B occurs at index 4, and A occurs again at index 6. In short, I want the rows with index 2, 4 and 6.
Any help will be appreciated.
It seems you may need pd.Series.notnull + pd.Series.shift (this assumes "No Value" stands for missing values, i.e. NaN, rather than the literal string):
res = df[df['value'].notnull() & (df['value'] != df['value'].shift())]
print(res)
  name        date value
2  XYZ  03-01-2018     A
4  XYZ  05-01-2018     B
6  XYZ  07-01-2018     A
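If "No Value" is actually stored as a string, notnull() will not filter it out. A minimal sketch of one way to handle that, assuming the frame shown in the question (rebuilt here by hand) and assuming the placeholder is exactly the string "No Value":

```python
import pandas as pd

# Rebuild the frame from the question (assumption: "No Value" is a literal string)
df = pd.DataFrame({
    "name": ["XYZ"] * 7,
    "date": ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018",
             "05-01-2018", "06-01-2018", "07-01-2018"],
    "value": ["No Value", "No Value", "A", "A", "B", "B", "A"],
})

# Turn the placeholder into NaN so that notnull() can see it as missing
df["value"] = df["value"].mask(df["value"] == "No Value")

# Keep rows that are not missing and differ from the row directly above
res = df[df["value"].notnull() & (df["value"] != df["value"].shift())]
print(res)
```

With the sample data this should select the rows at index 2, 4 and 6.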
Use isin with shift:
df.loc[(df.value.isin(['A', 'B'])) & (df.value != df.value.shift())]
  name        date value
2  XYZ  03-01-2018     A
4  XYZ  05-01-2018     B
6  XYZ  07-01-2018     A
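To see why this works, it may help to look at the two boolean masks separately. The sketch below rebuilds the question's frame by hand; the names is_wanted, changed and steps are only illustrative and do not come from the original answer:

```python
import pandas as pd

# Frame from the question, with "No Value" kept as a plain string
df = pd.DataFrame({
    "name": ["XYZ"] * 7,
    "date": ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018",
             "05-01-2018", "06-01-2018", "07-01-2018"],
    "value": ["No Value", "No Value", "A", "A", "B", "B", "A"],
})

# True where the value is one of the values we care about
is_wanted = df["value"].isin(["A", "B"])

# True where the value differs from the row directly above
# (the first row is compared against NaN, so it counts as a change)
changed = df["value"] != df["value"].shift()

# Side-by-side view of the intermediate steps
steps = df.assign(previous=df["value"].shift(),
                  is_wanted=is_wanted, changed=changed)
print(steps)

# Only rows where both masks are True are kept
res = df.loc[is_wanted & changed]
print(res)
```

Index 0 counts as "changed" but is dropped by the isin mask, which is why no separate NaN handling is needed in this variant.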
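Probably not the best solution, but this should work if you only need the first occurrence of each distinct value. Note that groupby(...).first() returns one row per distinct value of b, so a value that reappears later (like the second run of A in the question) is not returned again: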
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [0, 2, 0, 1, 2]})
df.groupby("b").first()
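If you want to stay with groupby but still get every run (including the second run of A), one option is to group on a run label instead of on the value itself. This is only a sketch on the question's data; run_id, wanted and first_of_run are made-up names, and the shift/cumsum labelling is a common idiom rather than something from the answer above:

```python
import pandas as pd

# Frame from the question, with "No Value" kept as a plain string
df = pd.DataFrame({
    "name": ["XYZ"] * 7,
    "date": ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018",
             "05-01-2018", "06-01-2018", "07-01-2018"],
    "value": ["No Value", "No Value", "A", "A", "B", "B", "A"],
})

# Number each run of consecutive equal values: a new run starts
# wherever the value differs from the row above
run_id = (df["value"] != df["value"].shift()).cumsum()

# Drop the rows we are not interested in
wanted = df[df["value"].isin(["A", "B"])]

# Take the first row of every run; head(1) keeps the original index
first_of_run = wanted.groupby(run_id.loc[wanted.index]).head(1)
print(first_of_run)
```

Because the second A gets its own run label, it ends up in its own group and is returned along with the first A and the B.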