
How to get the 1st occurrence of a value in a column in a Python DataFrame

I have a Python pandas DataFrame:

     name     date         value
0    XYZ    01-01-2018    No Value
1    XYZ    02-01-2018    No Value
2    XYZ    03-01-2018     A
3    XYZ    04-01-2018     A
4    XYZ    05-01-2018     B
5    XYZ    06-01-2018     B
6    XYZ    07-01-2018     A

I want to get only the rows where the value column becomes A or B for the first time, skipping consecutive repeats of the same value.

E.g., in this case A first occurs at index 2, then B occurs at index 4, and then A occurs again at index 6. In short, I want the rows with index 2, 4 and 6.
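To pin down the rule, here is a plain-Python sketch over the value column as a list. It assumes "No Value" is a literal placeholder string, and the helper name run_starts is illustrative only. An index is kept when its value is A or B and differs from the value on the previous row:

```python
# Assumption: "No Value" is stored as a literal placeholder string
values = ["No Value", "No Value", "A", "A", "B", "B", "A"]


def run_starts(values, wanted=("A", "B")):
    """Return the indices where a wanted value starts a new consecutive run."""
    result = []
    previous = None  # nothing precedes the first row
    for i, v in enumerate(values):
        # Keep the row only if it is A/B and not a repeat of the row above
        if v in wanted and v != previous:
            result.append(i)
        previous = v
    return result


print(run_starts(values))  # [2, 4, 6]
```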

Any help would be appreciated.

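You can combine pd.Series.notnull with pd.Series.shift. This assumes the "No Value" cells are missing values (NaN); notnull() drops those rows, and comparing against shift() keeps only rows that differ from the row above: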

res = df[df['value'].notnull() & (df['value'] != df['value'].shift())]

print(res)

  name        date value
2  XYZ  03-01-2018     A
4  XYZ  05-01-2018     B
6  XYZ  07-01-2018     A
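A self-contained version of the notnull + shift answer, as a sketch assuming the "No Value" cells are stored as NaN (np.nan), which is what notnull() needs in order to drop them:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["XYZ"] * 7,
    "date": ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018",
             "05-01-2018", "06-01-2018", "07-01-2018"],
    # Assumption: "No Value" is represented as a missing value (NaN)
    "value": [np.nan, np.nan, "A", "A", "B", "B", "A"],
})

# True where the value differs from the row above, i.e. the start of a run
# (NaN != NaN is True, so the NaN rows also look like run starts here)
starts_run = df["value"] != df["value"].shift()

# notnull() removes those NaN rows, leaving only the A/B run starts
res = df[df["value"].notnull() & starts_run]
print(res)
```

If the cells hold the literal string "No Value" instead of NaN, notnull() does not filter them, so row 0 would also be returned; the isin variant avoids that.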

Alternatively, use isin with shift, which keeps only A or B values that differ from the previous row:

df.loc[(df.value.isin(['A', 'B'])) & (df.value != df.value.shift())]

  name        date value
2  XYZ  03-01-2018     A
4  XYZ  05-01-2018     B
6  XYZ  07-01-2018     A
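Because isin(['A', 'B']) explicitly selects the wanted values, this version works whether the placeholder is NaN or a string. A sketch assuming the literal string "No Value":

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["XYZ"] * 7,
    "date": [f"0{d}-01-2018" for d in range(1, 8)],
    # Assumption: the placeholder is the literal string "No Value"
    "value": ["No Value", "No Value", "A", "A", "B", "B", "A"],
})

# Keep rows whose value is A or B ...
is_wanted = df["value"].isin(["A", "B"])
# ... and that differ from the previous row (first row of each consecutive run)
starts_run = df["value"] != df["value"].shift()

res = df.loc[is_wanted & starts_run]
print(res)
```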

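Probably not the best solution, but groupby(...).first() returns the first row for each distinct value of a column. Note that it finds only the first occurrence overall, so on the question's data it would return A at index 2 and B at index 4 but miss the second run of A at index 6: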

import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [0, 2, 0, 1, 2]})
# For each distinct value in column "b", take the first row of the other columns
df.groupby("b").first()
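To keep a groupby-based approach but respect consecutive runs, one option is to group by a run identifier instead of by the value itself. This is a sketch; run_id is an illustrative name, and "No Value" is assumed to be a literal string:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["XYZ"] * 7,
    "date": [f"0{d}-01-2018" for d in range(1, 8)],
    "value": ["No Value", "No Value", "A", "A", "B", "B", "A"],
})

# Each change in "value" starts a new run; cumsum numbers the runs 1, 2, 3, ...
run_id = (df["value"] != df["value"].shift()).cumsum()

# head(1) keeps the first row of every run and preserves the original index
firsts = df.groupby(run_id).head(1)

# Drop the runs made of the placeholder value
res = firsts[firsts["value"].isin(["A", "B"])]
print(res)

# For comparison: grouping by the value itself yields one row per distinct value
print(df.groupby("value").first())
```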
