I have the following data frame:
time id type
2012-12-19 1 abcF1
2013-11-02 1 xF1yz
2012-12-19 1 abcF1
2012-12-18 1 abcF1
2013-11-02 1 xF1yz
2006-07-07 5 F5spo
2006-07-06 5 F5spo
2005-07-07 5 F5abc
For a given id, I need to find the max date.
For that max date I need to check the type.
I have to drop every row for the given id if the type differs from the type of the max date.
Example for target data frame:
time id type
<deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1>
2013-11-02 1 xF1yz
<deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1>
<deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1>
2013-11-02 1 xF1yz
2006-07-07 5 F5spo
2006-07-06 5 F5spo //kept because although the date is not max, it has the same type as the row with the max date for id 5
<deleted because for id 5 the date is not the max value and the type differs from the type of the max date for id 5>
How can I achieve this? I am new to pandas and trying to learn the proper way to use the library.
Use DataFrameGroupBy.idxmax to get the indices of the max values, select only the id and type columns, and then use DataFrame.merge:
df = df.merge(df.loc[df.groupby('id')['time'].idxmax(), ['id','type']])
print (df)
time id type
0 2013-11-02 1 xF1yz
1 2013-11-02 1 xF1yz
2 2006-07-07 5 F5spo
3 2006-07-06 5 F5spo
Or use DataFrame.sort_values with DataFrame.drop_duplicates:
df = df.merge(df.sort_values('time').drop_duplicates('id', keep='last')[["id", "type"]])
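To try both variants end to end, here is a minimal sketch that rebuilds the sample data from the question and checks that the two approaches agree. The names latest_by_idxmax, latest_by_sort, result_idxmax and result_sort are illustrative only, and the time column is assumed to be of datetime dtype.

```python
import pandas as pd

# Sample data from the question, with time parsed as datetime (assumption).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})

# Variant 1: idxmax returns the row label of the latest time per id.
latest_by_idxmax = df.loc[df.groupby("id")["time"].idxmax(), ["id", "type"]]

# Variant 2: sort by time and keep the last row seen for each id.
latest_by_sort = (
    df.sort_values("time")
      .drop_duplicates("id", keep="last")[["id", "type"]]
)

# With no "on" argument, merge joins on the shared columns (id and type),
# so only rows whose type matches the latest type of their id survive.
result_idxmax = df.merge(latest_by_idxmax)
result_sort = df.merge(latest_by_sort)

print(result_idxmax)
```

If the max date of an id occurs with more than one type, the two variants may pick different rows, so it is worth deciding how such ties should be handled.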
You can sort the dataframe by time, then group by id and choose the last row in each group. That is the row with the largest date.
last_rows = df.sort_values('time').groupby('id').last()
Then merge the original dataframe with the new one:
result = df.merge(last_rows, on=["id", "type"])
# time_x id type time_y
#0 2013-11-02 1 xF1yz 2013-11-02
#1 2013-11-02 1 xF1yz 2013-11-02
#2 2006-07-07 5 F5spo 2006-07-07
#3 2006-07-06 5 F5spo 2006-07-07
If needed, drop the last duplicate column:
result.drop('time_y', axis=1, inplace=True)
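As a variation on the steps above, the extra time column can be avoided by keeping only the type column before merging. This is a sketch under the assumption that time is already a datetime column; the sample frame is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, with time parsed as datetime (assumption).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})

# Latest type per id; reset_index turns id back into an ordinary column,
# so the right-hand frame has exactly the columns id and type.
last_rows = df.sort_values("time").groupby("id")["type"].last().reset_index()

# No time column on the right side, so no time_x / time_y suffixes appear.
result = df.merge(last_rows, on=["id", "type"])

print(result)
```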
Create a helper Series using set_index, groupby and transform with idxmax. Then use boolean indexing:
# If necessary, cast to datetime dtype
# df['time'] = pd.to_datetime(df['time'])
s = df.set_index('type').groupby('id')['time'].transform('idxmax')
df[df.type == s.values]
[out]
time id type
1 2013-11-02 1 xF1yz
4 2013-11-02 1 xF1yz
5 2006-07-07 5 F5spo
6 2006-07-06 5 F5spo
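A closely related sketch builds the helper Series with Series.map instead of transform('idxmax'). This is a swapped-in variant, not the code above; the names latest_type and mask are illustrative, and time is assumed to be a datetime column. Like the boolean-indexing version, it keeps the original index labels.

```python
import pandas as pd

# Sample data from the question, with time parsed as datetime (assumption).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})

# One entry per id: the type of the row that holds the latest time.
latest_type = df.loc[df.groupby("id")["time"].idxmax()].set_index("id")["type"]

# Look up each row's id in latest_type and compare with the row's own type.
mask = df["type"] == df["id"].map(latest_type)
result = df[mask]

print(result)
```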
import pandas as pd
# sample data from the question
df = pd.DataFrame({
    'time': ['2012-12-19', '2013-11-02', '2012-12-19', '2012-12-18', '2013-11-02', '2006-07-07', '2006-07-06', '2005-07-07'],
    'id': [1, 1, 1, 1, 1, 5, 5, 5],
    'type': ['abcF1', 'xF1yz', 'abcF1', 'abcF1', 'xF1yz', 'F5spo', 'F5spo', 'F5abc']
})
df['time'] = pd.to_datetime(df['time'])
def remove_non_max_date_ids(df):
    # type of the row with the max date in this group
    max_type = df.loc[df['time'].idxmax(), 'type']
    # keep only the rows that share that type
    return df[df['type'] == max_type]
df.groupby('id').apply(remove_non_max_date_ids)
Create a helper function that keeps only the rows whose type matches the type of the row with the max date, then apply it to each group of df, grouped by id.
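The same per-group idea can also be sketched without groupby.apply, by iterating over the groups and concatenating the filtered pieces. This is an alternative formulation, not the code above; the helper name keep_latest_type is hypothetical, and time is assumed to be a datetime column.

```python
import pandas as pd

# Sample data from the question, with time parsed as datetime (assumption).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})


def keep_latest_type(group):
    """Keep the rows of one id whose type equals the type at the max date."""
    latest_type = group.loc[group["time"].idxmax(), "type"]
    return group[group["type"] == latest_type]


# Iterating over a groupby yields (key, sub-frame) pairs, one per id.
result = pd.concat([keep_latest_type(group) for _, group in df.groupby("id")])

print(result)
```

Iterating explicitly keeps the original index labels and all columns, at the cost of being slower than the vectorised merge-based answers on large frames.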
Another way is to use duplicated.
import pandas as pd
# if needed, cast to datetime dtype
df['time'] = pd.to_datetime(df['time'])
# sort by id and time in ascending order, then flag every row except the last one (the max time) of each id
df = df.sort_values(by=['id','time'], ascending=[True,True])
df['time_max'] = df.duplicated(subset=['id'], keep='last')
# keep only the row with the max time for each id
df2 = df.loc[~df['time_max'],['id','type']].rename(columns={'type':'type_max'}).copy()
# merge with the original df
df = pd.merge(df, df2, on=['id'], how='left')
# keep the rows whose type equals the type at the max time (for_drop is True for the rows that are kept)
df['for_drop'] = df['type']==df['type_max']
df = df.loc[df['for_drop'],:]
[out]:
df
time id type time_max type_max for_drop
3 2013-11-02 1 xF1yz True xF1yz True
4 2013-11-02 1 xF1yz False xF1yz True
6 2006-07-06 5 F5spo True F5spo True
7 2006-07-07 5 F5spo False F5spo True