Alright guys, I'm stumped. To be completely honest, I'm very new to manipulating dataframes using pandas.
Suppose I have the dataframe below, sorted in descending order so the most recent entry is at the top (I've already handled that part in my program using the data I have available).
We'll call it 'df_people' and it contains this data:
username  first   middle   last
jschmoe   joseph  NaN      schmoe
jdoe      jane    marie    doe
jschmoe   joseph  michael  schmoe
jdoe      jane    NaN      doe
tuser     test    NaN      user
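For anyone who wants to experiment, here is a minimal sketch that rebuilds the sample data above as a DataFrame. The values are copied from the table, and missing middle names are assumed to be np.nan:

```python
import numpy as np
import pandas as pd

# Sample data from the table above; rows are ordered newest first.
df_people = pd.DataFrame(
    {
        "username": ["jschmoe", "jdoe", "jschmoe", "jdoe", "tuser"],
        "first": ["joseph", "jane", "joseph", "jane", "test"],
        "middle": [np.nan, "marie", "michael", np.nan, np.nan],
        "last": ["schmoe", "doe", "schmoe", "doe", "user"],
    }
)
print(df_people)
```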
I am trying to reduce this to the most recent valid (non-NaN) value in each column for each 'username' (or, of course, leave NaN if there are no valid entries).
Expected result:
username  first   middle   last
jschmoe   joseph  michael  schmoe
jdoe      jane    marie    doe
tuser     test    NaN      user
In my actual dataframe I will have anywhere from 5 to 100 columns and easily over 100k rows whenever I need to run this report. While I don't expect anything to be super fast for what I'm trying to accomplish, I wanted to give a sense of scale so you can see how even small optimizations can make a big difference. Reliable results are always more important than having the report finish a few seconds faster! Right now I have no results... so anything is better than that.
I've tried a ton of different combinations of things from searching through this site and the pandas documentation, but I think my limited knowledge of what pandas is capable of is holding me back here.
Any recommendations or ideas would be appreciated!
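GroupBy.first() takes the first non-null value in each column for every username, which (with the newest rows on top) is the most recent valid value:
>>> df.groupby('username', as_index=False).first()
   username   first   middle    last
0      jdoe    jane    marie     doe
1   jschmoe  joseph  michael  schmoe
2     tuser    test      NaN    user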
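A self-contained version of the groupby approach, as a sketch for checking the behavior. It assumes the sample data from the question is rebuilt by hand, and it uses the name df (as the answer does) rather than the question's df_people:

```python
import numpy as np
import pandas as pd

# Sample data from the question, newest rows first.
df = pd.DataFrame(
    {
        "username": ["jschmoe", "jdoe", "jschmoe", "jdoe", "tuser"],
        "first": ["joseph", "jane", "joseph", "jane", "test"],
        "middle": [np.nan, "marie", "michael", np.nan, np.nan],
        "last": ["schmoe", "doe", "schmoe", "doe", "user"],
    }
)

# GroupBy.first() picks, column by column, the first non-null value in each
# group. Because the newest rows come first, that is the most recent valid
# value; a column with no valid value in a group stays NaN.
# The default sort=True orders the output by username.
result = df.groupby("username", as_index=False).first()
print(result)
```

No column names are listed, so the same call works unchanged whether the frame has 5 columns or 100.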
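You could try drop_duplicates, but note that it keeps the entire first (newest) row for each username and does not fill NaN values from older rows, so jschmoe would still show NaN for middle:
df.drop_duplicates(subset='username')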
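To see what drop_duplicates actually returns on this data, here is a minimal sketch, assuming the same hand-built sample frame as in the question:

```python
import numpy as np
import pandas as pd

# Sample data from the question, newest rows first.
df = pd.DataFrame(
    {
        "username": ["jschmoe", "jdoe", "jschmoe", "jdoe", "tuser"],
        "first": ["joseph", "jane", "joseph", "jane", "test"],
        "middle": [np.nan, "marie", "michael", np.nan, np.nan],
        "last": ["schmoe", "doe", "schmoe", "doe", "user"],
    }
)

# drop_duplicates keeps the whole first row per username (keep="first" is the
# default). It never pulls values from later rows, so gaps are not filled.
dedup = df.drop_duplicates(subset="username")
print(dedup)
```

Here jschmoe keeps NaN for middle even though an older row has michael, so this only matches the expected result when the newest row per username is already complete.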
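Or use groupby, which takes the first non-null value in each column for every username (sort=False keeps the usernames in the order they first appear):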
df.groupby('username', sort=False).first().reset_index()
   username   first   middle    last
0   jschmoe  joseph  michael  schmoe
1      jdoe    jane    marie     doe
2     tuser    test      NaN    user
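A runnable sketch of the sort=False variant, again assuming the hand-built sample frame. It checks that the output keeps the newest-first order of usernames from the input:

```python
import numpy as np
import pandas as pd

# Sample data from the question, newest rows first.
df = pd.DataFrame(
    {
        "username": ["jschmoe", "jdoe", "jschmoe", "jdoe", "tuser"],
        "first": ["joseph", "jane", "joseph", "jane", "test"],
        "middle": [np.nan, "marie", "michael", np.nan, np.nan],
        "last": ["schmoe", "doe", "schmoe", "doe", "user"],
    }
)

# sort=False keeps groups in order of first appearance instead of sorting
# by username; reset_index() turns the username index back into a column.
result = df.groupby("username", sort=False).first().reset_index()
print(result)
```

One caveat worth checking on real data: groupby drops rows whose username is NaN by default (dropna=True), so pass dropna=False if those rows should be kept as their own group.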