import pandas as pd
import numpy as np
print(df)
I'm a newbie and I'm using pandas to process an Excel file. I have a data frame like the one below:
DAT_KEY IP DATA
01-04-19 10.0.0.1 3298329
01-04-19 10.0.0.1 0
02-04-19 10.0.0.1 3298339
02-04-19 10.0.0.1 0
01-04-19 10.0.0.2 3233233
01-04-19 10.0.0.2 0
01-04-19 10.0.0.3 0
I only want to delete a row when it has DATA=0 and another row has the same IP and DAT_KEY. I don't want to delete a row with DATA=0 if its DAT_KEY and IP combination is unique.
My expected outcome:
DAT_KEY IP DATA
01-04-19 10.0.0.1 3298329
02-04-19 10.0.0.1 3298339
01-04-19 10.0.0.2 3233233
01-04-19 10.0.0.3 0
I tried drop_duplicates, but it is not suitable for my case:
df = df.drop_duplicates()
Use groupby, which splits the data into groups based on some criteria, followed by .first(), which takes the first non-null value of each column in every group. Example:
df = df.groupby(['DAT_KEY','IP'],as_index=False,sort=False).first()
print(df)
Output:
DAT_KEY IP DATA
0 01-04-19 10.0.0.1 3298329
1 02-04-19 10.0.0.1 3298339
2 01-04-19 10.0.0.2 3233233
3 01-04-19 10.0.0.3 0
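To try this without the Excel file, the sample frame can be rebuilt by hand. This is only a minimal sketch: the variable name result is an assumption, and the data is copied from the question. Note that .first() keeps whichever row appears first in each group, so the result matches the expected outcome here only because every non-zero row comes before its DATA=0 partner.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "DAT_KEY": ["01-04-19", "01-04-19", "02-04-19", "02-04-19",
                "01-04-19", "01-04-19", "01-04-19"],
    "IP": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.1",
           "10.0.0.2", "10.0.0.2", "10.0.0.3"],
    "DATA": [3298329, 0, 3298339, 0, 3233233, 0, 0],
})

# One row per (DAT_KEY, IP) pair, taking the first value seen in each group.
# sort=False keeps the groups in order of first appearance.
result = df.groupby(["DAT_KEY", "IP"], as_index=False, sort=False).first()
print(result)
```

If a DATA=0 row could come before the non-zero row for the same key, this approach would keep the zero instead, so it is worth checking the row order in the real file.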
Maybe this is what you need. Starting from your sample with one extra non-zero row added (index 7):
DAT_KEY IP DATA
0 01-04-19 10.0.0.1 3298329
1 01-04-19 10.0.0.1 0
2 02-04-19 10.0.0.1 3298339
3 02-04-19 10.0.0.1 0
4 01-04-19 10.0.0.2 3233233
5 01-04-19 10.0.0.2 0
6 01-04-19 10.0.0.3 0
7 01-04-19 10.0.0.1 99999
df.groupby(["DAT_KEY", "IP"], as_index=False, sort=False).apply(
    lambda g: g if len(g) == 1 else g[g["DATA"] != 0]
).reset_index(drop=True)
Out[94]:
DAT_KEY IP DATA
0 01-04-19 10.0.0.1 3298329
1 01-04-19 10.0.0.1 99999
2 02-04-19 10.0.0.1 3298339
3 01-04-19 10.0.0.2 3233233
4 01-04-19 10.0.0.3 0
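The same rule can also be written as a boolean mask without groupby. This is a hedged sketch rather than a drop-in replacement: the names shared_key, is_zero and result are assumptions, and the data is the eight-row sample above. DataFrame.duplicated with keep=False flags every row whose (DAT_KEY, IP) pair occurs more than once, so a row is dropped only when it is both flagged and has DATA equal to 0.

```python
import pandas as pd

# Eight-row sample from this answer (the question's data plus one extra row).
df = pd.DataFrame({
    "DAT_KEY": ["01-04-19", "01-04-19", "02-04-19", "02-04-19",
                "01-04-19", "01-04-19", "01-04-19", "01-04-19"],
    "IP": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.1",
           "10.0.0.2", "10.0.0.2", "10.0.0.3", "10.0.0.1"],
    "DATA": [3298329, 0, 3298339, 0, 3233233, 0, 0, 99999],
})

# True for every row whose (DAT_KEY, IP) pair appears more than once.
shared_key = df.duplicated(subset=["DAT_KEY", "IP"], keep=False)
# True for rows with DATA equal to 0.
is_zero = df["DATA"].eq(0)

# Drop only rows that are zero AND share their key with another row.
result = df[~(shared_key & is_zero)].reset_index(drop=True)
print(result)
```

Unlike the groupby version, this keeps the rows in their original order, so the 99999 row ends up last. As with the lambda above, if every row for a repeated key has DATA=0, all of them are removed.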