I have a flights dataset with the columns "UNIQUE_CARRIER_NAME", "MONTH_YEAR" and "ROUTE", plus other attributes such as passenger count that are not relevant here. Here is a sample (there are many other carriers, and the dates range up to 2017):
         UNIQUE_CARRIER_NAME  MONTH_YEAR    ROUTE
2512  ATA Airlines d/b/a ATA      2-1990  OGG-HNL
2648  ATA Airlines d/b/a ATA      2-1990  IND-RSW
2649  ATA Airlines d/b/a ATA      2-1990  IND-RSW
2650  ATA Airlines d/b/a ATA      2-1990  IND-RSW
3104  ATA Airlines d/b/a ATA      2-1990  HNL-SFO
3470  ATA Airlines d/b/a ATA      2-1990  SFO-HNL
3482  ATA Airlines d/b/a ATA      2-1990  SFO-OGG
4522  ATA Airlines d/b/a ATA      3-1990  OGG-HNL
5076  ATA Airlines d/b/a ATA      2-1990  RSW-IND
5077  ATA Airlines d/b/a ATA      2-1990  RSW-IND
5078  ATA Airlines d/b/a ATA      2-1990  RSW-IND
5296  ATA Airlines d/b/a ATA      3-1990  RSW-IND
5297  ATA Airlines d/b/a ATA      3-1990  RSW-IND
5371  ATA Airlines d/b/a ATA      3-1990  SFO-HNL
5389  ATA Airlines d/b/a ATA      3-1990  SFO-OGG
....
I want to group by "UNIQUE_CARRIER_NAME", "MONTH_YEAR" and "ROUTE", in that order, in Python. I have written this:
carrier_groups = df.groupby(["UNIQUE_CARRIER_NAME", "MONTH_YEAR", "ROUTE"])
This returns a DataFrameGroupBy object which I can iterate over to perform some calculations on the route data. Is there any way to skip aggregating the rest of the columns and just select the unique routes in this groupby? For example, these 3 rows should be selected as just 1:
2648 ATA Airlines d/b/a ATA 2-1990 IND-RSW
2649 ATA Airlines d/b/a ATA 2-1990 IND-RSW
2650 ATA Airlines d/b/a ATA 2-1990 IND-RSW
I would then like to iterate over the DataFrame grouped by "UNIQUE_CARRIER_NAME" and "MONTH_YEAR", such that:
for each group:
    I have a subset of df on which I can run a function over ROUTE to get some results
No grouping is necessary. Just drop the dupes in the dataframe using:
df = df.drop_duplicates(subset=['UNIQUE_CARRIER_NAME','MONTH_YEAR','ROUTE'])
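To connect this back to the iteration the question asks for, here is a minimal sketch that drops the duplicates first and then groups by carrier and month only. The sample rows and the names unique_routes and routes_per_group are made up for illustration; collecting the routes into a list stands in for whatever function would actually be run on ROUTE.

```python
import pandas as pd

# Hypothetical sample with the same three columns as the question.
df = pd.DataFrame({
    "UNIQUE_CARRIER_NAME": ["ATA Airlines d/b/a ATA"] * 6,
    "MONTH_YEAR": ["2-1990", "2-1990", "2-1990", "2-1990", "3-1990", "3-1990"],
    "ROUTE": ["IND-RSW", "IND-RSW", "IND-RSW", "OGG-HNL", "RSW-IND", "RSW-IND"],
})

# Keep one row per (carrier, month, route) combination.
keys = ["UNIQUE_CARRIER_NAME", "MONTH_YEAR", "ROUTE"]
unique_routes = df.drop_duplicates(subset=keys)

# Group the de-duplicated rows by carrier and month, then iterate.
routes_per_group = {}
for (carrier, month_year), group in unique_routes.groupby(
    ["UNIQUE_CARRIER_NAME", "MONTH_YEAR"]
):
    # `group` is the sub-DataFrame for this carrier and month;
    # replace the line below with the real per-group calculation.
    routes_per_group[(carrier, month_year)] = group["ROUTE"].tolist()

print(routes_per_group)
```

Each group here already contains only distinct routes, so the per-group function does not need to de-duplicate again.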
I think you need drop_duplicates first, and then apply your function (the one below is only a sample, because the question gives no information about it):
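def func(x):
    # x is a single row (a Series), because apply is called with axis=1
    print(x)
    # apply your function here; this is only a sample
    x['ROUTE'] = x['ROUTE'] + 'a'
    return x

# keep one row per (carrier, month, route), then run func on each row
df = df.drop_duplicates(['UNIQUE_CARRIER_NAME','MONTH_YEAR','ROUTE'])
df = df.apply(func, axis=1)
print(df)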
         UNIQUE_CARRIER_NAME  MONTH_YEAR     ROUTE
2512  ATA Airlines d/b/a ATA      2-1990  OGG-HNLa
2648  ATA Airlines d/b/a ATA      2-1990  IND-RSWa
3104  ATA Airlines d/b/a ATA      2-1990  HNL-SFOa
3470  ATA Airlines d/b/a ATA      2-1990  SFO-HNLa
3482  ATA Airlines d/b/a ATA      2-1990  SFO-OGGa
4522  ATA Airlines d/b/a ATA      3-1990  OGG-HNLa
5076  ATA Airlines d/b/a ATA      2-1990  RSW-INDa
5296  ATA Airlines d/b/a ATA      3-1990  RSW-INDa
5371  ATA Airlines d/b/a ATA      3-1990  SFO-HNLa
5389  ATA Airlines d/b/a ATA      3-1990  SFO-OGGa
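If only the distinct routes per carrier and month are needed, and not the other columns, a possible alternative is to group by the two keys and call unique() on the ROUTE column. This is a sketch on made-up sample rows, not something taken from the answers above; the name routes is hypothetical.

```python
import pandas as pd

# Hypothetical sample with the same three columns as the question.
df = pd.DataFrame({
    "UNIQUE_CARRIER_NAME": ["ATA Airlines d/b/a ATA"] * 6,
    "MONTH_YEAR": ["2-1990", "2-1990", "2-1990", "2-1990", "3-1990", "3-1990"],
    "ROUTE": ["IND-RSW", "IND-RSW", "IND-RSW", "OGG-HNL", "RSW-IND", "RSW-IND"],
})

# One entry per (carrier, month); each value is an array of distinct routes.
routes = df.groupby(["UNIQUE_CARRIER_NAME", "MONTH_YEAR"])["ROUTE"].unique()

for (carrier, month_year), route_array in routes.items():
    print(carrier, month_year, list(route_array))
```

This skips drop_duplicates entirely, at the cost of discarding every column other than ROUTE.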