
How to remove rows from pandas df based on values of two different columns

I'm reading in a large CSV file of flight records, and I would like to remove all of the rows that do NOT have ORD as either 'Origin_Airport_Code' or 'Destination_Airport_Code'. After that I would also like to combine the 'Year' and 'Flight_Date' columns into a datetime and, I suppose, index the flights by that datetime.

I'm not sure what to try since I'm new to Python and pandas.

import pandas as pd

data = pd.read_csv("groundhog_query.csv")

data.columns
Index(['Year', 'Flight_Date', 'Day_Of_Year', 'Unique_Carrier_ID', 'Airline_ID',
       'Tail_Number', 'Flight_Number', 'Origin_Airport_ID', 'Origin_Market_ID',
       'Origin_Airport_Code', 'Origin_State', 'Destination_Airport_ID',
       'Destination_Market_ID', 'Destination_Airport_Code', 'Dest_State',
       'Scheduled_Dep_Time', 'Actual_Dep_Time', 'Dep_Delay', 'Pos_Dep_Delay',
       'Scheduled_Arr_Time', 'Actual_Arr_Time', 'Arr_Delay', 'Pos_Arr_Delay',
       'Combined_Arr_Delay', 'Can_Status', 'Can_Reason', 'Div_Status',
       'Scheduled_Elapsed_Time', 'Actual_Elapsed_Time', 'Carrier_Delay',
       'Weather_Delay', 'Natl_Airspace_System_Delay', 'Security_Delay',
       'Late_Aircraft_Delay', 'Div_Airport_Landings', 'Div_Landing_Status',
       'Div_Elapsed_Time', 'Div_Arrival_Delay', 'Div_Airport_1_ID',
       'Div_1_Tail_Num', 'Div_Airport_2_ID', 'Div_2_Tail_Num',
       'Div_Airport_3_ID', 'Div_3_Tail_Num', 'Div_Airport_4_ID',
       'Div_4_Tail_Num', 'Div_Airport_5_ID', 'Div_5_Tail_Num'],
      dtype='object')

This is how the columns are organized. Would I be able to do this with some if/then statements or a loop? Thanks for the help.

To filter on both columns, no loop or if/then statement is needed: build a boolean mask and index the DataFrame with it. This keeps only the rows whose origin or destination is ORD and removes all the others:

data = data[(data['Origin_Airport_Code'] == 'ORD') | (data['Destination_Airport_Code'] == 'ORD')]
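To see the row filter in action, here is a minimal sketch on a made-up four-row frame (the sample values are invented; only the two column names come from the question). It keeps the flights where ORD is the origin or the destination, and also shows an equivalent isin-based form:

```python
import pandas as pd

# Made-up sample rows; only the two airport-code columns matter here.
data = pd.DataFrame({
    "Origin_Airport_Code": ["ORD", "LAX", "JFK", "ORD"],
    "Destination_Airport_Code": ["LAX", "ORD", "SFO", "ORD"],
})

# True where either end of the flight is ORD.
touches_ord = (data["Origin_Airport_Code"] == "ORD") | (
    data["Destination_Airport_Code"] == "ORD"
)

# Keep the matching rows; the JFK -> SFO row is dropped.
ord_flights = data[touches_ord].reset_index(drop=True)

# Equivalent form that is easier to extend to more columns or more codes.
codes = data[["Origin_Airport_Code", "Destination_Airport_Code"]]
ord_flights_alt = data[codes.isin(["ORD"]).any(axis=1)].reset_index(drop=True)

print(ord_flights)
```

Note that each comparison needs its own parentheses, because | binds more tightly than == in Python.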

As for the groupby, I did not follow what result you want from it, but you can see how the groupby function works in pandas here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
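For the second part of the question, combining 'Year' and 'Flight_Date' into a datetime index, something along these lines may work. The question does not show what 'Flight_Date' looks like, so this sketch assumes it holds month/day strings such as "02/02"; the sample rows and the 'Flight_Datetime' column name are invented for illustration, and the format string would need adjusting to the real data:

```python
import pandas as pd

# Hypothetical sample; the real layout of Flight_Date is not shown in the question.
data = pd.DataFrame({
    "Year": [2019, 2019, 2020],
    "Flight_Date": ["02/02", "12/31", "01/15"],  # assumed month/day strings
    "Origin_Airport_Code": ["ORD", "LAX", "JFK"],
})

# Join the two columns into one string per row, e.g. "2019/02/02".
combined = data["Year"].astype(str) + "/" + data["Flight_Date"]

# Parse with an explicit format so a mismatch raises an error instead of guessing.
data["Flight_Datetime"] = pd.to_datetime(combined, format="%Y/%m/%d")

# Index the flights by the new datetime column, in chronological order.
data = data.set_index("Flight_Datetime").sort_index()

print(data.index)
```

With a DatetimeIndex in place, date-based selection such as data.loc["2019"] becomes available.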
