
How to filter multiple dataframes in a loop?

I have a lot of dataframes and I would like to apply the same filter to all of them without having to copy and paste the filter condition every time.

This is my code so far:

df_list_2019 = [df_spain_2019,df_amsterdam_2019, df_venice_2019, df_sicily_2019]

for data in df_list_2019:
    data = data[['host_since','host_response_time','host_response_rate',
             'host_acceptance_rate','host_is_superhost','host_total_listings_count',
              'host_has_profile_pic','host_identity_verified',
             'neighbourhood','neighbourhood_cleansed','zipcode','latitude','longitude','property_type','room_type',
             'accommodates','bathrooms','bedrooms','beds','amenities','price','weekly_price',
             'monthly_price','cleaning_fee','guests_included','extra_people','minimum_nights','maximum_nights',
             'minimum_nights_avg_ntm','has_availability','availability_30','availability_60','availability_90',
              'availability_365','number_of_reviews','number_of_reviews_ltm','review_scores_rating',
              'review_scores_checkin','review_scores_communication','review_scores_location', 'review_scores_value',
              'instant_bookable','is_business_travel_ready','cancellation_policy','reviews_per_month'
             ]]

but it doesn't apply the filter to the dataframes. How can I change the code to do that?

Thank you

The filter (column selection) is actually applied to every DataFrame; you just throw the result away by overriding what the name data points to.

You need to store the results somewhere, in a list for example.

cols = ['host_since','host_response_time', ...]
filtered = [df[cols] for df in df_list_2019]
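
If you also need to know which result belongs to which city, a dict keyed by a label may be handier than a list. This is only a minimal sketch: the tiny frames, their values, the shortened column list, and the keys "spain" and "venice" are assumptions made for illustration.

```python
import pandas as pd

# Tiny stand-ins for the real listings data (values are made up).
df_spain_2019 = pd.DataFrame({"host_since": ["2015-01-01"], "price": [80], "id": [1]})
df_venice_2019 = pd.DataFrame({"host_since": ["2017-06-01"], "price": [120], "id": [2]})

cols = ["host_since", "price"]  # shortened column list, for illustration only

# Hypothetical labels; the dict keeps each filtered frame addressable by name.
frames_2019 = {"spain": df_spain_2019, "venice": df_venice_2019}
filtered = {name: df[cols] for name, df in frames_2019.items()}

print(list(filtered["spain"].columns))
```

The original frames are left as they were; only the new objects stored in filtered have the reduced set of columns.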

As soon as you write var = new_value, you do not change the original object; you only make the variable refer to a new object.
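
The difference between rebinding a name and replacing the object stored in the list can be seen on a toy example. The frame df_a and its columns are made up for this sketch.

```python
import pandas as pd

df_a = pd.DataFrame({"price": [80], "id": [1]})  # made-up stand-in frame
df_list = [df_a]

for data in df_list:
    # Rebinds the loop variable `data` to a new DataFrame; df_list is untouched.
    data = data[["price"]]

still_has_id = "id" in df_list[0].columns  # the stored frame still has "id"

# Assigning back into the list by index does replace the stored object.
for i, data in enumerate(df_list):
    df_list[i] = data[["price"]]

replaced = list(df_list[0].columns)
```

Note that even after the second loop the name df_a still refers to the original, unfiltered frame; only the list entry points to the new one.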

If you want to change the dataframes in df_list_2019 themselves, you have to use a method that modifies the object in place, for example one called with inplace=True. Here, you could use drop:

keep = set(['host_since','host_response_time','host_response_rate',
             'host_acceptance_rate','host_is_superhost','host_total_listings_count',
              'host_has_profile_pic','host_identity_verified',
             'neighbourhood','neighbourhood_cleansed','zipcode','latitude','longitude','property_type','room_type',
             'accommodates','bathrooms','bedrooms','beds','amenities','price','weekly_price',
             'monthly_price','cleaning_fee','guests_included','extra_people','minimum_nights','maximum_nights',
             'minimum_nights_avg_ntm','has_availability','availability_30','availability_60','availability_90',
              'availability_365','number_of_reviews','number_of_reviews_ltm','review_scores_rating',
              'review_scores_checkin','review_scores_communication','review_scores_location', 'review_scores_value',
              'instant_bookable','is_business_travel_ready','cancellation_policy','reviews_per_month'
             ])

for data in df_list_2019:
    data.drop(columns=[col for col in data.columns if col not in keep], inplace=True)
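
To check that the loop really changes the original objects, here is the same pattern on a small stand-in frame. The data and the two-column keep set are assumptions for the sketch, not the real listings data.

```python
import pandas as pd

# Made-up stand-in for one of the real frames.
df_spain_2019 = pd.DataFrame({"host_since": ["2015-01-01"], "price": [80], "id": [1]})
df_list_2019 = [df_spain_2019]

keep = {"host_since", "price"}  # shortened set of columns to keep

for data in df_list_2019:
    # Drop every column that is not in `keep`, modifying the frame itself.
    data.drop(columns=[col for col in data.columns if col not in keep], inplace=True)

# The original name sees the change because no new object was created.
print(list(df_spain_2019.columns))
```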

But beware, pandas experts recommend preferring the df = df. ... idiom to df...(..., inplace=True) because it allows chaining operations. So you should ask yourself whether @timgeb's answer can be used instead. Anyway, this one should work for your requirements.
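
As a rough illustration of what chaining with the df = df. ... idiom looks like: the frame below and the extra dropna and reset_index steps are assumptions added only to show several operations in one expression; they are not part of the question.

```python
import pandas as pd

# Made-up frame with one missing host_since value.
df = pd.DataFrame(
    {"host_since": ["2015-01-01", None], "price": [80, 120], "id": [1, 2]}
)
cols = ["host_since", "price"]

# Each step returns a new DataFrame, so the calls can be chained
# and the final result assigned back to the same name.
df = (
    df[cols]
    .dropna(subset=["host_since"])
    .reset_index(drop=True)
)
```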

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original source.

 