
Selecting rows based on criteria from multiple columns in a DataFrame

I would like to select rows based on the following conditions:

  • If a product costs less than $10 and is larger than 20cm (and less than 30cm).
  • If a product costs between $10 - $25 and is larger than 30cm (and less than 40cm).
  • If a product costs more than $25 and is larger than 40cm.

The code below is what I have written. However, I keep getting the error "'<' not supported between instances of 'list' and 'int'".

Could someone please advise? Thanks!

df_1 = df([(["price"] < 10) & (["size"] > 20)], [[(["price"] > 10) & (["price"] < 25)] & (["size"] > 30)], [(["price"] > 25) & (["size"] > 40)])

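You can apply boolean masks to your DataFrame. The error occurs because ["price"] on its own is a plain Python list; the column has to be selected from the DataFrame as df["price"].

# Rule 1: under $10 and between 20cm and 30cm
condition_1 = (df["price"] < 10) & (df["size"] > 20) & (df["size"] < 30)
# Rule 2: $10 to $25 (treated as inclusive here) and between 30cm and 40cm
condition_2 = (df["price"] >= 10) & (df["price"] <= 25) & (df["size"] > 30) & (df["size"] < 40)
# Rule 3: over $25 and larger than 40cm
condition_3 = (df["price"] > 25) & (df["size"] > 40)

# A row should be kept if it matches any one rule, so combine with | (or), not & (and)
mask = condition_1 | condition_2 | condition_3

filtered_df = df[mask]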
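As a minimal sketch of the mask approach, here is a self-contained version run on a small made-up DataFrame. The sample values and the names cheap, mid, pricey and error_message are assumptions for illustration, not from the question. It also reproduces the original error, which comes from comparing the list ["price"] with an integer.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "price": [5, 8, 15, 20, 30, 40],
    "size":  [25, 35, 35, 45, 45, 30],
})

# The original error: ["price"] is a plain list, not a column of df.
try:
    ["price"] < 10
except TypeError as exc:
    error_message = str(exc)

# One mask per rule; each comparison needs its own parentheses.
cheap = (df["price"] < 10) & (df["size"] > 20) & (df["size"] < 30)
mid = (
    (df["price"] >= 10) & (df["price"] <= 25)   # bounds assumed inclusive
    & (df["size"] > 30) & (df["size"] < 40)
)
pricey = (df["price"] > 25) & (df["size"] > 40)

# A row is kept if it satisfies any one of the three rules.
filtered_df = df[cheap | mid | pricey]
print(filtered_df)
```

With this sample data, only the rows at index 0, 2 and 4 satisfy one of the rules.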

Try breaking it up into a few sections so it is easier to see where you are going wrong.

If I were attempting this, I would use .loc, which selects whatever you put inside it (this can be a column label or, in this case, a boolean condition).

Try this:

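df1 = df.loc[(df["price"] < 10) & (df["size"] > 20)]

Note the parentheses around each comparison: & binds more tightly than < and >, so leaving them out changes what is being compared. Repeat this for the other conditions, then combine the resulting DataFrames using pd.concat().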
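The .loc and pd.concat() route could look roughly like the sketch below. The sample data is made up, the names df2, df3 and result are hypothetical, and the upper size limits from the question are included. Because the three price ranges do not overlap, no row can appear in more than one piece, so concatenating does not create duplicates.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "price": [5, 8, 15, 20, 30, 40],
    "size":  [25, 35, 35, 45, 45, 30],
})

# One .loc selection per rule, with every comparison parenthesised.
df1 = df.loc[(df["price"] < 10) & (df["size"] > 20) & (df["size"] < 30)]
df2 = df.loc[
    (df["price"] >= 10) & (df["price"] <= 25)   # bounds assumed inclusive
    & (df["size"] > 30) & (df["size"] < 40)
]
df3 = df.loc[(df["price"] > 25) & (df["size"] > 40)]

# Stack the three pieces and restore the original row order.
result = pd.concat([df1, df2, df3]).sort_index()
print(result)
```

This gives the same rows as a single combined mask; the separate pieces are mainly useful when you want to inspect each rule on its own.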

Hope that helps - I am new to python too!
