
Removing Rows that contain values in a list

I am trying to filter a large dataframe and want to drop rows that contain certain values in the column 'Product Description'.

I have looked at how can i remove multiple rows with different labels in one command in pandas?

and

Remove rows not .isin('X')

and applied the code. However,

  df[-df['label'].isin(List)] 

is not working for me and I am not sure what to do.

Here is my exact code:

List2 = ['set up','setup','and install',....etc etc]

(I also tried List2 = ( ..etc ) with parentheses instead of brackets and it didn't work)

Computers_No_UNSPSC = Computers_No_UNSPSC[-Computers_No_UNSPSC['Product Description'].isin(List2)]

(I also tried using ~ instead of - which didn't work)

Is there something that I am doing wrong or missing? When I look at my Computers_No_UNSPSC dataframe, I see that there are still rows containing words from the list I created. It doesn't seem to filter out what I don't want.

Thanks for the help!

I believe List2 is working. I have rows of data in which people describe their computer purchases. I want all computers bought, but not 'computer repair' or 'computer software'. So I created a list that seems to capture the peripherals and other things I don't want. When I say

print List2 

I get

['set up', 'setup', 'and install', ' server', 'labor', 'services', 'processing', 'license', 'renewal', 'repair', 'case', 'speakers', 'cord', 'support', 'cart', 'docking station', 'components', 'accessories', 'software', ' membership', ' headsets ', ' keyboard', ' mouse', ' peripheral', ' part', ' charger', ' battery', ' drive', ' print', ' cable', ' supp', ' usb', ' shelf', 'disk', 'memory', 'studio', 'training', 'adapter', 'wiring', 'mirror']

Does this mean that it recognizes each string as a word? So when I apply the filter, will it filter against each of the words in my List2?

A =A[-A['Product Description'].isin(List2)] 

This seems to be the part that isn't working but again, I am not sure where I went wrong.

I don't think you understand how that works. It checks whether label == anything in that list, not whether label contains anything in that list.

It sounds like a label might look like

label = "set up computer"

isin looks for exact matches, not partial matches:

label in ["set", "up", "computer"]  # False: the whole label is not an element of the list
"set" in ["set", "up", "computer"]  # True: "set" is exactly one of the elements

Note: this is plain Python membership testing, not pandas isin, but isin works the same way.
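The same exact-match behavior can be seen with pandas directly. This is a minimal sketch; the three-row frame and the name blacklist are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe.
df = pd.DataFrame(
    {"Product Description": ["setup", "set up computer", "laptop"]}
)
blacklist = ["set up", "setup"]

# isin compares each whole cell value against the list elements.
exact = df["Product Description"].isin(blacklist)
print(exact.tolist())  # [True, False, False]
```

Only the cell that equals 'setup' in full is flagged. 'set up computer' contains a listed phrase but is not itself in the list, so isin leaves it alone.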

To do what you want, you need to check each word in the list against the label:

any(word in label for word in blacklisted_words)

This is going to be much slower than an exact-match lookup.
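Applied to a dataframe, the substring check might look like the sketch below. The sample rows and the shortened word list are assumptions for illustration. The first mask is the any(...) check run row by row. The second uses Series.str.contains with a single regular expression built from the escaped words, which is a swapped-in alternative rather than something from the answer above. Passing case=False and na=False is also an assumption about what the asker wants (ignore case, treat missing descriptions as non-matches).

```python
import re
import pandas as pd

# Hypothetical sample data; the real frame is Computers_No_UNSPSC.
df = pd.DataFrame(
    {"Product Description": ["Dell laptop", "set up computer",
                             "Laptop repair", "HP desktop"]}
)
blacklisted_words = ["set up", "setup", "repair"]

# Row-by-row version of the any(...) check (case-sensitive).
mask_any = df["Product Description"].apply(
    lambda label: any(word in label for word in blacklisted_words)
)

# Alternative: join the escaped words into one regex alternation
# and let pandas search every description for it.
pattern = "|".join(re.escape(word) for word in blacklisted_words)
mask_re = df["Product Description"].str.contains(pattern, case=False, na=False)

# Use ~ to invert the boolean mask and keep only the non-matching rows.
filtered = df[~mask_re]
print(filtered["Product Description"].tolist())  # ['Dell laptop', 'HP desktop']
```

Note that substring matching is broad: a short entry such as ' part' or 'case' will also match inside longer, unrelated words, so the word list may need tightening.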


 