I want to filter a DataFrame on a column of lists, keeping only the rows whose lists contain nothing but specific items.
This is my original table:
id | code |
---|---|
1 | [Hes3086, Hes3440, Hes3220] |
2 | [Hes3440, Nee8900] |
3 | [Hes1337, Hes3440] |
4 | [Nee8900, Hes3440] |
5 | [Hes1337, Nee8900] |
6 | [Hes3220, Nee8900] |
7 | [Hes3220, Nee8900, Hes3440] |
I want only the rows whose lists contain nothing other than these items: Hes3440, Nee8900, Hes3220
This should produce the following output:
id | code |
---|---|
2 | [Hes3440, Nee8900] |
4 | [Nee8900, Hes3440] |
6 | [Hes3220, Nee8900] |
7 | [Hes3220, Nee8900, Hes3440] |
I am able to filter the dataset by making sure that at least one of the desired items is in each row, but this is not what I want.
Would appreciate any help!
thanks, M
Use issubset in boolean indexing with Series.map:
L = ['Hes3440','Nee8900','Hes3220']
df = df[df.code.map(lambda x: set(x).issubset(L))]
print(df)
   id                         code
1   2           [Hes3440, Nee8900]
3   4           [Nee8900, Hes3440]
5   6           [Hes3220, Nee8900]
6   7  [Hes3220, Nee8900, Hes3440]
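For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption about how the question's table looks in pandas (a code column holding Python lists), and the names allowed, mask and result are only illustrative.

```python
import pandas as pd

# Rebuild the table from the question (assumed: `code` holds Python lists).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "code": [
        ["Hes3086", "Hes3440", "Hes3220"],
        ["Hes3440", "Nee8900"],
        ["Hes1337", "Hes3440"],
        ["Nee8900", "Hes3440"],
        ["Hes1337", "Nee8900"],
        ["Hes3220", "Nee8900"],
        ["Hes3220", "Nee8900", "Hes3440"],
    ],
})

# Illustrative name: the items a row is allowed to contain.
allowed = {"Hes3440", "Nee8900", "Hes3220"}

# A row is kept only if every item in its list is in `allowed`.
mask = df["code"].map(lambda codes: set(codes).issubset(allowed))
result = df[mask]
print(result["id"].tolist())
```

Row 1 is dropped because Hes3086 is not in the allowed set, even though its other two items are.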
List comprehension alternative:
df = df[[set(x).issubset(L) for x in df.code]]
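To make the difference from the "at least one item" filter mentioned in the question concrete, here is a small plain-Python sketch of the two checks. The helper names has_any and has_only are hypothetical, not part of pandas.

```python
allowed = {"Hes3440", "Nee8900", "Hes3220"}

def has_any(codes):
    # True if at least one item of `codes` is in `allowed`.
    return not allowed.isdisjoint(codes)

def has_only(codes):
    # True if every item of `codes` is in `allowed` (the subset test).
    return set(codes).issubset(allowed)

row = ["Hes1337", "Hes3440"]
print(has_any(row), has_only(row))
```

One thing to keep in mind: an empty list is a subset of any set, so has_only([]) is True and rows with empty lists would pass the subset filter.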