I'm trying to filter a pandas DataFrame down to the rows that contain two specific values in any column. In the sample data below, I want to select the rows that contain both 'apple' AND 'grape':
a b c
0 apple orange grape
1 grape apple banana
2 pear kiwi apple
resulting in a filtered data frame that shows:
a b c
0 apple orange grape
1 grape apple banana
Using the code below, I can select all the rows that have one specific value:
df[(df == 'orange').any(axis=1)]
The result returned, as expected, was:
a b c
0 apple orange grape
Using the following line of code, I expected to select the rows that had both values somewhere in the row, but this returned all the rows that had either apple OR grape as a column value:
df[np.isin(df, ['apple', 'grape']).any(axis=1)]
I expected to get only the rows that had apple AND grape using the previous line, but that obviously isn't the correct way to accomplish this. How do I go about selecting rows that only have both values in any column?
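To see why that line behaves like OR, it can help to print the intermediate mask. The sketch below rebuilds the sample frame (the dictionary-of-lists construction is an assumption about how the data was created) and inspects each step:

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt here so the snippet is self-contained.
df = pd.DataFrame({
    'a': ['apple', 'grape', 'pear'],
    'b': ['orange', 'apple', 'kiwi'],
    'c': ['grape', 'banana', 'apple'],
})

# Element-wise membership test: True wherever a cell is 'apple' or 'grape'.
cell_mask = np.isin(df, ['apple', 'grape'])
print(cell_mask)

# any(axis=1) only asks "is at least one cell in this row True?",
# so row 2 (which has 'apple' but no 'grape') still passes.
row_mask = cell_mask.any(axis=1)
print(row_mask)
```

The membership test never records which of the two values matched, so any(axis=1) cannot tell "has apple" apart from "has apple and grape".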
Another way is to create a boolean mask:
mask = df.isin(['apple', 'grape']).sum(axis=1).eq(2)
Finally:
result=df[mask]
Output of result:
a b c
0 apple orange grape
1 grape apple banana
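One caveat with counting matches: the count does not distinguish between the two values, so a row that repeats one of them can be misclassified. A small sketch, assuming two extra hypothetical rows (index 3 and 4) that are not in the original sample:

```python
import pandas as pd

# Rows 3 and 4 are hypothetical additions used only to probe edge cases.
df = pd.DataFrame({
    'a': ['apple', 'grape', 'pear', 'apple', 'apple'],
    'b': ['orange', 'apple', 'kiwi', 'apple', 'grape'],
    'c': ['grape', 'banana', 'apple', 'kiwi', 'apple'],
})

# Number of cells per row that are either 'apple' or 'grape'.
counts = df.isin(['apple', 'grape']).sum(axis=1)
mask = counts.eq(2)

selected = df[mask].index.tolist()
print(counts.tolist())
print(selected)
```

Here row 3 (apple, apple, kiwi) is selected even though it has no 'grape', and row 4 (apple, grape, apple) is dropped because its count is 3. If a value can appear more than once in a row, testing each value separately is likely the safer choice.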
With your sample data, you can use boolean masking with the pandas .any method:
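# axis must be passed by keyword: pandas 2.0+ rejects the positional form .any(1)
m1 = (df == 'apple').any(axis=1)  # rows containing 'apple' in any column
m2 = (df == 'grape').any(axis=1)  # rows containing 'grape' in any column
df[m1 & m2]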
Output will be as follows:
a b c
0 apple orange grape
1 grape apple banana
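The same idea extends to any number of values by building one mask per value and AND-ing them together. This is a minimal sketch rather than part of the original answer; the names vals and per_value are assumptions chosen for illustration:

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    'a': ['apple', 'grape', 'pear'],
    'b': ['orange', 'apple', 'kiwi'],
    'c': ['grape', 'banana', 'apple'],
})

vals = ['apple', 'grape']  # values that must all appear somewhere in a row

# One boolean Series per value: True where the row contains that value.
per_value = [(df == v).any(axis=1) for v in vals]

# AND the Series together so a row survives only if every value is present.
mask = reduce(operator.and_, per_value)

result = df[mask]
print(result)
```

Because each value gets its own mask, repeated values in a row do not affect the outcome.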
One option is to count the number of True values from np.isin along axis=1 using sum, then check whether that count is greater than or equal to the number of values being checked:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'a': {0: 'apple', 1: 'grape', 2: 'pear'},
'b': {0: 'orange', 1: 'apple', 2: 'kiwi'},
'c': {0: 'grape', 1: 'banana', 2: 'apple'}
})
vals = ['apple', 'grape']
filtered = df[np.isin(df, vals).sum(axis=1) >= len(vals)]
print(filtered)
Another option is to turn the values into a set and apply its issubset method to each row (axis=1):
filtered = df[df.apply(set(vals).issubset, axis=1)]
Both give:
a b c
0 apple orange grape
1 grape apple banana