Capturing row if column string contains X and at least one of [Y,Z]

Question

My data looks something like this, with household members of three different origins (Dutch, American, French):

Household members nationality:
Dutch American Dutch French
Dutch Dutch French
American American
American Dutch
French American
Dutch Dutch

I want to convert them into three categories:

Dutch only households
Households with at least 1 Dutch member and at least 1 French or American member
Non-Dutch households

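Category 1 can be captured by the following code (the alternatives go into a single regex pattern, because a second positional argument to str.contains is interpreted as the case flag, not as another pattern):

~df['households'].str.contains("French|American")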

I am looking for a solution for categories 2 and 3. For mixed households, I had the following in mind:

df['households'].str.contains("Dutch" and ("French" or "American"))

But this solution did not work because it also captured rows containing only French members. How do I implement this 'and' statement correctly in this context?

Answer 1

Let us try str.get_dummies to create a dataframe of dummy indicator variables for the column Household. Then create boolean masks m1, m2 and m3 for the three specified conditions, and finally use these masks to filter the rows:

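c = df['Household'].str.get_dummies(sep=' ')  # one 0/1 column per nationality
m1 = c['Dutch'].eq(1) & c[['American', 'French']].eq(0).all(axis=1)  # Dutch only
m2 = c['Dutch'].eq(1) & c[['American', 'French']].eq(1).any(axis=1)  # Dutch plus American or French
m3 = c['Dutch'].eq(0)  # no Dutch member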
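
As a side note on why the attempt in the question misbehaves: Python evaluates "Dutch" and ("French" or "American") before pandas ever sees it, and the result is just the string "French". The sketch below shows that, and then builds the same three masks with one str.contains call per condition. It assumes the column is named Household and uses the sample rows from the question; the names has_dutch, has_other, dutch_only, mixed and non_dutch are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question; the column name 'Household' is an assumption.
df = pd.DataFrame({'Household': [
    'Dutch American Dutch French',
    'Dutch Dutch French',
    'American American',
    'American Dutch',
    'French American',
    'Dutch Dutch',
]})

# Python resolves the and/or expression to a single string before pandas runs.
pattern = "Dutch" and ("French" or "American")
print(pattern)  # French

# One str.contains call per condition, combined element-wise with & and |.
has_dutch = df['Household'].str.contains('Dutch')
has_other = df['Household'].str.contains('French|American')  # regex alternation

dutch_only = has_dutch & ~has_other   # category 1
mixed = has_dutch & has_other         # category 2
non_dutch = ~has_dutch                # category 3
```

This should select the same rows as m1, m2 and m3 on the sample data. Note that str.contains matches substrings, so it is only safe here because none of the nationality names is contained in another.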

Details:

>>> c

   American  Dutch  French
0         1      1       1
1         0      1       1
2         1      0       0
3         1      1       0
4         1      0       1
5         0      1       0

>>> df[m1] # category 1

     Household
5  Dutch Dutch
    
>>> df[m2] # category 2

                     Household
0  Dutch American Dutch French
1           Dutch Dutch French
3               American Dutch

>>> df[m3] # category 3

           Household
2  American American
4    French American
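
If the goal is a single category column rather than three filtered frames, the masks can be fed to numpy.select. This is only a sketch under assumptions: the column name category and the label strings are invented for illustration, and the data is the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the column name 'Household' is an assumption.
df = pd.DataFrame({'Household': [
    'Dutch American Dutch French',
    'Dutch Dutch French',
    'American American',
    'American Dutch',
    'French American',
    'Dutch Dutch',
]})

# Same masks as in the answer, with the axis passed by keyword.
c = df['Household'].str.get_dummies(sep=' ')
m1 = c['Dutch'].eq(1) & c[['American', 'French']].eq(0).all(axis=1)
m2 = c['Dutch'].eq(1) & c[['American', 'French']].eq(1).any(axis=1)
m3 = c['Dutch'].eq(0)

# np.select assigns, per row, the label of the first mask that is True.
# The label strings and the 'category' column name are hypothetical.
df['category'] = np.select(
    [m1, m2, m3],
    ['Dutch only', 'Mixed', 'Non-Dutch'],
    default='Unknown',
)
print(df)
```

The three masks are mutually exclusive and cover every row, so the default label should never appear on data that contains only these three nationalities.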
