My data looks something like this, with household members of three different origin (Dutch, American, French):
Household members nationality:
Dutch American Dutch French
Dutch Dutch French
American American
American Dutch
French American
Dutch Dutch
I want to convert them into three categories:
Category 1 was captured by the following code:
~df['households'].str.contains("French", "American")
I was looking for a solution for category 2 and 3. I had the following in mind:
Mixed households
df['households'].str.contains("Dutch" and ("French" or "American"))
But this solution did not work because it also captured rows containing only French members. How do I implement this 'and' statement correctly in this context?
Let us try str.get_dummies
to create a dataframe of dummy indicator variables for the column Household
, then create boolean masks m1, m2, m3
as per the specified conditions finally use these masks to filter out the rows:
c = df['Household'].str.get_dummies(sep=' ')
m1 = c['Dutch'].eq(1) & c[['American', 'French']].eq(0).all(1)
m2 = c['Dutch'].eq(1) & c[['American', 'French']].eq(1).any(1)
m3 = c['Dutch'].eq(0)
Details:
>>> c
American Dutch French
0 1 1 1
1 0 1 1
2 1 0 0
3 1 1 0
4 1 0 1
5 0 1 0
>>> df[m1] # category 1
Household
5 Dutch Dutch
>>> df[m2] # category 2
Household
0 Dutch American Dutch French
1 Dutch Dutch French
3 American Dutch
>>> df[m3] # category 3
Household
2 American American
4 French American
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.