I have a DataFrame df with an Age column, and I am trying to categorize the rows into age groups flagged with 0s and 1s.
df:
User_ID | Age
35435   | 22
45345   | 36
63456   | 18
63523   | 55
I tried the following:
df['Age_GroupA'] = 0
df['Age_GroupA'][(df['Age'] >= 1) & (df['Age'] <= 25)] = 1
but I get this warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
To avoid it, I switched to .loc:
df['Age_GroupA'] = 0
df['Age_GroupA'] = df.loc[(df['Age'] >= 1) & (df['Age'] <= 25)] = 1
However, this sets Age_GroupA to 1 for every row.
This is what I get:
User_ID | Age | Age_GroupA
35435   | 22  | 1
45345   | 36  | 1
63456   | 18  | 1
63523   | 55  | 1
while this is the goal:
User_ID | Age | Age_GroupA
35435   | 22  | 1
45345   | 36  | 0
63456   | 18  | 1
63523   | 55  | 0
Thank you
You can convert the boolean mask to int, because True becomes 1 and False becomes 0:
df['Age_GroupA'] = ((df['Age'] >= 1) & (df['Age'] <= 25)).astype(int)
print(df)
   User_ID  Age  Age_GroupA
0    35435   22           1
1    45345   36           0
2    63456   18           1
3    63523   55           0
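A self-contained version of the mask-to-int approach, rebuilding the sample data from the question so it can be run as-is:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"User_ID": [35435, 45345, 63456, 63523],
                   "Age": [22, 36, 18, 55]})

# Boolean mask: True where 1 <= Age <= 25
mask = (df["Age"] >= 1) & (df["Age"] <= 25)

# astype(int) maps True -> 1 and False -> 0
df["Age_GroupA"] = mask.astype(int)
print(df)
```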
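Due to peer pressure (@DSM), I feel compelled to break down your error:
df['Age_GroupA'][(df['Age'] >= 1) & (df['Age'] <= 25)] = 1
This is chained indexing/assignment: df['Age_GroupA'] first returns a Series, and the boolean-indexed assignment is then applied to that intermediate object, which may be a copy, so the change is not guaranteed to reach df. That is what the warning is about.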
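So what you tried next:
df['Age_GroupA'] = df.loc[(df['Age'] >= 1) & (df['Age'] <= 25)] = 1
is not the correct form for .loc. It is a chained assignment (a = b = 1), so Python assigns 1 to df['Age_GroupA'] for every row, and then assigns 1 to df.loc[mask], which overwrites every column of the matching rows.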
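To see what that chained assignment actually does, here is a small sketch on the question's sample data. Python assigns the value 1 to each target from left to right, so the whole Age_GroupA column becomes 1, and the rows matched by the mask get 1 in every column, including User_ID and Age:

```python
import pandas as pd

df = pd.DataFrame({"User_ID": [35435, 45345, 63456, 63523],
                   "Age": [22, 36, 18, 55]})
mask = (df["Age"] >= 1) & (df["Age"] <= 25)

# Same as: df["Age_GroupA"] = 1, then df.loc[mask] = 1
df["Age_GroupA"] = df.loc[mask] = 1
print(df)  # every Age_GroupA is 1; rows 0 and 2 are 1 in all columns
```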
You want:
df.loc[<boolean mask>, <columns of interest>] = <scalar or calculated value>
like this (after initializing the column with df['Age_GroupA'] = 0, as in the question):
df.loc[(df['Age'] >= 1) & (df['Age'] <= 25), 'Age_GroupA'] = 1
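A runnable sketch of the single .loc call, using the question's sample data. The column is initialized to 0 first; otherwise the rows the mask does not select would end up as NaN:

```python
import pandas as pd

df = pd.DataFrame({"User_ID": [35435, 45345, 63456, 63523],
                   "Age": [22, 36, 18, 55]})

# Start every row at 0, then set the matching rows to 1 in one .loc call
df["Age_GroupA"] = 0
df.loc[(df["Age"] >= 1) & (df["Age"] <= 25), "Age_GroupA"] = 1
print(df)
```

Because row selection and column selection happen in the same .loc call, pandas writes directly into df and no SettingWithCopyWarning is raised.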
You could also have done this using np.where (with import numpy as np):
df['Age_GroupA'] = np.where((df['Age'] >= 1) & (df['Age'] <= 25), 1, 0)
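A self-contained sketch of the np.where variant on the sample data; np.where(condition, x, y) picks x where the condition is True and y elsewhere, element by element:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"User_ID": [35435, 45345, 63456, 63523],
                   "Age": [22, 36, 18, 55]})

# 1 where 1 <= Age <= 25, otherwise 0
df["Age_GroupA"] = np.where((df["Age"] >= 1) & (df["Age"] <= 25), 1, 0)
print(df)
```

np.where is handy when the two outcomes are something other than 1 and 0 (for example two labels), where astype(int) would not apply.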
There are many other ways to do this in one line.
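One such one-liner, offered as a sketch rather than something from the answers here, uses Series.between, which includes both endpoints by default and so matches the >= 1 and <= 25 comparisons:

```python
import pandas as pd

df = pd.DataFrame({"User_ID": [35435, 45345, 63456, 63523],
                   "Age": [22, 36, 18, 55]})

# between(1, 25) is inclusive on both ends by default
df["Age_GroupA"] = df["Age"].between(1, 25).astype(int)
print(df)
```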
This worked for me. Jezrael already explained it.
dataframe['Age_GroupA'] = ((dataframe['Age'] >= 1) & (dataframe['Age'] <= 25)).astype(int)