I have a dataframe called "df" with a column called "Year_Birth". Instead of that column, I want several columns, one per age category. For each row, I want to calculate the age from Year_Birth and then put "True" or "1" in the age category that the row belongs to.
I am doing this manually as you can see:
# Splitting the Year and Income attributes into categories
import pandas as pd
from datetime import date

df_year = pd.DataFrame(columns=['18_29', '30_39', '40_49', '50_59', '60_plus'])
temp = df.Year_Birth
current_year = date.today().year
for x in temp:
    l = [0, 0, 0, 0, 0]
    age = current_year - x
    if (age <= 29): l[0] = 1
    elif (age <= 39): l[1] = 1
    elif (age <= 49): l[2] = 1
    elif (age <= 59): l[3] = 1
    else: l[4] = 1
    # Append the 0/1 row at the next integer position
    df_length = len(df_year)
    df_year.loc[df_length] = l
If there's an automatic or simpler way to do this, please tell me. Also, I now want to replace the "Year_Birth" column with the whole "df_year" dataframe. Can you help me with that?
You can definitely do this using vectorized operations on each column. You can start by creating an age column from the year of birth:
In [15]: age = date.today().year - df.Year_Birth
Now this can be used with boolean operators to create arrays of True/False values, which can be coerced to 0/1 with .astype(int):
In [20]: df_year = pd.DataFrame({
...: '18_29': (age >= 18) & (age <= 29),
...: '30_39': (age >= 30) & (age <= 39),
...: '40_49': (age >= 40) & (age <= 49),
...: '50_59': (age >= 50) & (age <= 59),
...: '60_plus': (age >= 60),
...: }).astype(int)
In [21]: df_year
Out[21]:
    18_29  30_39  40_49  50_59  60_plus
0       0      0      0      0        0
1       0      0      0      0        0
2       0      0      0      0        0
3       0      0      0      0        0
4       0      0      0      0        0
..    ...    ...    ...    ...      ...
77      0      0      0      0        1
78      0      0      0      0        1
79      0      0      0      0        1
80      0      0      0      0        1
81      0      0      0      0        1

[82 rows x 5 columns]
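For the second part of the question (replacing "Year_Birth" with the new columns), one possible approach is to drop the old column and concatenate the indicator columns onto what remains. The sketch below also shows an alternative way to build the indicators with pd.cut and pd.get_dummies instead of writing each comparison by hand. The sample data, the "ID" column, and the helper name add_age_groups are hypothetical and only there for illustration. As in the boolean version above, ages under 18 end up with 0 in every column, which differs from the loop in the question, where everyone aged 29 or younger lands in '18_29'.

```python
import pandas as pd
from datetime import date

# Hypothetical sample data standing in for the asker's df.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Year_Birth": [1950, 1975, 1990, 2000],
})

def add_age_groups(frame, year=None):
    """Return a copy of frame with Year_Birth replaced by 0/1 age-group columns."""
    # Default to the current year; a fixed year makes results reproducible.
    year = date.today().year if year is None else year
    age = year - frame["Year_Birth"]
    labels = ["18_29", "30_39", "40_49", "50_59", "60_plus"]
    # right=False makes each bin closed on the left: [18, 30), [30, 40), ...
    groups = pd.cut(
        age,
        bins=[18, 30, 40, 50, 60, float("inf")],
        labels=labels,
        right=False,
    )
    # One indicator column per label; ages outside the bins give all zeros.
    dummies = pd.get_dummies(groups).astype(int)
    # Drop the original column and attach the indicators, aligned on the index.
    return pd.concat([frame.drop(columns="Year_Birth"), dummies], axis=1)

result = add_age_groups(df, year=2024)
print(result)
```

Note that this appends the new columns at the end of the frame rather than at the position where "Year_Birth" used to be. If you already have df_year from the boolean approach, the last line of the helper applies as is: pd.concat([df.drop(columns="Year_Birth"), df_year], axis=1).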