I have my data as
Age | Gender |
---|---|
2 Year(s) | Male |
3 Month(s) | Male |
2 Day(s) | Female |
And I want to convert it to:
Age | Gender |
---|---|
2 | Male |
Below 1 year | Male |
Below 1 year | Female |
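Use the replace method with a dictionary that maps each old value to its new one. Assigning the result back to the column is more reliable than inplace=True on df.Age, which may not update the DataFrame in recent pandas versions. In your case:
df["Age"] = df["Age"].replace({
    "2 Year(s)": "2",
    "3 Month(s)": "Below 1 year",
    ...
})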
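If the column holds many distinct values, listing every one in a dictionary gets tedious. Below is a minimal sketch of a rule-based alternative. The helper name normalize_age and the regular expression are assumptions made for illustration, not part of the original answer; it assumes every value looks like "<number> <Unit>(s)".

```python
import re

# Hypothetical pattern: a number, then a unit, then "(s)"; stray spaces are tolerated.
_AGE_PATTERN = re.compile(r"^\s*(\d+)\s*(Year|Month|Day|Hour)\s*\(s\)\s*$", re.IGNORECASE)


def normalize_age(text):
    """Return the year count as a string, or 'Below 1 year' for smaller units."""
    match = _AGE_PATTERN.match(text)
    if match is None:
        return text  # leave unrecognized values untouched
    number, unit = match.groups()
    return number if unit.lower() == "year" else "Below 1 year"


ages = ["2 Year(s)", "3 Month(s)", "2 Day(s)"]
normalized = [normalize_age(a) for a in ages]
print(normalized)
```

With pandas, the same function could be applied with df["Age"] = df["Age"].map(normalize_age).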
I figured it out. Here is what I did, copied directly from my code editor:
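import pandas as pd

# Split 'Patient Age' into a number and a unit, stored as new columns on the master frame
alldata[['Age', 'Unit']] = alldata['Patient Age'].str.split(' ', expand=True)
# Separate into sub-frames of Years, Months, Days and Hours
# (.copy() avoids SettingWithCopyWarning when the sub-frames are modified below;
# get_group raises KeyError if a unit does not occur in the data)
dfunit = alldata.groupby('Unit')
dfyears = dfunit.get_group('Year(s)').copy()
dfmonths = dfunit.get_group('Month(s)').copy()
dfdays = dfunit.get_group('Day(s)').copy()
dfhours = dfunit.get_group('Hour(s)').copy()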
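# Replace the values for Months, Days and Hours with 0
# (regex=True is required in pandas 2.0+, where str.replace is literal by default)
dfmonths['Age'] = dfmonths['Age'].str.replace(r'\d+', '0', regex=True)
dfdays['Age'] = dfdays['Age'].str.replace(r'\d+', '0', regex=True)
dfhours['Age'] = dfhours['Age'].str.replace(r'\d+', '0', regex=True)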
# Combine the sub-frames back into one frame
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
newdata = pd.concat([dfyears, dfmonths, dfdays, dfhours])
# Convert the values in Age to int
newdata['Age'] = newdata['Age'].astype(int)
newdata['Age'].dtype
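# Separate Age into age groups
# (right=False makes each bin include its left edge and exclude its right edge,
# e.g. [0, 20), so that age 20 falls in '20-29' as the labels suggest)
bins = [0, 20, 30, 40, 50, 60, 70, 120]
labels = ['0-19', '20-29', '30-39', '40-49', '50-59', '60-69', '70+']
newdata['Agerange'] = pd.cut(newdata['Age'], bins, labels=labels, right=False)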
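For comparison, the same result can probably be reached without splitting the frame into groups. The following is only a sketch under stated assumptions: the sample rows are invented for illustration, every value is assumed to match "<number> <Unit>(s)" (an unmatched row would make astype(int) fail), and right=False is my choice so that the bin edges line up with the labels.

```python
import pandas as pd

# Sample data invented for illustration; the column name follows the code above.
alldata = pd.DataFrame(
    {"Patient Age": ["25 Year(s)", "3 Month(s)", "2 Day(s)", "70 Year(s)", "5 Hour(s)"]}
)

# Extract the number and the unit name from each string in one pass.
parts = alldata["Patient Age"].str.extract(r"^(?P<Age>\d+)\s*(?P<Unit>\w+)")
age = parts["Age"].astype(int)

# Keep the number only where the unit is years; every other unit counts as age 0.
alldata["Age"] = age.where(parts["Unit"] == "Year", 0)

# Half-open bins [0, 20), [20, 30), ... so that each label matches its range.
bins = [0, 20, 30, 40, 50, 60, 70, 120]
labels = ["0-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
alldata["Agerange"] = pd.cut(alldata["Age"], bins, labels=labels, right=False)
print(alldata)
```

This keeps the original row order and does not fail when one of the units (for example Hour(s)) is absent from the data.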