I have a DataFrame column "days" with 1000 rows of records.
If days is less than 7.0 (0-7), group it as "1-6 days".
If days is more than 7.1 but less than 14.0 (7.1-14.0), group it as "7-14 days".
If days is 15 or more, group it as "> 14 days".
How can I create a new column "Days_Group" to represent the days grouping?
Example days values:
1 3.0
2 4.6
3 14.9
4 7.1
5 15.1
6 109
Use np.searchsorted:
import numpy as np
labels = np.array(['1-6 days', '7-14 days', '>14 days'])
bins = np.array([7, 14])
# searchsorted returns the bin index (0, 1 or 2) for each value
df.assign(Days_Group=labels[bins.searchsorted(df.days)])
    days  Days_Group
1    3.0    1-6 days
2    4.6    1-6 days
3   14.9    >14 days
4    7.1   7-14 days
5   15.1    >14 days
6  109.0    >14 days
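One thing to watch with searchsorted is which group a value gets when it sits exactly on a bin edge (7.0 or 14.0). Here is a minimal sketch, assuming a small sample frame. The extra 7.0 and 14.0 rows are not from the question and are only there to show the edges; the names left and right are just for illustration.

```python
import numpy as np
import pandas as pd

# Sample values from the question plus two edge values (7.0 and 14.0),
# which are added here only to illustrate the boundary behaviour.
df = pd.DataFrame({'days': [3.0, 4.6, 14.9, 7.1, 15.1, 109.0, 7.0, 14.0]})

labels = np.array(['1-6 days', '7-14 days', '>14 days'])
bins = np.array([7, 14])
values = df['days'].to_numpy()

# side='left' (the default): a value equal to an edge goes to the lower group
left = labels[np.searchsorted(bins, values, side='left')]

# side='right': a value equal to an edge goes to the upper group
right = labels[np.searchsorted(bins, values, side='right')]

print(df.assign(left=left, right=right))
```

With the default side='left', 7.0 ends up in "1-6 days" and 14.0 in "7-14 days"; with side='right' they move up one group. Pick whichever matches the grouping you actually want.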
Use pd.cut:
import numpy as np
import pandas as pd
df.assign(Days_Group=pd.cut(df['days'],
[0,7,14,np.inf],
labels=['1-6 days','7-14 days','> 14 days']))
Output:
    days  Days_Group
1    3.0    1-6 days
2    4.6    1-6 days
3   14.9   > 14 days
4    7.1   7-14 days
5   15.1   > 14 days
6  109.0   > 14 days
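It may help to see how pd.cut treats values that fall exactly on a bin edge, since the question is not precise about 0, 7.0 and 14.0. The sketch below assumes a made-up Series of edge values (not from the question) and compares the default right-closed bins with the include_lowest and right=False options of pd.cut.

```python
import numpy as np
import pandas as pd

# Hypothetical edge-case values, chosen only to show the boundary behaviour
days = pd.Series([0.0, 3.0, 7.0, 14.0, 15.1])

bins = [0, 7, 14, np.inf]
labels = ['1-6 days', '7-14 days', '> 14 days']

# Default: intervals are (0, 7], (7, 14], (14, inf], so 0.0 is not in any bin
default = pd.cut(days, bins=bins, labels=labels)

# include_lowest=True also puts the left-most edge (0.0) in the first bin
with_lowest = pd.cut(days, bins=bins, labels=labels, include_lowest=True)

# right=False: intervals are [0, 7), [7, 14), [14, inf)
left_closed = pd.cut(days, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'days': days, 'default': default,
                    'with_lowest': with_lowest, 'left_closed': left_closed}))
```

With the default settings a value of exactly 0 comes back as NaN, and 7.0 and 14.0 fall in the lower group. With right=False they fall in the upper group instead.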
I think you need cut:
import numpy as np
import pandas as pd
df['Days_Group'] = pd.cut(df['days'],
bins=[0,7,14,np.inf],
labels=['1-6 days','7-14 days','> 14 days'],
include_lowest=True)
print (df)
    days  Days_Group
1    3.0    1-6 days
2    4.6    1-6 days
3   14.9   > 14 days
4    7.1   7-14 days
5   15.1   > 14 days
6  109.0   > 14 days
In older pandas versions, pd.np.inf could be used in place of np.inf without importing NumPy, but the pd.np alias was deprecated in pandas 1.0 and removed in 2.0, so prefer np.inf.
EDIT: If the days column contains timedeltas, convert them to float days first:
print (df)
                days
1    3 days 00:00:00
2    4 days 14:24:00
3   14 days 21:36:00
4    7 days 02:24:00
5   15 days 02:24:00
6  109 days 00:00:00
df['days'] = df['days'].dt.total_seconds() / 24 / 3600
print (df)
    days
1    3.0
2    4.6
3   14.9
4    7.1
5   15.1
6  109.0
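As an alternative to going through total_seconds(), a timedelta column can also be divided by a one-day Timedelta to get float days directly. A minimal sketch, assuming the column was built with pd.to_timedelta from the values shown above (the names via_seconds and via_division are just for illustration):

```python
import pandas as pd

# Rebuild the timedelta column from the values printed above
df = pd.DataFrame({'days': pd.to_timedelta([
    '3 days 00:00:00', '4 days 14:24:00', '14 days 21:36:00',
    '7 days 02:24:00', '15 days 02:24:00', '109 days 00:00:00',
])}, index=range(1, 7))

# Conversion used in the answer: seconds -> days
via_seconds = df['days'].dt.total_seconds() / 24 / 3600

# Alternative: divide by a one-day Timedelta, which yields float days
via_division = df['days'] / pd.Timedelta(days=1)

df['days'] = via_division
print(df)
```

Both give the same float values, so the result can be passed to pd.cut as before.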