Adding a column to a Python pandas data frame based on the value of another column
Is there a pandas function for adding a new column based on certain values of another column of the data frame?
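I am trying to create a new column in a data frame based on the time values in another column: if the time is between 06:00:00 and 12:00:00 the value should be Morning, if it is between 12:00:00 and 15:00:00 it should be Afternoon, and so on.
I have tried a for loop with if/else statements, but my data frame has 1,549,293 rows, so the loop never finishes.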
import datetime
import time
times = [datetime.time(6,0,0), datetime.time(12,0,0), datetime.time(15,0,0), datetime.time(20,0,0), datetime.time(23,0,0)]
times
df['time'] = df['start_time'].dt.time
df['day_interval'] = df['time']
for i in range(0, df.shape[0]):
    if df['time'][i] >= times[0] and df['time'][i] < times[1]:
        df['day_interval'][i] = "Morning"
    elif df['time'][i] >= times[1] and df['time'][i] < times[2]:
        df['day_interval'][i] = "Afternoon"
    elif df['time'][i] >= times[2] and df['time'][i] < times[3]:
        df['day_interval'][i] = "Evening"
    elif df['time'][i] >= times[3] and df['time'][i] < times[4]:
        df['day_interval'][i] = "Night"
    elif df['time'][i] >= times[4]:
        df['day_interval'][i] = "Late Night"
    if df['time'][i] < times[0]:
        df['day_interval'][i] = "Early Hours"
Is there any way to reduce the processing time?
Use pd.cut.
Note that I added two extra times, 00:00:00 and 23:59:59, to your times list.
pd.cut(s1,bins=pd.to_datetime(pd.Series(times),format='%H:%M:%S').tolist(),labels=['Early','M','A','E','N','L'])
0 Early
1 M
Name: time, dtype: category
Categories (6, object): [Early < M < A < E < N < L]
Data setup
times= [datetime.time(0,0,0),datetime.time(6,0,0),datetime.time(12,0,0),datetime.time(15,0,0),datetime.time(20,0,0),datetime.time(23,0,0),datetime.time(23,59,59)]
s1=pd.to_datetime(df.time,format='%H:%M:%S')
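One thing to watch with pd.cut: by default the bins are closed on the right, so a time exactly on an edge (06:00:00, say) lands in the earlier bin, and a time equal to the first edge (00:00:00) comes back as NaN. Here is a minimal sketch of one way around that, assuming the boundaries from the question. It cuts on seconds since midnight with right=False, so the bins are half-open like the original if/elif chain. The sample df and the spelled-out label names are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df['start_time'].
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-01-01 00:00:00",
        "2019-01-01 06:00:00",
        "2019-01-01 11:59:59",
        "2019-01-01 12:00:00",
        "2019-01-01 15:30:00",
        "2019-01-01 21:00:00",
        "2019-01-01 23:59:59",
    ])
})

# Seconds elapsed since midnight, computed with vectorized .dt accessors.
t = df["start_time"].dt
secs = t.hour * 3600 + t.minute * 60 + t.second

# Bin edges in seconds: 00:00, 06:00, 12:00, 15:00, 20:00, 23:00, 24:00.
edges = [h * 3600 for h in (0, 6, 12, 15, 20, 23, 24)]
labels = ["Early Hours", "Morning", "Afternoon", "Evening", "Night", "Late Night"]

# right=False makes every bin [start, end), matching ">= start and < end".
df["day_interval"] = pd.cut(secs, bins=edges, labels=labels, right=False)
print(df)
```

Using 24:00 (86400 seconds) as the last edge means 23:59:59 still falls inside the final half-open bin.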
Row-by-row loops should almost never be used in pandas. pandas supports vectorized operations:
df.loc[(df['time'] >= times[0]) & (df['time'] < times[1]),
       'day_interval'] = "Morning"
df.loc[(df['time'] >= times[1]) & (df['time'] < times[2]),
       'day_interval'] = "Afternoon"
and so on, but using pd.cut is more elegant; see WB's solution.
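If you want all six labels from boolean masks in one pass, numpy.select is one option. This is only a sketch, not part of the answer above. It assumes every boundary sits on a whole hour (true for the times in the question), so it compares the integer hour instead of datetime.time objects. The sample df is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's df['start_time'].
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-01-01 03:15:00",
        "2019-01-01 06:00:00",
        "2019-01-01 12:00:00",
        "2019-01-01 19:59:59",
        "2019-01-01 20:00:00",
        "2019-01-01 23:30:00",
    ])
})

hour = df["start_time"].dt.hour

# One boolean mask per label; np.select takes the first mask that is True.
conditions = [
    (hour >= 6) & (hour < 12),
    (hour >= 12) & (hour < 15),
    (hour >= 15) & (hour < 20),
    (hour >= 20) & (hour < 23),
    hour >= 23,
]
choices = ["Morning", "Afternoon", "Evening", "Night", "Late Night"]

# Rows matching no mask (hour < 6) get the default label.
df["day_interval"] = np.select(conditions, choices, default="Early Hours")
print(df)
```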
I would use loc.
Just throwing it out there: df.between_time is another option.
import numpy as np
import pandas as pd
# freq='h' replaces 'H', which is deprecated in pandas >= 2.2
df = pd.DataFrame(np.random.randn(25), index=pd.date_range('2017-08-20', '2017-08-21', freq='h'))
df.loc[df.between_time('06:00:00', '12:00:00').index, 'newCol'] = 'morning'
df.loc[df.between_time('12:00:00', '15:00:00').index, 'newCol'] = 'afternoon'
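Note that between_time includes both endpoints by default, so a row at exactly 12:00:00 matches both calls and ends up with whichever label is assigned last. Below is a rough sketch of the same idea extended to all the intervals from the question with half-open ranges. It assumes pandas 1.4 or later, where between_time accepts inclusive="left". The frame, the value column, and the intervals list are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly frame; between_time needs a DatetimeIndex.
idx = pd.date_range("2017-08-20", periods=24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": np.arange(24)}, index=idx)

# (start, end, label) triples; each range is treated as [start, end).
intervals = [
    ("00:00", "06:00", "Early Hours"),
    ("06:00", "12:00", "Morning"),
    ("12:00", "15:00", "Afternoon"),
    ("15:00", "20:00", "Evening"),
    ("20:00", "23:00", "Night"),
]

# Anything not covered below (23:00 up to midnight) keeps this default.
df["day_interval"] = "Late Night"
for start, end, label in intervals:
    # inclusive="left" keeps the start time and drops the end time.
    rows = df.between_time(start, end, inclusive="left").index
    df.loc[rows, "day_interval"] = label
print(df)
```

This still loops, but only over the five intervals, not over the rows.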
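In pandas/numpy land, most of the time when you find yourself reaching for a for loop, there is probably a better way.
Not sure if this is faster, but I think it is at least a bit cleaner [and hopefully also correct?]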
def time_of_day(hour):
    if hour < 6:
        return 'Early Hours'
    elif 6 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 15:
        return 'Afternoon'
    elif 15 <= hour < 20:
        return 'Evening'
    elif 20 <= hour < 23:
        return 'Night'
    else:
        return 'Late Night'

def main():
    # ... code that generates df ...
    df['day_interval'] = df['start_time'].dt.hour.map(time_of_day)

if __name__ == '__main__':
    main()
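Mapping a Python function still calls it once per row. Since there are only 24 possible hours, one possible tweak is to evaluate the function once per hour and map with the resulting dict. This is a sketch, not a benchmark; hour_to_label and the sample df are names invented for this example.

```python
import pandas as pd

def time_of_day(hour):
    """Return the day-interval label for an integer hour (0-23)."""
    if hour < 6:
        return 'Early Hours'
    elif hour < 12:
        return 'Morning'
    elif hour < 15:
        return 'Afternoon'
    elif hour < 20:
        return 'Evening'
    elif hour < 23:
        return 'Night'
    return 'Late Night'

# Call the function only 24 times, once per possible hour.
hour_to_label = {h: time_of_day(h) for h in range(24)}

# Hypothetical sample data standing in for the real df.
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-01-01 05:59:59",
        "2019-01-01 06:00:00",
        "2019-01-01 14:00:00",
        "2019-01-01 15:00:00",
        "2019-01-01 22:10:00",
        "2019-01-01 23:00:00",
    ])
})

# Series.map with a dict does a lookup per value instead of a function call.
df["day_interval"] = df["start_time"].dt.hour.map(hour_to_label)
print(df)
```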