Is there a pandas function for adding a new column based on certain values of another column of the data frame?

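I am trying to create a new column in a data frame based on the time values in another column, i.e. if the time is between 06:00:00 and 12:00:00 then "Morning", if the time is between 12:00:00 and 15:00:00 then "Afternoon", and so on.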

I have tried using a for loop with if/else statements, but my data frame has 1,549,293 rows, so the loop never finishes.

import datetime
import time
times= [datetime.time(6,0,0),datetime.time(12,0,0),datetime.time(15,0,0),datetime.time(20,0,0),datetime.time(23,0,0)]
times

df['time']=df['start_time'].dt.time
df['day_interval']=df['time']

for i in range(0,df.shape[0]):

    if df['time'][i] >= times[0] and df['time'][i] < times[1]:
        df['day_interval'][i]= "Morning"
    elif df['time'][i] >= times[1] and df['time'][i] < times[2]:
        df['day_interval'][i]= "Afternoon"
    elif df['time'][i] >= times[2] and df['time'][i] < times[3]:
        df['day_interval'][i]= "Evening"
    elif df['time'][i] >= times[3] and df['time'][i] < times[4]:
        df['day_interval'][i]= "Night"
    elif df['time'][i] >= times[4]:
        df['day_interval'][i]= "Late Night"
    if df['time'][i] < times[0]:
        df['day_interval'][i]= "Early Hours"

Is there any way to reduce the processing time?

Use pd.cut. Note that I added two times to your times list: 00:00:00 and 23:59:59.

pd.cut(s1,bins=pd.to_datetime(pd.Series(times),format='%H:%M:%S').tolist(),labels=['Early','M','A','E','N','L'])
0    Early
1        M
Name: time, dtype: category
Categories (6, object): [Early < M < A < E < N < L]

Data setup

times= [datetime.time(0,0,0),datetime.time(6,0,0),datetime.time(12,0,0),datetime.time(15,0,0),datetime.time(20,0,0),datetime.time(23,0,0),datetime.time(23,59,59)]
s1=pd.to_datetime(df.time,format='%H:%M:%S') 
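
For readers who want something they can run end to end, here is a minimal sketch of the same pd.cut idea on seconds since midnight. The sample timestamps are made up for illustration, and the label names are taken from the question rather than from this answer. Passing right=False is an addition here, so that each bin includes its start and excludes its end, as the question's >= / < comparisons do.

```python
import pandas as pd

# Hypothetical sample data; 'start_time' mirrors the column name in the question.
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-04-08 05:59:59",
        "2019-04-08 06:00:00",
        "2019-04-08 12:00:00",
        "2019-04-08 15:30:00",
        "2019-04-08 20:00:00",
        "2019-04-08 23:00:00",
    ])
})

# Seconds since midnight, so the bin edges can be plain integers.
s = df["start_time"]
seconds = s.dt.hour * 3600 + s.dt.minute * 60 + s.dt.second

# Seven edges define six bins; the last edge is the end of the day.
edges = [h * 3600 for h in (0, 6, 12, 15, 20, 23, 24)]
labels = ["Early Hours", "Morning", "Afternoon", "Evening", "Night", "Late Night"]

# right=False makes every bin closed on the left and open on the right.
df["day_interval"] = pd.cut(seconds, bins=edges, labels=labels, right=False)
print(df)
```

The result is a categorical column, which also keeps memory use low on a frame with more than a million rows.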

Row-by-row loops should almost never be used in pandas. pandas supports vectorized operations:

df.loc[(df['time'] >= times[0]) & (df['time'] < times[1]),
       'day_interval'] = "Morning"
df.loc[(df['time'] >= times[1]) & (df['time'] < times[2]),
       'day_interval'] = "Afternoon"

and so on, but using pd.cut is more elegant; see WB's solution.
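
As a sketch of how the remaining intervals could be filled in without repeating the df.loc line six times, numpy.select takes a list of boolean masks and a matching list of values. This is a swapped-in alternative, not part of the answer above; the sample data is hypothetical, and the boundaries and labels are copied from the question.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's frame.
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-04-08 03:15:00",
        "2019-04-08 06:00:00",
        "2019-04-08 13:45:00",
        "2019-04-08 19:59:59",
        "2019-04-08 22:00:00",
        "2019-04-08 23:30:00",
    ])
})
df["time"] = df["start_time"].dt.time

# Same boundaries as the question's `times` list.
times = [datetime.time(h, 0, 0) for h in (6, 12, 15, 20, 23)]

# One boolean mask per interval, evaluated over the whole column at once.
conditions = [
    (df["time"] >= times[0]) & (df["time"] < times[1]),
    (df["time"] >= times[1]) & (df["time"] < times[2]),
    (df["time"] >= times[2]) & (df["time"] < times[3]),
    (df["time"] >= times[3]) & (df["time"] < times[4]),
    df["time"] >= times[4],
]
choices = ["Morning", "Afternoon", "Evening", "Night", "Late Night"]

# Rows matching none of the masks (before 06:00) get the default label.
df["day_interval"] = np.select(conditions, choices, default="Early Hours")
print(df)
```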

I'll throw df.between_time, combined with loc, out there as another option.

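import numpy as np
import pandas as pd

# Newer pandas versions prefer the lowercase alias freq='h'.
df = pd.DataFrame(np.random.randn(25), index=pd.date_range('2017-08-20', '2017-08-21', freq='H'))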

df.loc[df.between_time('06:00:00', '12:00:00').index, 'newCol'] = 'morning'
df.loc[df.between_time('12:00:00', '15:00:00').index, 'newCol'] = 'afternoon'
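
Two details matter when applying this to the question's frame: between_time needs a DatetimeIndex, and by default it includes both endpoints, so 12:00:00 would match both calls. The sketch below assumes pandas 1.4 or later, where between_time accepts inclusive="left". The sample data and the value column are hypothetical.

```python
import pandas as pd

# Hypothetical frame with the timestamps in a column, as in the question.
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2017-08-20 06:00:00",
        "2017-08-20 11:59:59",
        "2017-08-20 12:00:00",
        "2017-08-20 14:00:00",
        "2017-08-20 15:00:00",
    ]),
    "value": range(5),
})

# between_time works on the index, so index by the timestamp column first.
indexed = df.set_index("start_time")

# inclusive="left" keeps the start time and drops the end time,
# so 12:00:00 is labelled as afternoon only.
morning = indexed.between_time("06:00:00", "12:00:00", inclusive="left").index
afternoon = indexed.between_time("12:00:00", "15:00:00", inclusive="left").index

indexed.loc[morning, "newCol"] = "morning"
indexed.loc[afternoon, "newCol"] = "afternoon"

# Restore the original layout; rows outside both ranges stay NaN.
df = indexed.reset_index()
print(df)
```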

In pandas/numpy land, most of the time, if you are reaching for a for loop, there is probably a better way.

Not sure if it's faster, but I think it's at least a bit cleaner [and hopefully correct as well?]

def time_of_day(hour):
    if hour < 6:
        return 'Early Hours'
    elif 6 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 15:
        return 'Afternoon'
    elif 15 <= hour < 20:
        return 'Evening'
    elif 20 <= hour < 23:
        return 'Night'
    else:
        return 'Late Night'


def main():
    # ... code that generates df ...
    df['day_interval'] = df['start_time'].dt.hour.map(time_of_day)


if __name__ == '__main__':
    main()
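
Since there are only 24 possible hour values, one possible variation is to build a dictionary from hour to label once and pass it to map, which avoids calling a Python function per row. This is a sketch and not part of the answer above: the hour_to_label name and the sample data are made up, and whether it is measurably faster has not been tested here.

```python
import pandas as pd

# (start_hour, label) pairs, using the same boundaries as time_of_day above.
starts = [
    (0, "Early Hours"),
    (6, "Morning"),
    (12, "Afternoon"),
    (15, "Evening"),
    (20, "Night"),
    (23, "Late Night"),
]

# Expand to one entry per hour of the day.
# Later (larger) start hours overwrite the labels set by earlier ones.
hour_to_label = {}
for start, label in starts:
    for hour in range(start, 24):
        hour_to_label[hour] = label

# Hypothetical sample data; 'start_time' mirrors the question's column.
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-04-08 00:10:00",
        "2019-04-08 07:00:00",
        "2019-04-08 12:30:00",
        "2019-04-08 19:00:00",
        "2019-04-08 21:00:00",
        "2019-04-08 23:59:59",
    ])
})

# Series.map accepts a dict and looks up every hour value in it.
df["day_interval"] = df["start_time"].dt.hour.map(hour_to_label)
print(df)
```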

