
Python: convert series object columns in pandas dataframe to int64 dtype

I have the following dataframe:

Day_Part     Start_Time    End_Time   
Breakfast    9:00          11:00
Lunch        12:00         14:00
Dinner       19:00         23:00

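The Start_Time and End_Time columns are currently "series objects" (object dtype, holding strings such as "9:00"). I would like to convert the values in these columns to int64 dtype.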

This is how I would like the dataframe to look:

Day_Part     Start_Time    End_Time   
Breakfast    9             11
Lunch        12            14
Dinner       19            23

Any help is greatly appreciated.
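For anyone who wants to run the snippets in this thread, here is a minimal sketch that rebuilds the example frame. It assumes the time values are plain strings such as '9:00', which is what the "series object" description suggests; the question does not show how the frame was actually created.

```python
import pandas as pd

# Rebuild the example frame; the times are assumed to be plain strings.
df = pd.DataFrame({
    "Day_Part": ["Breakfast", "Lunch", "Dinner"],
    "Start_Time": ["9:00", "12:00", "19:00"],
    "End_Time": ["11:00", "14:00", "23:00"],
})

# String columns typically report as object dtype; newer pandas versions
# may report a dedicated string dtype instead.
print(df.dtypes)
```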

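You can first convert with to_timedelta, then extract the hours:

import pandas as pd

# Append ':00' so that '9:00' parses as H:MM:SS, then take the hours component
df['Start_Time'] = pd.to_timedelta(df['Start_Time'] + ':00').dt.components.hours
df['End_Time'] = pd.to_timedelta(df['End_Time'] + ':00').dt.components.hours
print(df)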

    Day_Part  Start_Time  End_Time
0  Breakfast           9        11
1      Lunch          12        14
2     Dinner          19        23

Another solution, using split and casting to int:

# Keep the text before the first ':' and cast it to an integer
df['Start_Time'] = df['Start_Time'].str.split(':').str[0].astype(int)
df['End_Time'] = df['End_Time'].str.split(':').str[0].astype(int)
print(df)

    Day_Part  Start_Time  End_Time
0  Breakfast           9        11
1      Lunch          12        14
2     Dinner          19        23
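Since the question asks for int64 specifically, note that `astype(int)` uses the platform's default integer width, which has been 32-bit on some setups (for example Windows with NumPy versions before 2.0). A minimal sketch that names the width explicitly follows; `hour_as_int64` is a hypothetical helper introduced only for this illustration.

```python
import pandas as pd


def hour_as_int64(times: pd.Series) -> pd.Series:
    """Return the hour part of 'H:MM' strings as an int64 Series."""
    # Take the text before the first colon, then cast with an explicit width.
    return times.str.split(":").str[0].astype("int64")


df = pd.DataFrame({
    "Day_Part": ["Breakfast", "Lunch", "Dinner"],
    "Start_Time": ["9:00", "12:00", "19:00"],
    "End_Time": ["11:00", "14:00", "23:00"],
})

for col in ["Start_Time", "End_Time"]:
    df[col] = hour_as_int64(df[col])

print(df.dtypes)
```

The same `.astype("int64")` call can be appended to any of the other approaches in this answer.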

A solution using extract and casting to int:

# Raw strings so that \d reaches the regex engine unchanged
df['Start_Time'] = df['Start_Time'].str.extract(r'(\d*):', expand=False).astype(int)
df['End_Time'] = df['End_Time'].str.extract(r'(\d*):', expand=False).astype(int)
print(df)

    Day_Part  Start_Time  End_Time
0  Breakfast           9        11
1      Lunch          12        14
2     Dinner          19        23

A solution converting with to_datetime:

df['Start_Time'] = pd.to_datetime(df['Start_Time'], format='%H:%M').dt.hour
df['End_Time'] = pd.to_datetime(df['End_Time'], format='%H:%M').dt.hour
print(df)

    Day_Part  Start_Time  End_Time
0  Breakfast           9        11
1      Lunch          12        14
2     Dinner          19        23
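As a sanity check, the four approaches can be compared on the same input. This is only a sketch: it compares values and ignores the integer width, because the resulting dtype can differ between approaches and pandas versions. The names `start`, `candidates` and `reference` are made up for the example.

```python
import pandas as pd

start = pd.Series(["9:00", "12:00", "19:00"], name="Start_Time")

# One entry per approach from this answer, all applied to the same input.
candidates = {
    "to_timedelta": pd.to_timedelta(start + ":00").dt.components.hours,
    "split": start.str.split(":").str[0].astype(int),
    "extract": start.str.extract(r"(\d*):", expand=False).astype(int),
    "to_datetime": pd.to_datetime(start, format="%H:%M").dt.hour,
}

reference = candidates["split"]
for name, result in candidates.items():
    # Compare values only; dtype and Series name may legitimately differ.
    pd.testing.assert_series_equal(
        result, reference, check_dtype=False, check_names=False
    )
    print(name, result.dtype, result.tolist())
```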

Timings

# Repeat the 3-row frame 100000 times: [300000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
print(df)

In [158]: %timeit pd.to_timedelta(df['Start_Time']+ ':00').dt.components.hours
1 loop, best of 3: 7.12 s per loop

In [159]: %timeit df['Start_Time'].str.split(':').str[0].astype(int)
1 loop, best of 3: 415 ms per loop

In [160]: %timeit df['Start_Time'].str.extract('(\d*):', expand=False).astype(int)
1 loop, best of 3: 654 ms per loop

In [166]: %timeit pd.to_datetime(df['Start_Time'], format='%H:%M').dt.hour
1 loop, best of 3: 1.26 s per loop
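The measurements above were taken with IPython's `%timeit`. Outside IPython, a rough equivalent can be sketched with the standard `timeit` module. The repeat factor of 1000 is an arbitrary choice to keep the run short (the original used 100000), and the absolute numbers, and possibly the ordering, will depend on hardware and on the pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "Day_Part": ["Breakfast", "Lunch", "Dinner"],
    "Start_Time": ["9:00", "12:00", "19:00"],
    "End_Time": ["11:00", "14:00", "23:00"],
})
# Smaller than the 300000-row frame used above, to keep the sketch quick.
big = pd.concat([base] * 1000).reset_index(drop=True)
col = big["Start_Time"]

approaches = {
    "to_timedelta": lambda: pd.to_timedelta(col + ":00").dt.components.hours,
    "split": lambda: col.str.split(":").str[0].astype(int),
    "extract": lambda: col.str.extract(r"(\d*):", expand=False).astype(int),
    "to_datetime": lambda: pd.to_datetime(col, format="%H:%M").dt.hour,
}

timings = {}
for name, fn in approaches.items():
    # Best of 3 single calls, mirroring the "best of 3" reported by %timeit.
    timings[name] = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name}: {timings[name] * 1000:.1f} ms")
```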
