
Pandas' to_datetime function doesn't change dtype

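I've been using Python recently and ran into a problem I can't seem to solve. I'm working with a pandas dataset, and when I use the to_datetime function to change a variable's dtype from 'object' to 'datetime64', the dtype does not change to the desired 'datetime64'.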

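So far I have only tried the to_datetime function, which doesn't seem to solve the problem. I'm looking for a solution, using to_datetime or any other code, that changes the variable's dtype from 'object' to 'datetime64'.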

Here is some information about the dataset:

df.head()
Formatted Date                      Summary  Precip Type Temperature (C)   Apparent Temperature (C)   Humidity   Wind Speed (km/h)   Wind Bearing (degrees)  Visibility (km)  Loud Cover Pressure (millibars)   Daily Summary
0   2006-04-01 00:00:00.000 +0200   Partly Cloudy   rain    9.472222    7.388889    0.89    14.1197     251.0   15.8263     0.0     1015.13     Partly cloudy throughout the day.
1   2006-04-01 01:00:00.000 +0200   Partly Cloudy   rain    9.355556    7.227778    0.86    14.2646     259.0   15.8263     0.0     1015.63     Partly cloudy throughout the day.
2   2006-04-01 02:00:00.000 +0200   Mostly Cloudy   rain    9.377778    9.377778    0.89    3.9284  204.0   14.9569     0.0     1015.94     Partly cloudy throughout the day.
3   2006-04-01 03:00:00.000 +0200   Partly Cloudy   rain    8.288889    5.944444    0.83    14.1036     269.0   15.8263     0.0     1016.41     Partly cloudy throughout the day.
4   2006-04-01 04:00:00.000 +0200   Mostly Cloudy   rain    8.755556    6.977778    0.83    11.0446     259.0   15.8263     0.0     1016.51     Partly cloudy throughout the day.

Here are the dtypes before using the to_datetime function:

df.dtypes
Formatted Date               object
Summary                      object
Precip Type                  object
Temperature (C)             float64
Apparent Temperature (C)    float64
Humidity                    float64
Wind Speed (km/h)           float64
Wind Bearing (degrees)      float64
Visibility (km)             float64
Loud Cover                  float64
Pressure (millibars)        float64
Daily Summary                object
dtype: object

And after using the to_datetime function:

df['Date'] = pd.to_datetime(df['Formatted Date'])
df.dtypes

Formatted Date               object
Summary                      object
Precip Type                  object
Temperature (C)             float64
Apparent Temperature (C)    float64
Humidity                    float64
Wind Speed (km/h)           float64
Wind Bearing (degrees)      float64
Visibility (km)             float64
Loud Cover                  float64
Pressure (millibars)        float64
Daily Summary                object
Date                         object
dtype: object

Can you tell me what I'm doing wrong? Thanks in advance!

Problem

You want to change the dtype from object to datetime64:

df = pd.DataFrame(data={'col':["2006-04-01 00:00:00.000 +0200"]})
df.dtypes

Output:

col    object
dtype: object

To change the type, apply pd.to_datetime to the column:

df['col'] = df['col'].apply(pd.to_datetime)
df.dtypes

Output:

col    datetime64[ns, pytz.FixedOffset(120)]
dtype: object
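Rather than reading the dtype off the printed table, the conversion can also be checked programmatically. This is only a small sketch on the same one-row example; is_datetime64_any_dtype comes from pandas.api.types and accepts both timezone-naive and timezone-aware datetime columns. The variable names before and after are just for illustration.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Same one-row example as above
df = pd.DataFrame(data={"col": ["2006-04-01 00:00:00.000 +0200"]})

# The column starts out holding plain strings, not datetimes
before = is_datetime64_any_dtype(df["col"])

# Vectorized conversion of the whole column
df["col"] = pd.to_datetime(df["col"])

# After the conversion the column should report a datetime dtype
after = is_datetime64_any_dtype(df["col"])

print(before, after)
```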

If this doesn't work, your 'Formatted Date' column may contain inconsistent date formats or NaN values.
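One way to look for such problem values is to parse with errors='coerce' and then list the rows that were not missing originally but came back as NaT. The sample Series below is made up for illustration (it is not taken from the dataset), and utc=True is added here so that every parsed value shares one timezone.

```python
import pandas as pd

# Hypothetical sample: two valid timestamps, one malformed string, one missing value
raw = pd.Series([
    "2006-04-01 00:00:00.000 +0200",
    "2006-04-01 01:00:00.000 +0200",
    "not a date",
    None,
])

# errors="coerce" turns unparseable entries into NaT instead of raising;
# utc=True converts every parsed value to UTC
parsed = pd.to_datetime(raw, errors="coerce", utc=True)

# Rows that held a value originally but could not be parsed
bad_rows = raw[parsed.isna() & raw.notna()]

print(bad_rows)
print(parsed.dtype)
```

If bad_rows comes back empty on the real column, malformed strings are probably not the cause and it is worth checking the UTC offsets instead.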

Real data

Using the dataset ( https://www.kaggle.com/budincsevity/szeged-weather/ ):

import pandas as pd

# load dataset
df = pd.read_csv('weatherHistory.csv')
df.dtypes

Output:

Formatted Date               object
Summary                      object
Precip Type                  object
Temperature (C)             float64
Apparent Temperature (C)    float64
Humidity                    float64
Wind Speed (km/h)           float64
Wind Bearing (degrees)      float64
Visibility (km)             float64
Loud Cover                  float64
Pressure (millibars)        float64
Daily Summary                object
dtype: object

df['Date'] = df['Formatted Date'].apply(pd.to_datetime)
df.dtypes

Output:

Formatted Date                      object
Summary                             object
Precip Type                         object
Temperature (C)                    float64
Apparent Temperature (C)           float64
Humidity                           float64
Wind Speed (km/h)                  float64
Wind Bearing (degrees)             float64
Visibility (km)                    float64
Loud Cover                         float64
Pressure (millibars)               float64
Daily Summary                       object
Date                        datetime64[ns]
dtype: object

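I have run into problems with pandas when selecting by column label. Using a simplified version of the DataFrame, I was able to change the column's data type by selecting the column by its integer position instead.

Try changing your:

 pd.to_datetime(df['Formatted Date'])

to:

  pd.to_datetime(df.iloc[:, 0])

It worked for me:

  import pandas as pd

  data=['2006-04-01 00:00:00.000 +0200']

  df = pd.DataFrame(data)

  # iloc[:, 0] selects the first column by position (iloc[0] would select the first row)
  df2 = pd.to_datetime(df.iloc[:, 0])

  print(df2.dtype)

The output is: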

  datetime64[ns, pytz.FixedOffset(120)]
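One thing to watch with positional selection: df.iloc[0] returns the first row, whereas df.iloc[:, 0] returns the first column. On a one-cell DataFrame the two are hard to tell apart, so here is a small sketch with a made-up two-column frame (the values are copied from the question's df.head(), the frame itself is hypothetical). utc=True is added so that the result has a single timezone.

```python
import pandas as pd

# Hypothetical two-row, two-column frame for illustration
df = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2006-04-01 01:00:00.000 +0200",
    ],
    "Humidity": [0.89, 0.86],
})

first_row = df.iloc[0]     # one value per column, for row 0
first_col = df.iloc[:, 0]  # the whole 'Formatted Date' column

# Convert the column selected by position and store it as a new column
df["Date"] = pd.to_datetime(first_col, utc=True)

print(first_row.index.tolist())
print(df.dtypes)
```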

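I downloaded the same data you are using, and I think one possible solution for your dataset is to extend your original code with an explicit date format. Note that the UTC offset at the end of each value is matched by %z (%p would match AM/PM instead), and utc=True makes sure that rows with different offsets, if any, end up in a single datetime dtype:

  df['Date'] = pd.to_datetime(df['Formatted Date'], format='%Y-%m-%d %H:%M:%S.%f %z', errors='coerce', utc=True)

As you can see, the 'Date' column now has a datetime dtype:

Formatted Date                           object
Summary                                  object
Precip Type                              object
Temperature (C)                         float64
Apparent Temperature (C)                float64
Humidity                                float64
Wind Speed (km/h)                       float64
Wind Bearing (degrees)                  float64
Visibility (km)                         float64
Loud Cover                              float64
Pressure (millibars)                    float64
Daily Summary                            object
Date                        datetime64[ns, UTC]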
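A caveat when combining format= with errors='coerce': a format string that does not match the data produces NaT for every row instead of raising, and the column still reports a datetime dtype. So the dtype alone does not confirm that parsing worked, and it is worth counting the NaT values afterwards. A minimal sketch on a one-row sample (the Series is made up for illustration):

```python
import pandas as pd

# Hypothetical one-row sample in the same layout as 'Formatted Date'
s = pd.Series(["2006-04-01 00:00:00.000 +0200"])

# %z matches the numeric UTC offset at the end of the string
good = pd.to_datetime(
    s, format="%Y-%m-%d %H:%M:%S.%f %z", errors="coerce", utc=True
)

# %p expects AM/PM, so the string does not match and is coerced to NaT
bad = pd.to_datetime(
    s, format="%Y-%m-%d %H:%M:%S.%f %p", errors="coerce", utc=True
)

# Count missing values to verify that parsing actually succeeded
n_good_missing = int(good.isna().sum())
n_bad_missing = int(bad.isna().sum())
print(n_good_missing, n_bad_missing)
```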

For pandas >= 0.24 you need to add the argument utc=True:

import pandas as pd

# load dataset
df = pd.read_csv('weatherHistory.csv')

df['Date'] = df['Formatted Date'].apply(pd.to_datetime, utc=True)
df.dtypes

Output:

Formatted Date                           object
Summary                                  object
Precip Type                              object
Temperature (C)                         float64
Apparent Temperature (C)                float64
Humidity                                float64
Wind Speed (km/h)                       float64
Wind Bearing (degrees)                  float64
Visibility (km)                         float64
Loud Cover                              float64
Pressure (millibars)                    float64
Daily Summary                            object
Date                        datetime64[ns, UTC]
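The likely reason utc=True matters is that a column mixing UTC offsets (for example +0100 in winter and +0200 in summer) cannot be stored in a single timezone-aware dtype unless everything is converted to one timezone first. That the dataset mixes offsets is my assumption here; the rows shown in the question only contain +0200. The sketch below uses a made-up two-row sample, calls pd.to_datetime directly on the Series rather than through .apply, and then converts to local time. 'Europe/Budapest' is also an assumption, chosen because the data covers Szeged.

```python
import pandas as pd

# Hypothetical sample: the offset switches from +0100 to +0200
s = pd.Series([
    "2006-03-26 00:00:00.000 +0100",
    "2006-04-01 00:00:00.000 +0200",
])

# Vectorized call on the whole Series; utc=True converts both offsets to UTC
as_utc = pd.to_datetime(s, utc=True)

# Optional: view the same instants in local time
# ('Europe/Budapest' is an assumption, not something stated in the dataset)
as_local = as_utc.dt.tz_convert("Europe/Budapest")

print(as_utc.dtype)
print(as_local)
```

Calling pd.to_datetime on the Series should also be noticeably faster than .apply on a dataset of this size, since .apply parses one value at a time.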
