I've got a slightly odd situation: the same operation works when written one way and not the other, and I don't understand why.
I'm trying to cast a column of a DataFrame with MultiIndex columns from timedelta64[ns] to timedelta64[s]; the rows also have a MultiIndex. If tuple is the column I want, (level_0, level_1):
it works with df[tuple] = df[tuple].astype('timedelta64[s]')
it doesn't work with df.loc[:, tuple] = df.loc[:, tuple].astype('timedelta64[s]')
Here is some sample data (csv):
Level_0,,,Respondent,Respondent,Respondent,OtherCat,OtherCat
Level_1,,,Something,StartDate,EndDate,Yes/No,SomethingElse
Region,Site,RespondentID,,,,,
Region_1,Site_1,3987227376,A,5/25/2015 10:59,5/25/2015 11:22,Yes,
Region_1,Site_1,3980680971,A,5/21/2015 9:40,5/21/2015 9:52,Yes,Yes
Region_1,Site_2,3977723249,A,5/20/2015 8:27,5/20/2015 8:41,Yes,
Region_1,Site_2,3977723089,A,5/20/2015 8:33,5/20/2015 9:09,Yes,No
Load it with:
In [1]: df = pd.read_csv(header=[0,1], index_col=[0,1,2])
df
Out[1]:
I want to create a column "Duration" (and then one called "DurationMinutes" dividing Duration by 60).
I start by casting the dates to datetime:
In [2]:
df.loc[:,('Respondent','StartDate')] = pd.to_datetime(df.loc[:,('Respondent','StartDate')])
df.loc[:,('Respondent','EndDate')] = pd.to_datetime(df.loc[:,('Respondent','EndDate')])
df.loc[:,('Respondent','Duration')] = df.loc[:,('Respondent','EndDate')] - df.loc[:,('Respondent','StartDate')]
This is where I no longer understand what's going on. I need the duration as timedelta64[s]. If I simply display the result of astype('timedelta64[s]'), it works like a charm:
In [3]: df.loc[:,('Respondent','Duration')].astype('timedelta64[s]')
Out[3]:
Region    Site    RespondentID
Region_1  Site_1  3987227376      1380
                  3980680971       720
          Site_2  3977723249       840
                  3977723089      2160
Name: (Respondent, Duration), dtype: float64
But if I assign, then show the column, it fails:
In [4]: df.loc[:,('Respondent','Duration')] = df.loc[:,('Respondent','Duration')].astype('timedelta64[s]')
df.loc[:,('Respondent','Duration')]
Out[4]:
Region    Site    RespondentID
Region_1  Site_1  3987227376     00:00:00.000001
                  3980680971     00:00:00.000000
          Site_2  3977723249     00:00:00.000000
                  3977723089     00:00:00.000002
Name: (Respondent, Duration), dtype: timedelta64[ns]
Weirdly enough, if I do this instead, it works:
In [5]: df[('Respondent','Duration')] = df[('Respondent','Duration')].astype('timedelta64[s]')
df.loc[:,('Respondent','Duration')]
Out[5]:
Region    Site    RespondentID
Region_1  Site_1  3987227376      1380
                  3980680971       720
          Site_2  3977723249       840
                  3977723089      2160
Name: (Respondent, Duration), dtype: float64
Another strange thing: if I filter for one site and drop the Region level, so that I end up with a single-level row index, it works:
In [6]:
Survey = 'Site_1'
df = df.xs(Survey, level='Site').copy()
# Drop the 'Region' from index
df.index = df.index.droplevel(level='Region')
df.loc[:,('Respondent','StartDate')] = pd.to_datetime(df.loc[:,('Respondent','StartDate')])
df.loc[:,('Respondent','EndDate')] = pd.to_datetime(df.loc[:,('Respondent','EndDate')])
df.loc[:,('Respondent','Duration')] = df.loc[:,('Respondent','EndDate')] - df.loc[:,('Respondent','StartDate')]
# This works fine
df.loc[:,('Respondent','Duration')] = df.loc[:,('Respondent','Duration')].astype('timedelta64[s]')
# Display
df.loc[:,('Respondent','Duration')]
Out[6]:
RespondentID
3987227376      1380
3980680971       720
Name: (Respondent, Duration), dtype: float64
Clearly I'm missing something about why df.loc[:,tuple] behaves differently from df[tuple].
Can someone shed some light, please?
Python 2.7.9, pandas 0.16.2
This was a bug. I just fixed it here; the fix will be in 0.17.0.
The gist is this. When you do something like df.loc[:,column] = value, it is treated exactly the same as df[[column]] = value. This means that type coercion is independent of what the column WAS. Contrast this with df.loc[indexer,column], i.e. you are partially setting a column. Here the new value AND the existing dtype of the column matter.
The bug was that when the frame has a multi-index, even though the multi-index was a full index (i.e. it encompassed the full length of values in the frame), it wasn't taking the correct path.
So the bottom line is that these cases should be (and will be) the same.
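For anyone who wants to sidestep the question of which assignment path is taken, here is a minimal sketch that rebuilds the sample data inline and creates the two columns with plain df[tuple] = value assignment, the form reported as working in the question. Note two assumptions that are not from the original post: the frame is constructed by hand rather than read from the CSV, and the seconds are computed with Series.dt.total_seconds() instead of astype('timedelta64[s]'), which assumes a pandas version that provides that accessor.

```python
import pandas as pd

# Hand-built copy of the question's sample data (only the two date columns).
index = pd.MultiIndex.from_tuples(
    [
        ("Region_1", "Site_1", 3987227376),
        ("Region_1", "Site_1", 3980680971),
        ("Region_1", "Site_2", 3977723249),
        ("Region_1", "Site_2", 3977723089),
    ],
    names=["Region", "Site", "RespondentID"],
)
columns = pd.MultiIndex.from_tuples(
    [("Respondent", "StartDate"), ("Respondent", "EndDate")]
)
df = pd.DataFrame(
    [
        ["5/25/2015 10:59", "5/25/2015 11:22"],
        ["5/21/2015 9:40", "5/21/2015 9:52"],
        ["5/20/2015 8:27", "5/20/2015 8:41"],
        ["5/20/2015 8:33", "5/20/2015 9:09"],
    ],
    index=index,
    columns=columns,
)

# Parse the date strings into datetime Series.
fmt = "%m/%d/%Y %H:%M"
start = pd.to_datetime(df[("Respondent", "StartDate")], format=fmt)
end = pd.to_datetime(df[("Respondent", "EndDate")], format=fmt)

# Whole-column assignment with df[tuple]: the new column takes the dtype
# of the assigned values (float seconds here).
df[("Respondent", "Duration")] = (end - start).dt.total_seconds()
df[("Respondent", "DurationMinutes")] = df[("Respondent", "Duration")] / 60

print(df[("Respondent", "Duration")])
```

Because the duration is stored as float seconds from the start, no later cast of a timedelta column is needed to derive DurationMinutes.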