
How to stop pandas trying to convert strings to floats?

I am reading an Excel file and want to truncate a datetime column to the 1st of each month. The truncation works fine, but pandas tries to convert the resulting strings to floats and throws an error when I add them as a column of an existing DataFrame.

How can I disable this, and just get a column with type of string or date?

I have tried various mapping / type casting approaches with no effect (same error). If I convert to a proxy int, the type casting problem disappears (since pandas can convert it to float), but that is an ugly workaround rather than a solution to the real problem.

Code snippet illustrating the problem

df = pd.read_excel(file_name, skiprows=[1], skipfooter=1)  # older pandas spelled this skip_footer

print(df['Purch.Date'].dtype)
>>> datetime64[ns]

print(df['Purch.Date'].head())
>>> 0   2016-06-23
>>> 1   2016-06-09
>>> 2   2016-06-24
>>> 3   2016-06-24
>>> 4   2016-06-24


df['YearMonthCapture'] = df['Purch.Date'].map(lambda x: str(x.replace(day=1).date()) ).astype(str)

>>> ValueError: could not convert string to float: '2016-06-01'

# === Other approaches resulting in the same error ===
#df['YearMonthCapture'] = df['Purch.Date'].map(lambda x: x.replace(day=1)) 
#df['YearMonthCapture'] = pd.Series(df['Purch.Date'].map(lambda x: str(x.replace(day=1).date()) ), dtype='str')
#df['YearMonthCapture'] = pd.Series(df['Purch.Date'].apply(lambda x: str(x.replace(day=1).date()) ), dtype='str')

# === Ugly workaround that does not really address the problem ===
df['YearMonthCapture'] = pd.Series(df['Purch.Date'].apply(lambda x: 100*x.year + x.month))

You can do this by accessing the day attribute, subtracting a TimedeltaIndex of (day - 1) days from your datetime column so that every date lands on the 1st of its month, and casting the result to str:

In [138]:
import datetime as dt
import pandas as pd

df = pd.DataFrame({'date':pd.date_range(dt.datetime(2016,1,1), periods=4)})
df

Out[138]:
        date
0 2016-01-01
1 2016-01-02
2 2016-01-03
3 2016-01-04

In [142]:
(df['date'] - pd.TimedeltaIndex(df['date'].dt.day - 1, unit='D')).astype(str)

Out[142]:
0    2016-01-01
1    2016-01-01
2    2016-01-01
3    2016-01-01
Name: date, dtype: object

So in your case:

df['YearMonthCapture'] = (df['Purch.Date'] - pd.TimedeltaIndex(df['Purch.Date'].dt.day - 1, unit='D')).astype(str)

should work.
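
The same truncation can be sketched without constructing a TimedeltaIndex directly, which may be useful because newer pandas releases deprecate the unit keyword of TimedeltaIndex in favour of pd.to_timedelta. The sample data below is hypothetical and only stands in for the Excel sheet; the period round-trip is an alternative technique, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data standing in for the 'Purch.Date' column from the Excel file.
df = pd.DataFrame(
    {"Purch.Date": pd.to_datetime(["2016-06-23", "2016-06-09", "2016-07-24"])}
)

# Option 1: same idea as the answer, using pd.to_timedelta for the (day - 1) offset.
offset = pd.to_timedelta(df["Purch.Date"].dt.day - 1, unit="D")
first_of_month = df["Purch.Date"] - offset

# Option 2: round-trip through a monthly period; to_timestamp() returns the period start.
first_via_period = df["Purch.Date"].dt.to_period("M").dt.to_timestamp()

# Store the result as strings, as asked in the question.
df["YearMonthCapture"] = first_of_month.dt.strftime("%Y-%m-%d")
```

Both options keep the values as datetimes until the final strftime call, so the column can just as well be stored as a date by assigning first_of_month directly.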
