Problems with python Pandas converting int to float

I'm using pandas read_csv to extract data and reformat it. For example, "10/28/2018" from the column "HBE date" will be reformatted to read "eHome 10/2018".

It mostly works, except that I am getting reformatted values like "ehome 1.0/2015.0".

eHomeHBEdata['HBE date'] = pd.to_datetime(eHomeHBEdata['Course Completed'])

#extract month and year values
eMonths=[]
eYears =[]
eHomeDates = eHomeHBEdata['HBE date']

for eDate in eHomeDates:
        eMonth = eDate.month
        eYear = eDate.year
        eMonths.append(eMonth)
        eYears.append(eYear)

At this point, print(type(eMonth)) returns 'int'. And if I print the eYears list, I get values like 2013, 2014, 2015, etc.

But then I assign the lists to columns in the data frame:

eHomeHBEdata.insert(0,'workshop Month',eMonths)
eHomeHBEdata.insert(1,'workshop Year',eYears)

After that, print(eHomeHBEdata['workshop Year']) returns values like 2013.0, 2014.0, 2015.0. That's type float, right?

When I then run the following code, I get the misformatted values mentioned above:

eHomeHBEdata['course session'] = "ehome " + eHomeHBEdata['workshop Month'].astype(str) + "/" + eHomeHBEdata['workshop Year'].astype(str)
eHomeHBEdata['start'] = eHomeHBEdata['workshop Month'].astype(str) + "/1/" + eHomeHBEdata['workshop Year'].astype(str) + " 12:00 PM"

Could someone explain what's going on here and help me fix it?

Solution

To convert (reformat) your date column to MM/YYYY, all you need to do is:

df["Your_Column_Name"].dt.strftime('%m/%Y')

See Section A and Section B for two different use cases.

A. Example

I have created some dummy data for this illustration with a column called Dates. To reformat this column as MM/YYYY I am using df.Dates.dt.strftime('%m/%Y'), which is equivalent to df["Dates"].dt.strftime('%m/%Y').

import pandas as pd

## Dummy Data
dates = pd.date_range(start='2020/07/01', end='2020/07/07', freq='D')
df = pd.DataFrame(dates, columns=['Dates'])

# Solution
df['Reformatted_Dates'] = df.Dates.dt.strftime('%m/%Y')
print(df)
## Output:
#        Dates Reformatted_Dates
# 0 2020-07-01           07/2020
# 1 2020-07-02           07/2020
# 2 2020-07-03           07/2020
# 3 2020-07-04           07/2020
# 4 2020-07-05           07/2020
# 5 2020-07-06           07/2020
# 6 2020-07-07           07/2020
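
Applied to the data in the question, the same call can build the label straight from the datetime column, without separate month and year columns. This is only a minimal sketch: the two-row frame below is made up to stand in for the CSV, and the explicit format string assumes the dates always look like "10/28/2018". Note that %m zero-pads the month, so January comes out as "01/2015" rather than "1/2015".

```python
import pandas as pd

# Hypothetical sample standing in for the CSV from the question
eHomeHBEdata = pd.DataFrame({"Course Completed": ["10/28/2018", "1/15/2015"]})
eHomeHBEdata["HBE date"] = pd.to_datetime(
    eHomeHBEdata["Course Completed"], format="%m/%d/%Y"
)

# Build both strings directly from the datetime column
eHomeHBEdata["course session"] = "eHome " + eHomeHBEdata["HBE date"].dt.strftime("%m/%Y")
eHomeHBEdata["start"] = eHomeHBEdata["HBE date"].dt.strftime("%m/1/%Y") + " 12:00 PM"

print(eHomeHBEdata[["course session", "start"]])
```

Because the month and year never pass through a numeric column, there is no chance for them to be stored as floats.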

B. If your input data is in the following format

In this case, you could first convert the column using .astype('datetime64[ns, US/Eastern]'). This lets you apply pandas datetime-specific methods to the column. Then try running df.Dates.astype('datetime64[ns, US/Eastern]').dt.to_period(freq='M').

## Dummy Data
dates = [
    '10/2018', 
    '11/2018', 
    '8/2019', 
    '5/2020',
]

df = pd.DataFrame(dates, columns=['Dates'])
print(df.Dates.dtype)
print(df)

## To convert the column to datetime and reformat
df['Dates'] = df.Dates.astype('datetime64[ns, US/Eastern]') #.dt.strftime('%m/%Y')
print(df.Dates.dtype)
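
An alternative sketch for the same input, assuming the strings are always month/year: pd.to_datetime with an explicit format makes the parsing unambiguous and does not attach a timezone. This is a substitute for the astype call above, not what the original answer used.

```python
import pandas as pd

df = pd.DataFrame({"Dates": ["10/2018", "11/2018", "8/2019", "5/2020"]})

# Parse month/year strings; each date lands on the first day of its month
df["Dates"] = pd.to_datetime(df["Dates"], format="%m/%Y")
print(df.Dates.dtype)  # a datetime64 dtype, so the .dt accessor is available

# Back to MM/YYYY text, or to a monthly Period if arithmetic is needed
df["Reformatted_Dates"] = df.Dates.dt.strftime("%m/%Y")
df["Period"] = df.Dates.dt.to_period(freq="M")
print(df)
```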

C. Avoid using the for loop

Try this. You can use the built-in vectorization of pandas on a column instead of looping over each row. I have used .dt.month and .dt.year on the column to get the month and year as int.

eHomeHBEdata['HBE date'] = pd.to_datetime(eHomeHBEdata['Course Completed'])
eHomeDates = eHomeHBEdata['HBE date'] # a datetime64 Series, so the .dt accessor is available

## This is what I changed
eMonths = eHomeDates.dt.month
eYears = eHomeDates.dt.year

eHomeHBEdata.insert(0,'workshop Month',eMonths)
eHomeHBEdata.insert(1,'workshop Year',eYears)
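
As for why the integers turned into floats: one likely cause, assuming some rows of 'Course Completed' are empty or unparsable, is that pd.to_datetime yields NaT for them, and the month and year of NaT are NaN. A NumPy integer column cannot hold NaN, so pandas stores the whole column as float, which is why astype(str) then prints "1.0" and "2015.0". The sketch below reproduces this with made-up data and shows one possible workaround, the nullable Int64 dtype. Whether the real CSV has such rows is an assumption that would need checking, for example with eHomeHBEdata['HBE date'].isna().sum().

```python
import pandas as pd

# Hypothetical data: the second row has no completion date
data = pd.DataFrame({"Course Completed": ["1/15/2015", None, "10/28/2018"]})
data["HBE date"] = pd.to_datetime(data["Course Completed"], format="%m/%d/%Y")

months = data["HBE date"].dt.month
print(months.dtype)  # float64, because the NaT row becomes NaN

# Nullable integers keep whole numbers and mark the missing row as <NA>
data["workshop Month"] = months.astype("Int64")
data["workshop Year"] = data["HBE date"].dt.year.astype("Int64")

label = (
    "ehome "
    + data["workshop Month"].astype(str)
    + "/"
    + data["workshop Year"].astype(str)
)
print(label.tolist())
```

Rows with a missing date still need a decision (drop them, or fill in a placeholder) before the label is used downstream.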
