Python df.to_excel() storing numbers as text in excel. How to store as Value?

I am scraping table data from Google Finance with pd.read_html and then saving it to Excel with df.to_excel(), as shown below:

    dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
    xlWriter = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')

    for i, df in enumerate(dfs):
        df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
    xlWriter.close()  # save() was removed in pandas 2.0

However, the numbers saved to Excel are stored as text, with the little green triangle in the corner of each cell. When moving this data over to Excel, how do I store them as actual values and not text?

In addition to the other solutions, where the string data is converted to numbers when creating or using the dataframe, it is also possible to do it with an option to the xlsxwriter engine:

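# pandas >= 1.3 forwards engine_kwargs to xlsxwriter.Workbook();
# older versions accepted options={'strings_to_numbers': True} directly.
writer = pd.ExcelWriter('output.xlsx',
                        engine='xlsxwriter',
                        engine_kwargs={'options': {'strings_to_numbers': True}})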

From the docs:

strings_to_numbers: Enable the worksheet.write() method to convert strings to numbers, where possible, using float() in order to avoid an Excel warning about "Numbers Stored as Text".
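
To make the "where possible, using float()" part concrete, here is a minimal pure-Python sketch of that kind of rule. The helper name to_number_if_possible is hypothetical, and the NaN/infinity guard is my assumption about sensible behavior; this is an illustration, not XlsxWriter's actual code.

```python
import math

def to_number_if_possible(value):
    """Hypothetical helper: return a float when float() accepts the string, else the value unchanged."""
    if isinstance(value, str):
        try:
            number = float(value)
        except ValueError:
            return value  # not parseable, keep it as text
        # Assumption: NaN and infinity are left as text rather than written as numbers.
        if math.isnan(number) or math.isinf(number):
            return value
        return number
    return value

cells = ['1234.5', '-', '1,234.5', '42']
converted = [to_number_if_possible(c) for c in cells]
print(converted)
```

Note that a placeholder such as '-' or a value with thousands separators such as '1,234.5' is not accepted by float(), so under this rule those cells would still end up as text.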

Consider converting the numeric columns to floats, since pd.read_html reads web data as string types (i.e., objects). Before converting to floats, you need to replace the hyphens with NaNs:

import pandas as pd
import numpy as np

dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL' +
                   '&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
workbook = xlWriter.book

for i, df in enumerate(dfs):
    for col in df.columns[1:]:                  # UPDATE ONLY NUMERIC COLS 
        df.loc[df[col] == '-', col] = np.nan    # REPLACE HYPHEN WITH NaNs
        df[col] = df[col].astype(float)         # CONVERT TO FLOAT   

    df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))

xlWriter.close()  # save() was removed in pandas 2.0
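
A variation on the loop above, in case the tables contain placeholders other than a hyphen: pd.to_numeric with errors='coerce' turns every unparseable value into NaN in one step. The toy DataFrame and its column names are made up for illustration; they stand in for one scraped table.

```python
import pandas as pd

# Toy stand-in for one scraped table: first column holds labels,
# the remaining columns hold numbers as text (hypothetical data).
df = pd.DataFrame({
    'Item': ['Revenue', 'Other income'],
    '2016': ['100.5', '-'],
    '2015': ['90', '3.25'],
})

for col in df.columns[1:]:
    # errors='coerce' replaces anything that cannot be parsed (such as '-') with NaN.
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)
```

The trade-off is that coercion also hides unexpected text silently, so the explicit hyphen replacement is the stricter choice when you want bad data to raise an error.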

That is probably because the data types of the columns where the warning shows are object and not a numeric type such as int or float.

To check the data type of each column of the DataFrame, use dtypes:

print(df.dtypes)
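
For a wide DataFrame, it can be easier to list only the columns that are not numeric. A small sketch, using a made-up DataFrame with hypothetical column names:

```python
import pandas as pd

# Hypothetical data: one column holds numbers as text, the other real floats.
df = pd.DataFrame({'price_text': ['1.5', '2.0'], 'price': [1.5, 2.0]})
print(df.dtypes)

# Collect the columns whose dtype is not numeric; these are the ones
# Excel would flag as "Numbers Stored as Text" if they hold digits.
text_columns = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
print(text_columns)
```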

In my case, the column stored as object instead of a numeric type was PRECO_ES.

[Image: DataFrame data types]

Since the decimal numbers are relevant in my particular case, I converted the column to float using astype, as follows:

df['PRECO_ES'] = df['PRECO_ES'].astype(float)

If we check the data types again, we get the following:

[Image: DataFrame column changed to float]

Then all you have to do is export the DataFrame to Excel:

# Export the DataFrame (df) to Excel.
# Writing the legacy .xls format needed xlwt, which pandas 2.0 no longer supports.
xlsFile = "Preco20102019.xlsx"
df.to_excel(xlsFile)

# Export the DataFrame (df) to CSV
csvFile = "Preco20102019.csv"
df.to_csv(csvFile)

If I then open the Excel file, the warning no longer shows, because the values are stored as numbers and not as text.

[Image: Excel file without the warning]

Did you verify that the columns you're exporting are actually numbers in Python (int or float)?

Alternatively, you can convert the text fields into numbers in Excel using the =VALUE() function.

Since pandas 0.19, you can pass the na_values argument to pd.read_html, which lets pandas correctly infer the float type for your price columns automatically.

Here's what that looks like:

dfs = pd.read_html(
    'https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM',
    flavor='html5lib',
    index_col='\nIn Millions of USD (except for per share items)\n',
    na_values='-'
)

xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
for i, df in enumerate(dfs):
    df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
xlWriter.close()  # save() was removed in pandas 2.0
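
The effect of na_values is easy to see offline on a small example. The sketch below uses pd.read_csv on an in-memory string instead of pd.read_html, purely to avoid a network call and an HTML parser; both readers take an na_values argument, and the data is made up.

```python
import io
import pandas as pd

# Hypothetical data with '-' used as a "no value" placeholder.
raw = "item,2016,2015\nRevenue,100.5,90\nOther income,-,3.25\n"

# Without na_values, the '-' forces the whole 2016 column to be text.
without_na = pd.read_csv(io.StringIO(raw))

# With na_values='-', the placeholder becomes NaN and the column is inferred as float.
with_na = pd.read_csv(io.StringIO(raw), na_values='-')

print(without_na.dtypes)
print(with_na.dtypes)
```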

Alternatively (if you don't have pandas 0.19 yet), I'd use a simpler version of @Parfait's solution:

dfs = pd.read_html(
    'https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM',
    flavor='html5lib',
    index_col='\nIn Millions of USD (except for per share items)\n'
)

xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
for i, df in enumerate(dfs):
    df.mask(df == '-').astype(float).to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
xlWriter.close()  # save() was removed in pandas 2.0

This second solution only works if you correctly define your index column (in the read_html call). It fails with a ValueError if any of the data columns contains something that cannot be converted to a float.
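
When that ValueError does occur, it helps to see which values caused it. The following is a rough diagnostic sketch on made-up data (the 'n/a' cell is a hypothetical offender); it reproduces the failure and then lists, per column, the values that are neither missing nor convertible.

```python
import pandas as pd

# Hypothetical table: '-' is the expected placeholder, 'n/a' is an unexpected one.
df = pd.DataFrame({'2016': ['100.5', '-'], '2015': ['90', 'n/a']},
                  index=['Revenue', 'Other income'])

masked = df.mask(df == '-')  # replace the known placeholder with NaN

try:
    masked.astype(float)
    failed = False
except ValueError:
    failed = True  # 'n/a' cannot be converted to float

# Collect the values that are present but not convertible, column by column.
bad = {}
for col in masked.columns:
    converted = pd.to_numeric(masked[col], errors='coerce')
    offenders = converted.isna() & masked[col].notna()
    if offenders.any():
        bad[col] = masked.loc[offenders, col].tolist()

print(failed, bad)
```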

If you want your Excel sheet to keep the string data type, do this:

for col in original_columns:
    df_employees[col] = df_employees[col].astype(pd.StringDtype())
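
The snippet above assumes that df_employees and original_columns already exist. A self-contained sketch with made-up data (the column names and values are hypothetical) shows the effect:

```python
import pandas as pd

# Hypothetical data: zip codes read as integers that should be exported as text.
df_employees = pd.DataFrame({'employee_id': [7, 42], 'zip_code': [2134, 90210]})
original_columns = ['zip_code']

for col in original_columns:
    # Cast to pandas' dedicated string dtype so the values are written as text.
    df_employees[col] = df_employees[col].astype(pd.StringDtype())

print(df_employees.dtypes)
```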
