Read .xlsx in .txt or .csv format with Python

Is there a way to read a .xlsx file in .txt or .csv format with Python? I'm looking for a way to read an .xlsx file while preserving number formatting (e.g., $45.890924). I searched around and could not find a viable module, and creating a style converter would be next to impossible at my Python skill level.

A few helpful notes: Pandas is not an option because it automatically wipes the number formatting, and I cannot classify a column's format in advance, since one column can contain 20+ different number formats.

openpyxl stores the content of the cell in value and the formatting in number_format (and in a few other properties for alignment, color, font, border, etc.). So you could interpret the Excel format code and translate it to a Python format, but:

  1. Of course, a few format properties do not make sense in CSV: e.g., you cannot make negative numbers red in a CSV.
  2. While Excel format codes for dates and times are relatively easy to handle, those for numbers can be very tricky to decipher. As an example, this is the standard currency format string for Euro:
'_-* #,##0.00\ [$€-410]_-;\-* #,##0.00\ [$€-410]_-;_-* "-"??\ [$€-410]_-;_-@_-'
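
To make that string a little less opaque: an Excel number format can hold up to four sections separated by semicolons (positive; negative; zero; text). The sketch below is only a first step towards a translator. It splits a format code into those sections while leaving alone semicolons that are quoted or that follow \, _ or * (which act on the next character). The name split_sections is hypothetical and not part of openpyxl.

```python
def split_sections(xlcode):
    """Split an Excel format code on semicolons that are not quoted or escaped."""
    sections, current = [], []
    in_quotes = False
    i = 0
    while i < len(xlcode):
        ch = xlcode[i]
        if ch in '\\_*' and not in_quotes and i + 1 < len(xlcode):
            # \, _ and * apply to the next character, so keep the pair together.
            current.append(xlcode[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ';' and not in_quotes:
            # Unquoted semicolon: close the current section.
            sections.append(''.join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    sections.append(''.join(current))
    return sections

# The Euro currency format from above, as a raw string so backslashes survive.
euro = r'_-* #,##0.00\ [$€-410]_-;\-* #,##0.00\ [$€-410]_-;_-* "-"??\ [$€-410]_-;_-@_-'
for name, section in zip(['positive', 'negative', 'zero', 'text'], split_sections(euro)):
    print(f'{name:9}{section}')
```

The Euro string splits into four sections, so a translator would pick the section matching the sign or type of the cell value before interpreting the placeholders inside it.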

All that said, making a translator is not impossible. Below is a simple function to translate Excel date format strings to Python's strftime() directives.

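from datetime import datetime
from re import findall

def date_xl2py(dt, xlcode):
    ## dt is not used by the translation; it is kept so existing calls still work
    xl2py = {
        'yy' : '%y',
        'yyyy' : '%Y',
        'm' : '%m', ##always zero-padded
        'mm' : '%m',
        'mmm' : '%b',
        'mmmm' : '%B',
        'mmmmm' : '%b', ##no single letter form
        'd' : '%d', ##always zero-padded
        'dd' : '%d',
        'ddd' : '%a',
        'dddd' : '%A',
        '%' : '%%' ##escape the % char
        }
    pycode = []
    ## tokens: a quoted literal, a run of one format letter, or any single char
    ## (no capturing group, so findall() returns the full matched text)
    for xlpart in findall(r'"[^"]*"|d+|m+|y+|h+|s+|.', xlcode):
        if xlpart in xl2py:
            pycode.append(xl2py[xlpart])
        elif len(xlpart) > 1 and xlpart[0] == '"':
            ## quoted literal: drop the quotes and escape any % inside
            pycode.append(xlpart[1:-1].replace('%', '%%'))
        else:
            pycode.append(xlpart)
    return ''.join(pycode)

dt = datetime(2022,7,12,15,56)
dt.strftime(date_xl2py(dt, 'ddd, mmmm dd, yyyy'))
## returns 'Tue, July 12, 2022'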

Please note that I didn't take the specification of a locale into account.

Also, Excel offers three (rather useless) date formatting options that are not available in Python (see comments in the code): I just mapped them to the most similar option available.
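
If exact output matters for those three codes (m, d and mmmmm), one possibility is to build the text directly from the datetime fields instead of going through strftime(). This is a minimal sketch under that assumption; excel_only_parts is a hypothetical helper name, and the month letter depends on the active locale.

```python
from datetime import datetime

def excel_only_parts(dt):
    """Text for the three Excel date codes that strftime() has no directive for."""
    return {
        'm': str(dt.month),             # month number without zero padding
        'd': str(dt.day),               # day number without zero padding
        'mmmmm': dt.strftime('%B')[0],  # first letter of the full month name
    }

parts = excel_only_parts(datetime(2022, 7, 5))
print(parts['m'], parts['d'], parts['mmmmm'])
```

A translator could substitute these values for the matching tokens before handing the rest of the pattern to strftime().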

And finally, if you were to add time formats, you would need to handle the fact that "mm" may mean months or minutes in Excel, and select the right option based on context.
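
As a rough illustration of that context rule: Excel reads m or mm as minutes when the code comes right after an hour code or right before a seconds code, and as months otherwise. The sketch below tags each m token accordingly. resolve_m_tokens is a hypothetical name, and the sketch assumes lowercase codes and ignores cases such as AM/PM markers or bracketed elapsed-time codes.

```python
import re

# Quoted literal, a run of one date/time letter, or any other single character.
TOKEN = re.compile(r'"[^"]*"|d+|m+|y+|h+|s+|.')

def resolve_m_tokens(xlcode):
    """Return (token, kind) pairs; kind is 'month', 'minute' or None."""
    tokens = TOKEN.findall(xlcode)
    # Positions of date/time codes only, skipping separators and quoted text.
    codes = [i for i, tok in enumerate(tokens) if tok[0] in 'dmyhs']
    kinds = [None] * len(tokens)
    for pos, i in enumerate(codes):
        tok = tokens[i]
        if tok[0] != 'm':
            continue
        if len(tok) > 2:
            kinds[i] = 'month'  # mmm, mmmm and mmmmm are always months
            continue
        prev_code = tokens[codes[pos - 1]] if pos > 0 else ''
        next_code = tokens[codes[pos + 1]] if pos + 1 < len(codes) else ''
        is_minute = prev_code.startswith('h') or next_code.startswith('s')
        kinds[i] = 'minute' if is_minute else 'month'
    return list(zip(tokens, kinds))

for token, kind in resolve_m_tokens('dd/mm/yyyy hh:mm:ss'):
    if kind:
        print(token, kind)
```

For 'dd/mm/yyyy hh:mm:ss' the first mm is tagged as a month and the second as a minute, so a translator could then map the minute case to %M and the month case to %m.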
