
How to convert column values to str when reading a multi-sheet xlsx using pd.read_excel?

I have a multi-sheet xlsx file. I want to process selected sheets and finally save them as CSV.

This is a snapshot of a few rows from one sheet:

[image: snapshot of rows from one sheet]

I use this code to load all sheets and process them one by one:

import pandas as pd

def load_raw_excel_file(file_full_name):
    # sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}
    df = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0)
    sheets_name = list(df.keys())

    return df, sheets_name
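
Since the stated goal is to save selected sheets as CSV, the dictionary that `sheet_name=None` returns can be looped over directly. The following is only a minimal sketch: `save_sheets_as_csv` is a hypothetical helper, the sheet names and values are made up, and in-memory DataFrames stand in for the loaded workbook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_sheets_as_csv(sheets, selected, out_dir):
    """Write each selected sheet (a DataFrame) to <out_dir>/<sheet name>.csv.

    `sheets` is a dict of {sheet name: DataFrame}, the shape that
    pd.read_excel(..., sheet_name=None) returns. Hypothetical helper.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in selected:
        if name not in sheets:
            continue  # skip names that are not in the workbook
        path = out_dir / f"{name}.csv"
        sheets[name].to_csv(path, index=False)
        written.append(path)
    return written


# Stand-in for the dict returned by load_raw_excel_file (illustrative data only)
demo_sheets = {
    "sheet_a": pd.DataFrame({"Contract": ["X1", "X2"], "Qty": [1, 2]}),
    "sheet_b": pd.DataFrame({"Contract": ["Y1"], "Qty": [3]}),
}
demo_dir = tempfile.mkdtemp()
demo_written = save_sheets_as_csv(demo_sheets, ["sheet_a", "missing"], demo_dir)
```

Sheet names that contain characters not allowed in file names would need sanitizing first; that is left out of the sketch.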

The output of the code (for the same sheet) looks like this:

dfs, shs =  load_raw_excel_file("myexelfile.xlsx")
dfs['myselectedsheetname']

[image: output DataFrame for the selected sheet]

As you can see, some values in the Contract column have been converted to dates, but I want them left unchanged. I've tried using converters and dtype in pd.read_excel, but neither worked:

df = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0, dtype=str)

or

df = pd.read_excel("myexelfile.xlsx", sheet_name='selectedsheetname', header=0, converters={'Contract':str})

Any ideas?

Update

I found a workaround, but it is not a good solution:

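def convert_str_date(x):
    # Cells read as dates have strftime; plain strings do not and are returned unchanged
    try:
        return x.strftime("%b-%y")
    except (AttributeError, ValueError):
        return x


# assign the result back, otherwise the converted column is discarded
df["Contract"] = df["Contract"].apply(convert_str_date)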
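
To see which cells were actually read as dates, one option is to count the Python type of each cell, and then convert with an explicit type check instead of try/except. This is a sketch under assumptions: the Series below is an illustrative stand-in for the Contract column (the values are made up), and `date_to_label` is a hypothetical name.

```python
import datetime as dt

import pandas as pd

# Illustrative stand-in for the Contract column after read_excel, assuming
# cells stored as dates arrive as datetime objects and the rest as str.
contract = pd.Series(
    [dt.datetime(2022, 3, 1), "ABC-1", dt.datetime(2022, 12, 1)],
    dtype=object,
)

# Count the Python type of each cell to see which ones were read as dates.
type_counts = contract.map(lambda v: type(v).__name__).value_counts()


def date_to_label(value, fmt="%b-%y"):
    """Format date-like cells as text; return every other cell unchanged."""
    if isinstance(value, (dt.datetime, dt.date)):
        return value.strftime(fmt)
    return value


restored = contract.map(date_to_label)
```

Note that `%b` depends on the locale, so the month abbreviation may differ on a non-English system.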

Also, see @Simon's answer.

Excel stored those values in datetime format. Maybe you can post-process the DataFrame:

oKCol = df['Contract'].copy()  # keep the original values

# convert cells to '%b-%y' strings; cells that are not dates become NaN
nKCol = pd.to_datetime(oKCol, errors='coerce').dt.strftime('%b-%y')

# where the conversion produced NaN, fall back to the original value
df['Contract'] = nKCol.fillna(oKCol)
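
Because the question loads the workbook with `sheet_name=None`, `df` there is a dict of DataFrames rather than a single DataFrame, so the post-processing above would have to run per sheet. A minimal sketch, assuming a hypothetical wrapper `restore_date_labels` and made-up sheet contents:

```python
import datetime as dt

import pandas as pd


def restore_date_labels(df, column="Contract", fmt="%b-%y"):
    """Return a copy of df where date cells in `column` become fmt strings."""
    out = df.copy()
    original = out[column]
    # Cells that cannot be parsed as dates become NaT, then NaN after strftime.
    as_text = pd.to_datetime(original, errors="coerce").dt.strftime(fmt)
    # Fall back to the original value wherever the conversion produced NaN.
    out[column] = as_text.fillna(original)
    return out


# Stand-in for the dict returned by read_excel(..., sheet_name=None)
demo_sheets = {
    "sheet_a": pd.DataFrame({"Contract": [dt.datetime(2022, 3, 1), "ABC-1"]}),
    "sheet_b": pd.DataFrame({"Other": [1, 2]}),  # no Contract column
}

# Apply the fix only to sheets that have the column; leave the others as-is.
fixed = {
    name: restore_date_labels(sheet) if "Contract" in sheet.columns else sheet
    for name, sheet in demo_sheets.items()
}
```

One caveat of the `errors='coerce'` approach: a text cell that happens to look like a date would be reformatted as well, so it is worth checking the column's real values first.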

Try changing dtype='str' to dtype={'Contract': str} to force Contract to be read as str (no quotes around str):

df = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0,
    dtype={'Contract': str})
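
One possible reason a string dtype did not help in the question: if a cell is stored in Excel as a date, casting it to `str` would give the text of the timestamp, not the label Excel displays. The sketch below only mimics that cast on an in-memory Series with made-up values; it is an assumption that `read_excel` behaves the same way, and it does not call `read_excel` itself.

```python
import datetime as dt

import pandas as pd

# Illustrative cells: one stored as a date, one stored as text.
cells = pd.Series([dt.datetime(2022, 3, 1), "ABC-1"], dtype=object)

# Casting to str turns the date cell into the text of its timestamp.
as_str = cells.astype(str)

# The date cell is now a string, but not the label Excel showed for it.
is_original_label = as_str.iloc[0] == cells.iloc[0].strftime("%b-%y")
```

If the real file behaves like this, the column would be all strings after loading, yet the date cells would still need reformatting as in the answers above.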


