How to convert column values to str when reading a multi-sheet xlsx with pd.read_excel?

Question

I have a multi-sheet xlsx file. I want to process selected sheets and finally save them as CSV.

This is a snapshot of a few rows from one sheet:

I use this code to load all sheets and process them one by one:

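import pandas as pd


def load_raw_excel_file(file_full_name):
    # sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}
    sheets = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0)
    sheet_names = list(sheets.keys())

    return sheets, sheet_names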

The output of the code (for the same sheet) looks like this:

dfs, shs = load_raw_excel_file("myexelfile.xlsx")
dfs['myselectedsheetname']

As you can see, some values in the Contract column have been converted to dates, but I want the values left unchanged. I've tried using converters and dtype in pd.read_excel, but neither worked:

df = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0, dtype=str)

or

df = pd.read_excel("myexelfile.xlsx", sheet_name='selectedsheetname', header=0, converters={'Contract':str})

Any ideas?

Update

I found a workaround, but it is not a good solution:

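def convert_str_date(x):
    # Cells that Excel stored as dates arrive as datetime objects; format them back
    try:
        return x.strftime("%b-%y")
    except (AttributeError, ValueError):
        # Plain strings and numbers have no usable strftime; leave them untouched
        return x


df["Contract"] = df["Contract"].apply(convert_str_date)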

Also, see @Simon's answer.

Answer 1

Excel has stored those values in a datetime format. You could post-process the DataFrame instead:

# keep a copy of the original values
original = df['Contract'].copy()

# format date cells as "%b-%y" strings; values that cannot be parsed as dates become NaN
formatted = pd.to_datetime(original, errors='coerce').dt.strftime('%b-%y')

# use the formatted value where there is one, otherwise fall back to the original
df['Contract'] = formatted.fillna(original)
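
One thing to watch with the post-processing above: pd.to_datetime also parses strings that merely look like dates, so a contract code that happens to resemble a date could get rewritten too. A possible variant, sketched below, only touches cells that are already datetime objects. The helper name dates_to_label and the sample values are made up for illustration; they are not from the original file.

```python
from datetime import datetime

import pandas as pd


def dates_to_label(value, fmt="%b-%y"):
    """Format genuine datetime cells; return every other value unchanged."""
    # pd.NaT also counts as a datetime instance, so skip missing values explicitly.
    if isinstance(value, datetime) and not pd.isna(value):
        return value.strftime(fmt)
    return value


# Hypothetical stand-in for a sheet whose Contract column came back mixed.
df = pd.DataFrame({"Contract": [datetime(2021, 3, 1), "ABC-1", "2021"]})
df["Contract"] = df["Contract"].apply(dates_to_label)
print(df["Contract"].tolist())
```

The string "2021" is left alone here because it was never a date cell in the first place, which is the behaviour the question asks for.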

Answer 2

Try changing dtype='str' to dtype={'Contract': str} to force the Contract column to be read as str (no quotes around str):

df = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0,
    dtype={'Contract': str})
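
For the last step in the question (saving the selected sheets as CSV), here is a rough sketch that works on the dict returned by sheet_name=None. The function name save_sheets_as_csv and its parameters are my own invention, and it assumes the sheet names are valid file names on your system.

```python
from pathlib import Path

import pandas as pd


def save_sheets_as_csv(sheets, selected, out_dir):
    """Write each selected sheet to <out_dir>/<sheet name>.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in selected:
        path = out_dir / f"{name}.csv"
        # index=False keeps the pandas row index out of the CSV file
        sheets[name].to_csv(path, index=False)
        written.append(path)
    return written
```

With the loader from the question this would be called as something like save_sheets_as_csv(dfs, ['myselectedsheetname'], 'csv_out'), where 'csv_out' is just a placeholder directory name.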

Answer 1
1 2021-03-25 03:06:40

Answer 2
0 2021-03-25 01:11:08
