How to convert column values to str when reading a multi-sheet xlsx using pd.read_excel?
I have a multi-sheet xlsx file. I want to process selected sheets and finally save them as CSV.
This is a snapshot of a few rows from one page:
I use this code to load all pages and process them one by one:
import pandas as pd

def load_raw_excel_file(file_full_name):
    # sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}
    df = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0)
    sheets_name = list(df.keys())
    return df, sheets_name
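For the final "save as CSV" step, a minimal sketch could look like the following. The helper name save_sheets_as_csv, the sheet names, and the sample values are made up for illustration and are not part of the original code; the in-memory dict stands in for what pd.read_excel(..., sheet_name=None) returns.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_sheets_as_csv(sheets, selected, out_dir):
    """Write each selected sheet (a DataFrame) to <out_dir>/<sheet name>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in selected:
        if name not in sheets:
            continue  # skip names that are not present in the workbook
        path = out_dir / f"{name}.csv"
        sheets[name].to_csv(path, index=False)
        written.append(path)
    return written


# Hypothetical stand-in for the dict returned by pd.read_excel(..., sheet_name=None)
sheets = {
    "sheet_a": pd.DataFrame({"Contract": ["Z21", "H22"], "Price": [1.5, 2.0]}),
    "sheet_b": pd.DataFrame({"Contract": ["M22"], "Price": [3.0]}),
}

with tempfile.TemporaryDirectory() as tmp:
    written = save_sheets_as_csv(sheets, ["sheet_a", "missing_sheet"], tmp)
    written_names = [p.name for p in written]
    # Read the file back as text to check that the values survived unchanged
    round_trip = pd.read_csv(written[0], dtype=str)

print(written_names)
```

Passing the dict rather than the file path keeps the saving step independent of how the workbook was loaded.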
The output of the code (from the same page) looks like this:
dfs, shs = load_raw_excel_file("myexelfile.xlsx")
dfs['myselectedsheetname']
As you can see, some values in the Contract column have been changed to dates, but I don't want any changes. I've tried using converters and dtype in pd.read_excel, but it didn't work:
df = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0, dtype=str)
or
df = pd.read_excel("myexelfile.xlsx", sheet_name='selectedsheetname', header=0, converters={'Contract':str})
Any ideas?
Update
I found a workaround, but it is not a good solution:
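def convert_str_date(x):
    # Format date-like cells back to text such as "Mar-21"; leave other values unchanged
    try:
        return x.strftime("%b-%y")
    except (AttributeError, ValueError):
        return x

df["Contract"] = df["Contract"].apply(convert_str_date)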
Also, see @Simon's answer.
Excel stored those values in datetime format. Maybe you can post-process the DataFrame:
nKCol = df['Contract']
oKCol = df['Contract'].copy()
# update cell to %b-%y string format; Nan if error
nKCol = pd.to_datetime(nKCol, errors='coerce').dt.strftime('%b-%y')
# update the column
df['Contract'] = nKCol
# fill Nan with original column
df['Contract'] = df['Contract'].fillna(oKCol)
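One caveat with the column-wide approach: pd.to_datetime may also parse text cells that merely look like dates, and those cells would then be reformatted too. A possible alternative, sketched here under the assumption that only cells Excel actually stored as dates should be touched, is a per-cell type check. The helper name restore_contract_label and the sample values are illustrative, not from the original post.

```python
import datetime as dt

import pandas as pd


def restore_contract_label(value, fmt="%b-%y"):
    """Format real date/datetime cells as text; return anything else unchanged."""
    # pd.Timestamp subclasses datetime, so it is covered by this check;
    # the pd.isna guard skips NaT, which cannot be formatted.
    if isinstance(value, (dt.datetime, dt.date)) and not pd.isna(value):
        return value.strftime(fmt)
    return value


# Made-up sample: two text cells and one cell that was read as a datetime
df = pd.DataFrame({"Contract": ["Z21", dt.datetime(2021, 3, 1), "Mar-21"]})
df["Contract"] = df["Contract"].map(restore_contract_label)
print(df["Contract"].tolist())
```

Text cells pass through as they are, so only values that arrived as date objects are rewritten. The month abbreviation produced by %b depends on the active locale.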
Try changing dtype='str' to dtype={'Contract': str} to force Contract to be read as str (no quotes around str):
df = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0,
dtype={'Contract': str})