
apply() to every column of every dataframe of an ExcelFile, Pandas

I have an xlsx file with multiple worksheets.

I read it in & separate the worksheets into dataframes:

import pandas as pd

xls = pd.ExcelFile('path/to/multisheet_excelfile.xlsx')
dfs = {sheet: pd.read_excel(xls, sheet) for sheet in xls.sheet_names}

I iterate through the dataframes, and then through the columns of each one, to call apply():

import re

for df in dfs.values():
    for col in df.columns:
        # apply some function here, let's say: remove anything in square brackets
        df[col] = df[col].apply(lambda name: re.sub(r"[\[].*?[\]]", "", repr(name)))

Is there a better way to do this, possibly not involving a double for loop?

You can't avoid loops entirely, because pandas creates a separate DataFrame for each sheet. But you can do it with a single loop:

# {'sheet_name1': df1, 'sheet_name2': df2, ...}
dfs = pd.read_excel(xls, sheet_name=None)  # sheet_name=None reads every sheet into a dict
dfs = {
    # on pandas >= 2.1, DataFrame.map replaces the deprecated applymap
    sheet_name: df.applymap(lambda x: re.sub(r"[\[].*?[\]]", "", repr(x)))
    for sheet_name, df in dfs.items()
}
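
For anyone who wants to try the single-loop idea without an Excel file at hand, here is a minimal sketch. The sheet names, the sample data, and the helper names strip_brackets and clean_frame are all made up for illustration; the dict simply stands in for what pd.read_excel returns when asked for several sheets. The helper picks DataFrame.map when the installed pandas has it and falls back to applymap otherwise.

```python
import re

import pandas as pd

# Same idea as the pattern in the question: drop any "[...]" group, non-greedy.
BRACKETED = re.compile(r"\[.*?\]")


def strip_brackets(value):
    """Return repr(value) with every [...] group removed."""
    return BRACKETED.sub("", repr(value))


def clean_frame(df):
    """Apply strip_brackets to every cell of one DataFrame."""
    # DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
    # Check the class, not the instance, so a column named "map" cannot interfere.
    if hasattr(pd.DataFrame, "map"):
        return df.map(strip_brackets)
    return df.applymap(strip_brackets)


# Hypothetical stand-in for the dict of sheets read from the workbook.
dfs = {
    "sheet_a": pd.DataFrame({"name": ["Ann [x]", "Bob"]}),
    "sheet_b": pd.DataFrame({"name": ["Cy [tmp] Lee"]}),
}

# One loop over the sheets; the per-cell work is delegated to pandas.
cleaned = {sheet_name: clean_frame(df) for sheet_name, df in dfs.items()}
```

Note that because the function calls repr(), string cells keep their surrounding quotes in the result; swap repr for str if that is not what you want.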
