apply() to every column of every dataframe of an ExcelFile, Pandas

Question

I have an xlsx file with multiple worksheets.

I read it in & separate the worksheets into dataframes:

import re
import pandas as pd
xls = pd.ExcelFile('path/to/multisheet_excelfile.xlsx')
dfs = {sheet: pd.read_excel(xls, sheet) for sheet in xls.sheet_names}

I iterate through the dataframes, and then through the columns of each one, calling apply() on every column:

for df in dfs.values():
    for col in df.columns:
        # apply some function here, let's say:
        df[col] = df[col].apply(lambda name: re.sub(r"[\[].*?[\]]", "", repr(name)))

Is there a better way to do this, possibly not involving a double for loop?

Answer 1

You can't avoid looping entirely, because pandas creates a separate DataFrame for each sheet. But you can do it in one loop by applying the function element-wise to each whole DataFrame with applymap():

# sheet_name=None reads every sheet into a dict:
# {'sheet_name1': df1, 'sheet_name2': df2, ...}
dfs = pd.read_excel(xls, sheet_name=None)
dfs = {
    sheet_name: df.applymap(lambda x: re.sub(r"[\[].*?[\]]", "", repr(x)))
    for sheet_name, df in dfs.items()
}
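
To try the one-loop approach without an Excel file, here is a minimal sketch that uses small in-memory DataFrames in place of the sheets. The names strip_brackets, clean_frame, sheet_a and sheet_b are made up for illustration. It also assumes that newer pandas (2.1 and later) prefers DataFrame.map over applymap, so it picks whichever one is available:

```python
import re

import pandas as pd

# Same pattern as in the question: drop anything enclosed in square brackets.
BRACKETED = re.compile(r"[\[].*?[\]]")


def strip_brackets(value):
    """Hypothetical helper: repr() the cell, then remove bracketed parts."""
    return BRACKETED.sub("", repr(value))


def clean_frame(df):
    """Apply strip_brackets to every cell of one DataFrame."""
    # DataFrame.map is the newer name for applymap; fall back on older pandas.
    if hasattr(pd.DataFrame, "map"):
        return df.map(strip_brackets)
    return df.applymap(strip_brackets)


# Stand-in for the dict returned by pd.read_excel(..., sheet_name=None).
dfs = {
    "sheet_a": pd.DataFrame({"name": ["Ann [x]", "Bob"]}),
    "sheet_b": pd.DataFrame({"name": ["Cy [tmp] Lee"]}),
}

# One loop over the sheets; the per-column loop is handled inside pandas.
cleaned = {sheet: clean_frame(df) for sheet, df in dfs.items()}
```

Note that because of repr(), string cells keep their surrounding quotes in the result (for example "'Bob'"), exactly as in the original double loop.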

Answer 1: accepted, 2020-08-26 15:03:13
