I have written some simple code that copies all the Excel files in a directory (they are in the same folder and have the same format and column names) and pastes them into one file.
The files are .xlsx workbooks with three sheets each, called GSM, UMTS, and LTE, and the sheet names are the same in every file. What I need is to copy the GSM data from every file into a GSM sheet in the new file, the UMTS data into a UMTS sheet, and the LTE data into an LTE sheet, and then drop duplicates.
I also need to change the color of the columns, or keep the same style as the source (text style, etc.).
Here's my code:
import pandas as pd
import os
basepath = r'C:\Users\mwx825326\PycharmProjects\MyExcelCombine\myCDD Combine'
files = list(filter(lambda x: '.xlsx' in x, os.listdir(basepath)))
alldf = pd.DataFrame()
for f in files:
    df = pd.read_excel(f"{basepath}/{f}", encoding='latin-1', sheet_name=None)
    alldf = pd.concat([alldf, df]).drop_duplicates(keep=False)
alldf.to_excel("1- CDD Total12.xlsx")
And this is the error I get:
Traceback (most recent call last):
  File "C:/Users/mwx825326/PycharmProjects/MyExcelCombine/CombineTool.py", line 9, in <module>
    alldf = pd.concat([alldf,df]).drop_duplicates(keep=False)
  File "C:\Users\mwx825326\PycharmProjects\MyExcelCombine\venv\lib\site-packages\pandas\core\reshape\concat.py", line 255, in concat
    sort=sort,
  File "C:\Users\mwx825326\PycharmProjects\MyExcelCombine\venv\lib\site-packages\pandas\core\reshape\concat.py", line 332, in __init__
    raise TypeError(msg)
TypeError: cannot concatenate object of type '<class 'collections.OrderedDict'>'; only Series and DataFrame objs are valid
Process finished with exit code 1
And this is how I read the individual sheets:
mydir = (os.getcwd()).replace('\\', '/') + '/'
gsm_cdd_total = pd.read_excel(mydir + '1- CDD Total.xlsx', sheet_name='GSM')
umts_cdd_total = pd.read_excel(mydir + '1- CDD Total.xlsx', sheet_name='UMTS')
lte_cdd_total = pd.read_excel(mydir + '1- CDD Total.xlsx', sheet_name='LTE')
gsm_generate = pd.read_excel(mydir + 'GUL CDD20191008021501.xlsx', sheet_name='GSM')
umts_generate = pd.read_excel(mydir + 'GUL CDD20191008021501.xlsx', sheet_name='UMTS')
lte_generate = pd.read_excel(mydir + 'GUL CDD20191008021501.xlsx', sheet_name='LTE')
Each workbook has three main sheets, and every sheet has its own data.
Does anyone know how to update the data for each sheet separately, and how to solve this error?
When you call read_excel with sheet_name=None, the result is a dictionary (sheet_name: DataFrame), and pd.concat cannot concatenate a dictionary, only Series and DataFrame objects.
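A minimal reproduction that needs no Excel files. The dict below is a hypothetical stand-in, with made-up "Cell" values, for what read_excel(..., sheet_name=None) returns. Putting the dict itself into the list passed to pd.concat raises the same TypeError, while unpacking its values works; pd.concat also accepts the mapping directly and uses the sheet names as the outer index level.

```python
import pandas as pd

# Hypothetical stand-in for what pd.read_excel(path, sheet_name=None) returns:
# a dict mapping each sheet name to a DataFrame (the "Cell" values are made up).
dct = {
    "GSM": pd.DataFrame({"Cell": ["G1", "G2"]}),
    "UMTS": pd.DataFrame({"Cell": ["U1"]}),
    "LTE": pd.DataFrame({"Cell": ["L1"]}),
}

# Putting the dict itself into the list reproduces the question's TypeError,
# because pd.concat only accepts Series and DataFrame objects in a list.
try:
    pd.concat([pd.DataFrame(), dct])
    concat_failed = False
except TypeError as exc:
    concat_failed = True
    print(exc)

# Unpacking the dict's values (the DataFrames themselves) works.
combined = pd.concat(list(dct.values()), ignore_index=True)

# pd.concat also accepts the dict directly; its keys (the sheet names)
# become the outer level of a MultiIndex.
combined_with_keys = pd.concat(dct)
```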
So you need to process each DataFrame in that dictionary separately, something like:
for f in files:
    # Here the result is a dictionary of DataFrames.
    # encoding is dropped: recent pandas versions of read_excel do not accept it,
    # and .xlsx files do not need one.
    dct = pd.read_excel(f"{basepath}/{f}", sheet_name=None)
    # Process each DataFrame from this dictionary
    for df in dct.values():
        alldf = pd.concat([alldf, df]).drop_duplicates(keep=False)
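The loop above merges all three sheets into one DataFrame. Since the question asks to keep the GSM, UMTS and LTE data in separate sheets, here is a sketch that combines each sheet name across files instead. The function names combine_per_sheet, write_combined and combine_folder are hypothetical; it assumes each workbook dict comes from read_excel(..., sheet_name=None), and writing .xlsx needs an engine such as openpyxl or XlsxWriter installed. It uses keep="first", which keeps one copy of each duplicated row, whereas keep=False removes every copy. Note that read_excel only reads cell values, so this does not carry over colors or fonts from the source files.

```python
import os

import pandas as pd


def combine_per_sheet(workbooks, sheet_names=("GSM", "UMTS", "LTE")):
    """Combine same-named sheets across workbooks, dropping duplicate rows.

    `workbooks` is an iterable of dicts like the ones returned by
    pd.read_excel(path, sheet_name=None), i.e. {sheet_name: DataFrame}.
    Returns {sheet_name: combined DataFrame}.
    """
    workbooks = list(workbooks)  # allow generators; we iterate once per sheet
    combined = {}
    for name in sheet_names:
        # Collect this sheet from every workbook that has it.
        frames = [wb[name] for wb in workbooks if name in wb]
        if not frames:
            continue
        # keep="first" keeps one copy of each duplicated row.
        combined[name] = (
            pd.concat(frames, ignore_index=True)
            .drop_duplicates(keep="first")
            .reset_index(drop=True)
        )
    return combined


def write_combined(combined, out_path):
    """Write each combined DataFrame to its own sheet of a new .xlsx file.

    Requires an Excel writer engine such as openpyxl or XlsxWriter.
    """
    with pd.ExcelWriter(out_path) as writer:
        for name, df in combined.items():
            df.to_excel(writer, sheet_name=name, index=False)


def combine_folder(basepath, out_path):
    """Hypothetical driver: read every .xlsx in basepath and write the result.

    Keep out_path outside basepath, or it is picked up again on the next run.
    """
    files = [f for f in os.listdir(basepath) if f.endswith(".xlsx")]
    workbooks = [
        pd.read_excel(os.path.join(basepath, f), sheet_name=None) for f in files
    ]
    write_combined(combine_per_sheet(workbooks), out_path)
```

For example, combine_folder(basepath, "1- CDD Total12.xlsx") with basepath as defined in the question writes one GSM, one UMTS and one LTE sheet into the output file.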
Another possibility: if each of your Excel files has only a single sheet to read from, you can run your original code without the sheet_name parameter (its default value is 0, meaning read only the first sheet and return a DataFrame).