
TypeError: cannot concatenate object of type '<class 'collections.OrderedDict'>'; only Series and DataFrame objs are valid

I have written some simple code that combines all the Excel files in a directory. The files are in the same folder and share the same format and column names.

Each file is an .xlsx workbook with three sheets named GSM, UMTS, and LTE, and these sheet names are the same in every file. I need to copy the GSM data from every file into one GSM sheet, and likewise for UMTS and LTE, each into its own sheet of a new workbook, and then drop duplicates.

I also need to be able to change the column colors, or keep the same style as the source (text style, etc.).

Here is my code:

import pandas as pd
import os

basepath = r'C:\Users\mwx825326\PycharmProjects\MyExcelCombine\myCDD Combine'
files = list(filter(lambda x: '.xlsx' in x, os.listdir(basepath)))
alldf = pd.DataFrame()
for f in files:
    df= pd.read_excel(f"{basepath}/{f}",encoding='latin-1', sheet_name=None)
    alldf = pd.concat([alldf,df]).drop_duplicates(keep=False)

alldf.to_excel("1- CDD Total12.xlsx")

And this is the error I get:

Traceback (most recent call last):
  File "C:/Users/mwx825326/PycharmProjects/MyExcelCombine/CombineTool.py", line 9, in <module>
    alldf = pd.concat([alldf,df]).drop_duplicates(keep=False)
  File "C:\Users\mwx825326\PycharmProjects\MyExcelCombine\venv\lib\site-packages\pandas\core\reshape\concat.py", line 255, in concat
    sort=sort,
  File "C:\Users\mwx825326\PycharmProjects\MyExcelCombine\venv\lib\site-packages\pandas\core\reshape\concat.py", line 332, in __init__
    raise TypeError(msg)
TypeError: cannot concatenate object of type '<class 'collections.OrderedDict'>'; only Series and DataFrame objs are valid

Process finished with exit code 1

This is how I read my sheets:

mydir = (os.getcwd()).replace('\\', '/') + '/'

gsm_cdd_total = pd.read_excel(r'' + mydir + '1- CDD Total.xlsx' ,sheet_name='GSM')
umts_cdd_total = pd.read_excel(r'' + mydir + '1- CDD Total.xlsx' ,sheet_name='UMTS')
lte_cdd_total = pd.read_excel(r'' + mydir + '1- CDD Total.xlsx' ,sheet_name='LTE')

gsm_generate = pd.read_excel(r'' + mydir + 'GUL CDD20191008021501.xlsx' ,sheet_name='GSM')
umts_generate = pd.read_excel(r'' + mydir + 'GUL CDD20191008021501.xlsx' ,sheet_name='UMTS')
lte_generate = pd.read_excel(r'' + mydir + 'GUL CDD20191008021501.xlsx' ,sheet_name='LTE')

My .xlsx files each have three main sheets, and every sheet has its own data.

Does anyone know how to update the data for each sheet and how to solve this error?

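When you run read_excel with sheet_name=None, the result is a dictionary (sheet name: DataFrame); in your pandas version it is an OrderedDict, as the traceback shows. pd.concat accepts only Series and DataFrame objects in its list, so passing this dictionary raises the TypeError.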
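A minimal sketch of that behaviour, using an in-memory dict as a stand-in for the value read_excel returns with sheet_name=None. The sheet contents and the column name cell are made up for illustration; newer pandas versions return a plain dict rather than an OrderedDict, but the effect on pd.concat is the same.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None):
# a dict mapping each sheet name to its DataFrame (hypothetical data).
sheets = {
    "GSM": pd.DataFrame({"cell": ["A1", "A2"]}),
    "UMTS": pd.DataFrame({"cell": ["B1"]}),
    "LTE": pd.DataFrame({"cell": ["C1"]}),
}

alldf = pd.DataFrame()

# Putting the dict itself in the list reproduces the error from the question.
try:
    pd.concat([alldf, sheets])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the DataFrames stored in the dict works.
combined = pd.concat(list(sheets.values()), ignore_index=True)

print(error_message)
print(combined)
```

The fix is therefore to hand pd.concat the values of the dictionary, not the dictionary itself.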

So:

  • don't use df as the destination variable here (it is misleading, because the result is not a DataFrame),
  • add another loop iterating over keys and DataFrames (using items()) or over the DataFrames alone (using values()),
  • within this loop, concatenate each DataFrame (read from the current sheet) with alldf.

Something like:

for f in files:
    # Here the result is a dictionary of DataFrames
    dct = pd.read_excel(f"{basepath}/{f}",encoding='latin-1', sheet_name=None)
    # Process each DataFrame from this dictionary
    for df in dct.values():
        alldf = pd.concat([alldf,df]).drop_duplicates(keep=False)
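The loop above puts every sheet into one DataFrame. If GSM, UMTS, and LTE should stay separate, as the question describes, one possible variation is to group the DataFrames by sheet name first. The following is only a sketch: combine_by_sheet and write_sheets are hypothetical helper names, the sample data is invented, and it uses the default drop_duplicates() (which keeps the first copy of each duplicated row) on the assumption that one copy should survive.

```python
import pandas as pd


def combine_by_sheet(workbooks):
    """Concatenate same-named sheets across workbooks.

    `workbooks` is an iterable of dicts {sheet_name: DataFrame}, i.e. the
    shape returned by pd.read_excel(path, sheet_name=None).
    """
    parts_by_sheet = {}
    for book in workbooks:
        for name, df in book.items():
            parts_by_sheet.setdefault(name, []).append(df)
    return {
        name: pd.concat(parts, ignore_index=True)
        .drop_duplicates()          # keep the first copy of each duplicate row
        .reset_index(drop=True)
        for name, parts in parts_by_sheet.items()
    }


def write_sheets(combined, path):
    """Write each combined DataFrame to its own sheet (values only, no styling)."""
    with pd.ExcelWriter(path) as writer:
        for name, df in combined.items():
            df.to_excel(writer, sheet_name=name, index=False)


# Hypothetical in-memory stand-ins for two workbooks.
book_a = {"GSM": pd.DataFrame({"cell": ["A1", "A2"]}),
          "LTE": pd.DataFrame({"cell": ["C1"]})}
book_b = {"GSM": pd.DataFrame({"cell": ["A2", "A3"]}),
          "LTE": pd.DataFrame({"cell": ["C1"]})}

combined = combine_by_sheet([book_a, book_b])
print(combined["GSM"])
```

With real files, the list passed to combine_by_sheet would be built from pd.read_excel(path, sheet_name=None) for each path. Note that keep=False, as used in the loop above, removes every row that has a duplicate, so a row present in two files disappears entirely. Also, pandas reads and writes cell values only, so colors and text styles from the source workbooks are not carried over by this approach.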

Another possibility: if each of your Excel files has only a single sheet to read from, you can run your original code without the sheet_name parameter (its default value is 0, meaning read only the first sheet and return a DataFrame).


 