Merging two DataFrame columns

I have 3 Excel files with different columns that I want to merge. As a first step, I tried merging just 2 of them with this code:

import pandas as pd

one = pd.read_excel("output3.xlsx")
two = pd.read_excel("output2.xlsx")
one = one.join(two)

But this won't merge the columns for me, and gives me an error:

ValueError: columns overlap but no suffix specified: Index(['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3'], dtype='object')

Searching online, I found the concat function and left and right joins, but my files have no common columns.

I just want to combine the 3 Excel files into 1, with each block of data staying in its original position, using pandas.

My Excel sheets look like this:

[Images: Table 1, Table 2, Table 3]

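The problem is that both files have the same auto-generated column names (Unnamed: 0, Unnamed: 1, ...), so join cannot tell them apart. You could rename the columns as follows:

import pandas as pd

one = pd.read_excel("output3.xlsx")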
two = pd.read_excel("output2.xlsx")

# give nice, different names to your columns
one.columns = ['col_1', 'col_2', 'col_3', 'col_4']
two.columns = ['col_5', 'col_6', 'col_7', 'col_8']

one = one.join(two)
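
If renaming is not convenient, the error message hints at another route: join accepts lsuffix and rsuffix arguments that are appended to overlapping column names. A minimal sketch, using small in-memory frames as stand-ins for the two Excel files (the data and the _one/_two suffixes are made up for illustration):

```python
import pandas as pd

# Stand-ins for the two sheets; both carry the same auto-generated column names.
cols = ['Unnamed: 0', 'Unnamed: 1']
one = pd.DataFrame([[1, 2], [3, 4]], columns=cols)
two = pd.DataFrame([[5, 6], [7, 8]], columns=cols)

# The suffixes disambiguate the overlapping names, so join no longer raises.
joined = one.join(two, lsuffix='_one', rsuffix='_two')
print(list(joined.columns))
```

Explicit renaming usually gives more readable column names, but suffixes are handy when the frames are only being combined temporarily.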

This might actually be an application for concatenation, rather than join:

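import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]] * 5)
df2 = pd.DataFrame([[None, None, None, 4, 5, 6]] * 5, index=range(5))
# place the frames side by side, then drop the columns that contain missing values
df3 = pd.concat([df1, df2], axis=1).dropna(axis=1)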

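concat is easier if you know that the tables have the same number of rows. axis=1 means a horizontal concatenation: the frames are placed side by side as columns. dropna(axis=1) then drops every column that contains a missing value, which here removes the empty placeholder columns.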
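
If the sheets hold their data in different, non-overlapping cell positions, another option is to overlay them instead of placing them side by side. A minimal sketch, assuming blank cells come through as NaN; the frames below are made-up stand-ins for the sheets, and combine_first is a swapped-in technique rather than something the answer above uses:

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for two sheets whose data sits in different columns.
# With real files, pd.read_excel(path, header=None) would give a similar layout.
left = pd.DataFrame([[1.0, 2.0, np.nan, np.nan]] * 3)
right = pd.DataFrame([[np.nan, np.nan, 3.0, 4.0]] * 3)

# combine_first keeps the values from `left` and fills its gaps from `right`,
# aligning on row index and column label.
merged = left.combine_first(right)
print(merged)
```

Because the alignment is by row index and column label, this only works if every frame uses the same positions as the final layout.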

The question is tagged with pandas and mentions that you have tried DataFrames, but given that you are trying to combine these spreadsheets by filling in ranges of rows and columns, I would suggest that you use openpyxl (if you are using 2010+ .xlsx files) or xlrd/xlwt (if you are using older .xls files).

This script assumes you know the number of rows/columns in each workbook, and that each block of cells will end up in the exact same spot in the final Excel spreadsheet. (These can also be programmatically determined with a little more work, but keep it simple to start with.) Set the start/stop values for each workbook's rows and columns, for example:

# Set workbook 1 column and row start/stop values (1-indexed)
wb1_col = [5, 8]
wb1_row = [2, 13]

# Do same for sheet 2 and sheet 3
wb2_col = [1, 4]
wb2_row = [2, 13]

wb3_col = [1, 8]
wb3_row = [1, 2]
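
The start/stop values do not have to be typed in by hand. A minimal sketch of reading them from a worksheet, using openpyxl's min_row, max_row, min_column and max_column properties on a small in-memory sheet (the cell values and the names col_bounds and row_bounds are made up for illustration):

```python
from openpyxl import Workbook

# Build a small in-memory sheet with data only in E2:H13 (made-up values).
wb = Workbook()
ws = wb.active
for row in range(2, 14):
    for col in range(5, 9):
        ws.cell(row=row, column=col, value=row * col)

# Bounding box of the cells that exist in the sheet, 1-indexed and inclusive.
col_bounds = [ws.min_column, ws.max_column]
row_bounds = [ws.min_row, ws.max_row]
print(col_bounds, row_bounds)
```

These bounds are inclusive, so add 1 to the stop value before passing it to range. They also cover any cell that exists in the sheet, which may include cells that are formatted but empty.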

Now you can extract the cells in those ranges and insert them into a new spreadsheet:

from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
from openpyxl import Workbook

# Open each existing workbook and select its worksheet (modify the worksheet name to match yours)
wb1 = load_workbook(filename='output1.xlsx')['Sheet 1']
wb2 = load_workbook(filename='output2.xlsx')['Sheet 1']
wb3 = load_workbook(filename='output3.xlsx')['Sheet 1']

# Create a new workbook and use its active worksheet
wb = Workbook()
ws = wb.active

# Note: range() excludes the stop value, so each stop must be one past
# the last column/row you want copied.

# Put data from workbook 1 into the new workbook
for column in range(wb1_col[0], wb1_col[1]):
    column_letter = get_column_letter(column)
    for row in range(wb1_row[0], wb1_row[1]):
        coordinates = column_letter + str(row)
        # copy the cell's value, not the Cell object itself
        ws[coordinates] = wb1[coordinates].value

# Put data from workbook 2 into the new workbook
for column in range(wb2_col[0], wb2_col[1]):
    column_letter = get_column_letter(column)
    for row in range(wb2_row[0], wb2_row[1]):
        coordinates = column_letter + str(row)
        ws[coordinates] = wb2[coordinates].value

# Put data from workbook 3 into the new workbook
for column in range(wb3_col[0], wb3_col[1]):
    column_letter = get_column_letter(column)
    for row in range(wb3_row[0], wb3_row[1]):
        coordinates = column_letter + str(row)
        ws[coordinates] = wb3[coordinates].value

# Write the results to a file
wb.save("new.xlsx")

The new workbook is saved as new.xlsx and includes the contents of all 3 worksheets in their corresponding cell positions.
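
The three copy loops differ only in the source worksheet and its bounds, so they could be folded into one helper. A minimal sketch, where copy_block is a hypothetical name and the bounds are treated the way range treats them (stop value excluded); the in-memory sheets stand in for the real files:

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def copy_block(src, dst, cols, rows):
    """Copy cell values from src to dst for [start, stop) column and row bounds."""
    for column in range(cols[0], cols[1]):
        letter = get_column_letter(column)
        for row in range(rows[0], rows[1]):
            coordinates = f"{letter}{row}"
            # Copy the value, not the Cell object itself.
            dst[coordinates] = src[coordinates].value


# Made-up source sheet standing in for one of the Excel files.
src = Workbook().active
src["E2"] = "top-left"
src["H13"] = "bottom-right"

# Copy columns E-H and rows 2-13 into a fresh worksheet.
out = Workbook().active
copy_block(src, out, [5, 9], [2, 14])
print(out["E2"].value, out["H13"].value)
```

With the real files, the same helper would be called once per source worksheet before saving the output workbook.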
