Python Pandas: copy columns from one sheet to another without changing any data?

Question

I have an Excel file with two sheets. I want to copy 3 columns from the first sheet to the second sheet.

Note:

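Some of the label names of the 3 copied columns duplicate labels already in the second sheet, but I need to keep the second sheet's original data without changing it.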

I have tried many approaches. So far, my best attempt is:

 df_new_sheet2 = pd.concat([df_old_sheet2, df_three_of_sheet1], axis=1, join_axes=[df_old_sheet2.index])

But this does not give the desired output.
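The `join_axes` argument in that attempt was deprecated in pandas 0.25 and removed in pandas 1.0, so it fails on current versions. A minimal sketch of an equivalent, assuming small made-up frames in place of the asker's (only the variable names come from the question): reindex the copied Sheet 1 columns to Sheet 2's index before concatenating, roughly what `join_axes=[df_old_sheet2.index]` did. This keeps Sheet 2's rows and integer dtypes intact, but it still produces duplicate `1_A`/`1_B` labels, which is the part the answers address.

```python
import pandas as pd

# Made-up data standing in for the asker's frames; Sheet 1 has one extra row.
df_old_sheet2 = pd.DataFrame({'1_A': [5, 4, 1, 3], '1_B': [55, 2, 3, 4]})
df_three_of_sheet1 = pd.DataFrame({'1_A': [1, 2, 3, 4, 5],
                                   '1_B': [5, 4, 6, 5, 7],
                                   '1_C': [8, 7, 9, 0, 1]})

# Align the copied columns to Sheet 2's rows first (dropping Sheet 1's extra row),
# then place them side by side; Sheet 2's own columns are not modified.
aligned = df_three_of_sheet1.reindex(df_old_sheet2.index)
df_new_sheet2 = pd.concat([df_old_sheet2, aligned], axis=1)

print(df_new_sheet2.columns.tolist())
# ['1_A', '1_B', '1_A', '1_B', '1_C']
```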

If pandas cannot do this, can you suggest other Python packages I could use?

In case my description of the problem is not clear enough, here is an image that may help. Thanks for your answers~


UPDATE [July 24, 2017]:

I finally found my mistake!

Insert a column containing the index numbers, then follow b2002's answer, and everything works. :)
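The fix above is described only in words. A minimal sketch of one way to read it in pandas, assuming the added column is literally named `index` and holds 0-based row numbers (both are assumptions; the asker gives neither): add that column to each sheet, then left-merge on it so every Sheet 2 row is kept as-is and the duplicated labels receive suffixes.

```python
import pandas as pd

# Made-up sheets; only the column labels follow the answers' demo data.
sheet1 = pd.DataFrame({'1_A': [1, 2, 3, 4], '1_B': [5, 4, 6, 5], '1_C': [8, 7, 9, 0]})
sheet2 = pd.DataFrame({'1_A': [5, 4, 1, 3], '1_B': [55, 2, 3, 4]})

# Hypothetical "index" column holding explicit 0-based row numbers.
for df in (sheet1, sheet2):
    df.insert(0, 'index', range(len(df)))

# Left-merge on the row numbers: Sheet 2's rows and values are kept unchanged,
# overlapping labels get suffixes, and the non-overlapping 1_C keeps its name.
result = sheet2.merge(sheet1, on='index', how='left', suffixes=('_sh2', '_sh1'))

print(result.columns.tolist())
# ['index', '1_A_sh2', '1_B_sh2', '1_A_sh1', '1_B_sh1', '1_C']
```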

Answer 1

This method uses pandas and xlsxwriter.

Setup (create a demo Excel file):

import pandas as pd

df1 = pd.DataFrame({'1_A': [1,2,3,4], '1_B': [5,4,6,5],
                    '1_C': [8,7,9,0], '1_D': [9,7,8,5], '1_E': [2,4,9,8]})
df2 = pd.DataFrame({'1_A': [5,4,1,3], '1_B': [55,2,3,4]})

setup_dict = {'Sheet_1': df1, 'Sheet_2': df2}

with pd.ExcelWriter('excel_file.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in setup_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

(From here on, read the existing Excel file:)

#Read your Excel file; "sheet_name=None" returns a dictionary of
#worksheet DataFrames keyed by sheet name. (Older pandas versions
#spelled this parameter "sheetname"; that spelling has since been removed.)
#index_col=0 reads back the index that to_excel wrote into column A.
#Replace 'excel_file.xlsx' with the actual path to your file.
ws_dict = pd.read_excel('excel_file.xlsx', sheet_name=None, index_col=0)
#Modify the Sheet_2 worksheet dataframe:
#(or, create a new worksheet by assigning concatenated df to a new key,
#such as ws_dict['Sheet_3'] = ...)
ws_dict['Sheet_2'] = pd.concat([ws_dict['Sheet_2'][['1_A','1_B']], 
                                ws_dict['Sheet_1'][['1_A','1_B','1_C']]],
                                axis=1)
#Write the ws_dict back to disk as an excel file:
#(replace 'excel_file.xlsx' with your desired file path.)
with pd.ExcelWriter('excel_file.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)
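The concatenated Sheet_2 above ends up with two `1_A` and two `1_B` columns. One option the answer does not mention is the `keys` argument of `pd.concat`, which adds an outer column level naming the source sheet, so the duplicates stay distinguishable. A sketch using the same demo values, built in memory rather than read from disk:

```python
import pandas as pd

# In-memory stand-in for the ws_dict returned by read_excel (demo values).
ws_dict = {
    'Sheet_1': pd.DataFrame({'1_A': [1, 2, 3, 4], '1_B': [5, 4, 6, 5], '1_C': [8, 7, 9, 0]}),
    'Sheet_2': pd.DataFrame({'1_A': [5, 4, 1, 3], '1_B': [55, 2, 3, 4]}),
}

# keys= labels each block of columns with the sheet it came from.
combined = pd.concat([ws_dict['Sheet_2'][['1_A', '1_B']],
                      ws_dict['Sheet_1'][['1_A', '1_B', '1_C']]],
                     axis=1, keys=['Sheet_2', 'Sheet_1'])

# Each column is now addressed by (sheet, label).
print(combined[('Sheet_2', '1_B')].tolist())
# [55, 2, 3, 4]
```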

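Because every worksheet becomes a DataFrame when the Excel file is read, other methods can be used to combine the columns, such as a join (for example, with different suffixes indicating each column's original sheet).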

Edit (for a new worksheet and unique column names...)

ws_dict = pd.read_excel('excel_file.xlsx', sheet_name=None, index_col=0)
#Create a new Sheet_3 worksheet by joining two Sheet_2 columns with three
#Sheet_1 columns; the suffixes keep the duplicated column names unique,
#and Sheet_2 itself is left unchanged.
ws_dict['Sheet_3'] = ws_dict['Sheet_2'][['1_A','1_B']].join(ws_dict['Sheet_1'][['1_A','1_B','1_C']],
                                                            lsuffix='_sh2', rsuffix='_sh1', how='outer')
#Write the ws_dict back to disk as an excel file:
#(replace 'excel_file.xlsx' with your desired file path.)
with pd.ExcelWriter('excel_file.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)
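With `how='outer'`, every row from both sheets is kept; if the sheets have different row counts, the shorter sheet's columns are padded with NaN, which also turns integer columns into floats. If Sheet 2's rows must stay exactly as they are, `how='left'` may be the safer choice. A small in-memory illustration, with row counts made up for the example:

```python
import pandas as pd

# Made-up sheets of different lengths: Sheet 1 has 5 rows, Sheet 2 has 3.
sheet1 = pd.DataFrame({'1_A': [1, 2, 3, 4, 5], '1_B': [5, 4, 6, 5, 7], '1_C': [8, 7, 9, 0, 1]})
sheet2 = pd.DataFrame({'1_A': [5, 4, 1], '1_B': [55, 2, 3]})

cols2, cols1 = sheet2[['1_A', '1_B']], sheet1[['1_A', '1_B', '1_C']]

# Outer join: union of row labels, so Sheet 2's columns gain NaN in rows 3 and 4.
outer = cols2.join(cols1, lsuffix='_sh2', rsuffix='_sh1', how='outer')

# Left join: only Sheet 2's row labels, so its values and dtypes are untouched.
left = cols2.join(cols1, lsuffix='_sh2', rsuffix='_sh1', how='left')

print(len(outer), len(left))
# 5 3
```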

Answer 2

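If you are using Excel and Python on Windows (and, if not, for future readers), consider an SQL solution: an ODBC connection to the JET/ACE engine can query Excel workbooks, its own Access databases, and even text files (csv/tab/txt). This .dll engine is installed by default on Windows machines or with MS Office. This method avoids opening any workbook.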

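Simply run an INNER JOIN across the worksheets, then use pandas' read_sql() to import the query's result set directly into a DataFrame. The connection can be made with the pyodbc or pypyodbc module. Because this is SQL, you can SELECT only the columns you need, rename them, filter with WHERE, JOIN or UNION with other worksheets and even other workbooks, and even aggregate with GROUP BY. Note that the query below joins on an [index] column, so both sheets need one: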

import pyodbc
import pandas as pd

strfile = "C:\Path\To\Workbook.xlsx"

conn = pyodbc.connect(r'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};' + \
                               'DBQ={};'.format(strfile), autocommit=True)    

strSQL = " SELECT s1.[1_A] As s1_1_A, s1.[1_B] As s1_1_B," + \
         "        s2.[1_A] AS s2_1_A, s2.[1_B] As s2_1_B, s2.[1_C] As s2_1_C" + \
         " FROM [Sheet1$] s1" + \
         " INNER JOIN [Sheet2$] s2 ON s1.[index] = s2.[index]" 

df = pd.read_sql(strSQL, conn)

conn.close()
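The ODBC route needs Windows and the Excel driver. To try the same INNER JOIN plus read_sql() pattern elsewhere, here is a sketch that swaps in Python's built-in sqlite3 module with an in-memory database (a substitution, not what the answer uses). The table names mimic the [Sheet1$] style, the data is made up, and an ORDER BY is added because SQL does not otherwise guarantee row order. `DataFrame.to_sql` writes the DataFrame index as a column named `index` by default, which supplies the join key.

```python
import sqlite3

import pandas as pd

# Made-up sheets shaped to match the columns the query selects.
sheet1 = pd.DataFrame({'1_A': [1, 2, 3, 4], '1_B': [5, 4, 6, 5]})
sheet2 = pd.DataFrame({'1_A': [5, 4, 1, 3], '1_B': [55, 2, 3, 4], '1_C': [8, 7, 9, 0]})

conn = sqlite3.connect(':memory:')
# to_sql also stores each DataFrame's index in a column named "index".
sheet1.to_sql('Sheet1$', conn)
sheet2.to_sql('Sheet2$', conn)

# SQLite accepts [bracketed] identifiers, so the query keeps the answer's shape.
strSQL = """
SELECT s1.[1_A] AS s1_1_A, s1.[1_B] AS s1_1_B,
       s2.[1_A] AS s2_1_A, s2.[1_B] AS s2_1_B, s2.[1_C] AS s2_1_C
FROM [Sheet1$] s1
INNER JOIN [Sheet2$] s2 ON s1.[index] = s2.[index]
ORDER BY s1.[index]
"""

df = pd.read_sql(strSQL, conn)
conn.close()

print(df.columns.tolist())
# ['s1_1_A', 's1_1_B', 's2_1_A', 's2_1_B', 's2_1_C']
```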


Question description

2 answers

Answer 1
2 votes, accepted, 2017-07-22 14:07:34

Answer 2
1 vote, 2017-07-22 16:14:31
