Python Pandas: copy columns from one sheet to another sheet without changing any data?
I have an Excel file with two sheets. I would like to copy 3 columns from the first sheet to the second sheet.
Note:
I have tried many methods. My best attempt so far is:
df_new_sheet2 = pd.concat([df_old_sheet2, df_three_of_sheet1], axis=1, join_axes=[df_old_sheet2.index])
But this is not the desired output.
If pandas cannot do this, could you please suggest another Python package that could work?
In case I'm not describing the problem clearly enough, I have uploaded a picture that may help. Thanks for your answers.
UPDATE [2017.07.24]:
I finally found my mistake! Insert one column with the index number and then follow b2002's solution, and everything works. :)
This method uses pandas and xlsxwriter.
Setup (create a demo Excel file):
import pandas as pd
df1 = pd.DataFrame({'1_A': [1,2,3,4], '1_B': [5,4,6,5],
                    '1_C': [8,7,9,0], '1_D': [9,7,8,5], '1_E': [2,4,9,8]})
df2 = pd.DataFrame({'1_A': [5,4,1,3], '1_B': [55,2,3,4]})
setup_dict = {'Sheet_1': df1, 'Sheet_2': df2}
with pd.ExcelWriter('excel_file.xlsx',
                    engine='xlsxwriter') as writer:
    for ws_name, df_sheet in setup_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)
(Start here to read an existing Excel file)
# Read your Excel file; use sheet_name=None to create a dictionary of
# worksheet dataframes. (Note: older versions of pandas used
# "sheetname" instead of "sheet_name".)
# Replace 'excel_file.xlsx' with the actual path to your file.
ws_dict = pd.read_excel('excel_file.xlsx', sheet_name=None)
#Modify the Sheet_2 worksheet dataframe:
#(or, create a new worksheet by assigning concatenated df to a new key,
#such as ws_dict['Sheet_3'] = ...)
ws_dict['Sheet_2'] = pd.concat([ws_dict['Sheet_2'][['1_A','1_B']],
                                ws_dict['Sheet_1'][['1_A','1_B','1_C']]],
                               axis=1)
#Write the ws_dict back to disk as an excel file:
#(replace 'excel_file.xlsx' with your desired file path.)
with pd.ExcelWriter('excel_file.xlsx',
                    engine='xlsxwriter') as writer:
    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)
Because every worksheet is converted to a dataframe when the Excel file is read, other methods may be used to combine the columns, such as a join (with different suffixes identifying the original worksheets, for example).
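Whichever method is used, note that concat with axis=1 (like join) lines rows up by index label, not by position, which may be what the update in the question ran into. The following is a minimal in-memory sketch; the frames and their index values are made up for illustration and are not taken from the original workbook.

```python
import pandas as pd

# Hypothetical stand-ins for two worksheets; sheet1 deliberately has a
# different index (10, 11, 12) than sheet2 (default 0, 1, 2).
sheet1 = pd.DataFrame({'1_A': [1, 2, 3], '1_C': [8, 7, 9]}, index=[10, 11, 12])
sheet2 = pd.DataFrame({'1_B': [55, 2, 3]})

# The index labels do not overlap, so the result holds the union of the
# labels and the missing cells are filled with NaN.
misaligned = pd.concat([sheet2, sheet1], axis=1)

# Resetting the index first makes the copy positional:
# row 0 sits next to row 0, row 1 next to row 1, and so on.
aligned = pd.concat([sheet2, sheet1.reset_index(drop=True)], axis=1)

print(misaligned)
print(aligned)
```

If the copied columns come out shifted or padded with empty cells, comparing the two indexes before concatenating is a reasonable first check.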
EDIT (for a new worksheet and unique column names...)
ws_dict = pd.read_excel('excel_file.xlsx', sheet_name=None)
# Create a new Sheet_3 worksheet dataframe by joining columns from
# Sheet_2 and Sheet_1; the suffixes keep the column names unique.
ws_dict['Sheet_3'] = ws_dict['Sheet_2'][['1_A','1_B']].join(ws_dict['Sheet_1'][['1_A','1_B','1_C']],
                                                            lsuffix='_sh2', rsuffix='_sh1', how='outer')
#Write the ws_dict back to disk as an excel file:
#(replace 'excel_file.xlsx' with your desired file path.)
with pd.ExcelWriter('excel_file.xlsx',
                    engine='xlsxwriter') as writer:
    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)
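To see what the suffixes and how='outer' do without touching a file, here is a small in-memory sketch of the same join. The frames are hypothetical stand-ins for the two worksheets, and sheet2 is made one row shorter on purpose.

```python
import pandas as pd

# Hypothetical stand-ins for the worksheets; sheet2 has only three rows.
sheet1 = pd.DataFrame({'1_A': [1, 2, 3, 4], '1_B': [5, 4, 6, 5],
                       '1_C': [8, 7, 9, 0]})
sheet2 = pd.DataFrame({'1_A': [5, 4, 1], '1_B': [55, 2, 3]})

# Overlapping column names receive the suffixes; '1_C' exists only on the
# right side, so it keeps its name. The outer join keeps every index label,
# so the row missing from sheet2 is filled with NaN on the left side.
sheet3 = sheet2.join(sheet1, lsuffix='_sh2', rsuffix='_sh1', how='outer')

print(sheet3)
```

With how='left' instead, only the rows already present in the second sheet would be kept.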
If using Excel and Python for Windows (and if not, for future readers), consider an SQL solution with an ODBC connection to the JET/ACE engine, which can query Excel workbooks, its own Access databases, and even text files (csv/tab/txt). This engine, which consists of .dll files, is installed by default on Windows machines or with MS Office. This approach avoids opening any workbook.
Simply run an INNER JOIN on the sheets and use pandas' read_sql() to import the query result set directly into a dataframe. The connection can use the pyodbc or pypyodbc modules. And since you work in SQL, you can SELECT the needed columns, rename them, filter with WHERE, JOIN or UNION other worksheets (even in other workbooks), and aggregate with GROUP BY:
import pyodbc
import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escapes.
strfile = r"C:\Path\To\Workbook.xlsx"
conn = pyodbc.connect(r'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};' + \
                      'DBQ={};'.format(strfile), autocommit=True)
strSQL = " SELECT s1.[1_A] As s1_1_A, s1.[1_B] As s1_1_B," + \
         " s2.[1_A] AS s2_1_A, s2.[1_B] As s2_1_B, s2.[1_C] As s2_1_C" + \
         " FROM [Sheet1$] s1" + \
         " INNER JOIN [Sheet2$] s2 ON s1.[index] = s2.[index]"
df = pd.read_sql(strSQL, conn)
conn.close()
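For readers without the Excel ODBC driver, the same INNER JOIN plus read_sql() pattern can be tried against an in-memory SQLite database. This is only a sketch of the query pattern, not the JET/ACE approach itself: SQLite is swapped in for the ODBC connection, and the table names, the idx key column, and the sample data are assumptions made for illustration.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-ins for the two worksheets.
sheet1 = pd.DataFrame({'1_A': [1, 2, 3, 4], '1_B': [5, 4, 6, 5],
                       '1_C': [8, 7, 9, 0]})
sheet2 = pd.DataFrame({'1_A': [5, 4, 1, 3], '1_B': [55, 2, 3, 4]})

# Load both frames into an in-memory SQLite database (standing in for the
# workbook), writing the dataframe index as a key column named 'idx'.
conn = sqlite3.connect(':memory:')
sheet1.to_sql('Sheet1', conn, index=True, index_label='idx')
sheet2.to_sql('Sheet2', conn, index=True, index_label='idx')

# Select and rename the needed columns, joining the sheets on the key.
sql = (
    'SELECT s2."1_A" AS s2_1_A, s2."1_B" AS s2_1_B, '
    's1."1_A" AS s1_1_A, s1."1_B" AS s1_1_B, s1."1_C" AS s1_1_C '
    'FROM Sheet2 s2 '
    'INNER JOIN Sheet1 s1 ON s1.idx = s2.idx '
    'ORDER BY s2.idx'
)
df = pd.read_sql(sql, conn)
conn.close()

print(df)
```

Column names that start with a digit have to be quoted in the SQL text; the ODBC example above uses square brackets for the same reason.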