
How can I pull data by uniqueid from an xlsx file and write that data to another xlsx file with the same column name using Python?

I have asked this before, but this time both files are xlsx. I still haven't figured out how to do this properly, so I am asking for your expertise again. Basically, I would like a Python script that pulls data from one Excel file and writes it to another. For example:

Initial contents of both Excel files:

XLSX1              XLSX2
Column_A Column_B  Column_A Column_B
A                  A        21 
B                  B        25  
C                  C        2
D                  D        5
E                  E        9 
F                  F        10 
G                  G        15 
H                  H        16

Once the script has run, Column_B of XLSX2 is written to Column_B of XLSX1:

XLSX1              XLSX2
Column_A Column_B  Column_A Column_B
A        21        A        21 
B        25        B        25  
C        2         C        2
D        5         D        5
E        9         E        9 
F        10        F        10 
G        15        G        15 
H        16        H        16

Initially, the plan was for a user to choose which rows of XLSX1 to fill and to pull the data from XLSX2 per unique ID (Column_A), but that turned out to be difficult. So I would appreciate a way to copy all of Column_B from XLSX2 into Column_B of XLSX1, matched per unique ID (A, B, C, D, E, F, G, H).

(Posting the answer on behalf of the question author to move it to the answer space.)

Below is the code that works for me.

import openpyxl

# XLSX1 receives the data, XLSX2 supplies it
daily_data = openpyxl.load_workbook('C:/XLSX1.xlsx')
master_data = openpyxl.load_workbook('C:/XLSX2.xlsx')

daily_sheet = daily_data['Sheet']
master_sheet = master_data['Sheet']

for i in daily_sheet.iter_rows():
    A = i[0].value           # unique ID in Column_A of XLSX1
    row_number = i[0].row
    for j in master_sheet.iter_rows():
        if j[0].value == A:
            # Write the matching value into Column_B (column 2) of XLSX1
            daily_sheet.cell(row=row_number, column=2).value = j[1].value
            print(j[1].value)
            break            # IDs are unique, so stop at the first match

daily_data.save('C:/XLSX1.xlsx')
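
As a variation on the nested loops above, the master sheet can be read once into a dictionary so that each ID is looked up directly. This is only a sketch: the function names `build_lookup` and `copy_by_id` are made up for illustration, and it assumes the IDs are in column A and the values in column B of a worksheet named 'Sheet', as in the example tables.

```python
def build_lookup(master_rows):
    """Map each ID (first cell of a row) to its value (second cell)."""
    lookup = {}
    for row in master_rows:
        if row and row[0] is not None:
            lookup[row[0]] = row[1]
    return lookup


def copy_by_id(daily_path, master_path, sheet_name="Sheet"):
    """Fill column B of the daily workbook from the master workbook, matched on column A."""
    import openpyxl  # imported here so build_lookup can be used on plain tuples

    daily_wb = openpyxl.load_workbook(daily_path)
    master_wb = openpyxl.load_workbook(master_path)
    daily_ws = daily_wb[sheet_name]

    # values_only=True yields plain (id, value) tuples instead of cell objects
    master_rows = master_wb[sheet_name].iter_rows(min_col=1, max_col=2, values_only=True)
    lookup = build_lookup(master_rows)

    for id_cell, value_cell in daily_ws.iter_rows(min_col=1, max_col=2):
        if id_cell.value in lookup:
            value_cell.value = lookup[id_cell.value]

    daily_wb.save(daily_path)
```

Calling `copy_by_id('C:/XLSX1.xlsx', 'C:/XLSX2.xlsx')` would then update XLSX1 in place. The header row is treated like any other row, so if both sheets use the same headers, the header of Column_B is simply rewritten with the same text.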

The last example appended the searched ID and its lookup data to the sheet, so two values (ID and Data) were written after the last used row in XLSX1. That assumed you only wanted the IDs the user enters to appear in the sheet.
If we return to the example XLSX1 sheet, where the IDs A, B, C, D, E, F, G and H already exist, this example requests IDs as before but this time updates the Data wherever the ID already exists in column A of XLSX1.
The code searches for each entered ID in column A of XLSX2. If the ID is found, it gets the data from column B, then searches for the ID in column A of XLSX1 and writes the data into column B.

In the test output I have jumbled up the IDs in XLSX1 to show that the code searches XLSX1 and adds the data wherever the ID is located.
In the example run, the IDs 'd,x,a,g' were entered:

Enter the IDs to obtain: d,x,a,g

Since d, a and g are in XLSX2, the data for these IDs was updated in XLSX1; x is not in XLSX2, so it was skipped.
See the images before and after the code run.
The code updates the sheet in place, so if it is run again and the ID 'b' is entered, XLSX1 will have data for 'd', 'a' and 'g' from the original run and also for 'b'.
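
To make the example run concrete, here is a small sketch of how the entry 'd,x,a,g' is normalised and split into IDs that exist in XLSX2 and IDs that do not. The helper names `parse_ids` and `split_found_missing` are hypothetical, and the hard-coded list of master IDs stands in for column A of XLSX2.

```python
def parse_ids(raw):
    """Split a comma-separated entry into upper-cased, whitespace-stripped IDs."""
    return [item.strip().upper() for item in raw.split(",") if item.strip()]


def split_found_missing(ids, master_ids):
    """Separate the requested IDs into those present in the master list and those absent."""
    known = set(master_ids)
    found = [uid for uid in ids if uid in known]
    missing = [uid for uid in ids if uid not in known]
    return found, missing


ids = parse_ids("d,x,a,g")
found, missing = split_found_missing(ids, ["A", "B", "C", "D", "E", "F", "G", "H"])
print(found)    # ['D', 'A', 'G']
print(missing)  # ['X']
```

Reporting the missing IDs back to the user is optional; the code in this answer silently skips them.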

import openpyxl
import pandas as pd

import warnings
warnings.filterwarnings('ignore') # suppress all warnings to keep the console output clean

daily_data = 'C:/XLSX1.xlsx'
master_data = 'C:/XLSX2.xlsx'
worksheet = 'Sheet'

### Example sheet IDs A, B, C, D, E, F, G, H,
### Get IDs from the user (comma separated entry) and add to list ids_list
input_ids = input("Enter the IDs to obtain: ")
### Entered IDs are uppercased and stripped of white spaces.
ids_list = [item.strip().upper() for item in input_ids.split(',')]

### Load the Master sheet to Pandas for searching
df1 = pd.read_excel(master_data, sheet_name=worksheet)
### Load the Daily sheet to Pandas for searching
df2 = pd.read_excel(daily_data, sheet_name=worksheet)

### Column names for writing back to excel sheet 
column_list = df2.columns

### Open writer for pandas dataframe (df) write back to excel 
### mode a = append, overlay the existing sheet
writer = pd.ExcelWriter(daily_data,
                    mode='a', 
                    if_sheet_exists='overlay',
                    engine='openpyxl')

for uid in ids_list:
    ### Search 1st col of XLSX2 df for the ID
    search1 = df1.loc[df1.iloc[:,0] == uid]

    ### If the ID exists in XLSX2, look for it in XLSX1
    if search1.size > 0:
        ### Search 1st col of XLSX1 dataframe for the ID
        search2 = df2.loc[df2.iloc[:,0] == uid]
        ### Update XLSX1 df with the data value from XLSX2 df (skip IDs missing from XLSX1)
        if search2.size > 0:
            df2.at[search2.index[0], df2.iloc[:, 1].name] = df1.iloc[:,1].loc[search1.index[0]]

### Write updated dataframe to XLSX1 sheet
df2.to_excel(writer, sheet_name=worksheet, startrow=1, header=False, index=False)

### Drop pandas header formatting
book  = writer.book
sheet = writer.sheets[worksheet]
for idx, val in enumerate(column_list,1):
    sheet.cell(row=1, column=idx).value = val

### Save XLSX1 workbook (close() writes the file; writer.save() was removed in pandas 2.0)
writer.close()

Images before and after lookup where 'd,x,a,g' was entered.
XLSX1 before running the code, with the IDs jumbled. XLSX1 after running the code with 'd,x,a,g' entered.
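
The per-ID update loop in the code above can also be written without an explicit loop, working on the two dataframes in memory. This is a sketch rather than a drop-in replacement: `update_selected` is a made-up name, and it assumes the ID is the first column and the data the second column of both dataframes, as in the example sheets.

```python
import pandas as pd


def update_selected(daily_df, master_df, ids):
    """Return a copy of daily_df whose second column is filled from master_df for the given IDs."""
    id_col, value_col = daily_df.columns[0], daily_df.columns[1]

    # ID -> value lookup built from the first two columns of the master dataframe
    lookup = dict(zip(master_df.iloc[:, 0], master_df.iloc[:, 1]))

    result = daily_df.copy()
    # Use object dtype so numbers or text can be written into an initially empty column
    result[value_col] = result[value_col].astype(object)

    # Only rows whose ID was requested and also exists in the master data are touched
    wanted = [uid for uid in ids if uid in lookup]
    mask = result[id_col].isin(wanted)
    result.loc[mask, value_col] = result.loc[mask, id_col].map(lookup)
    return result


daily = pd.DataFrame({"Column_A": ["G", "D", "A", "B"], "Column_B": [None] * 4})
master = pd.DataFrame({"Column_A": ["A", "B", "D", "G"], "Column_B": [21, 25, 5, 15]})
updated = update_selected(daily, master, ["D", "X", "A", "G"])
print(updated)
```

The returned dataframe can be written back to XLSX1 with the same `pd.ExcelWriter` and `to_excel` calls used above.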
