Using Python, how do I take search strings from a column of one Excel sheet and search for them in a specific column of another Excel file?

Question

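For example, take the input string _TXT from col A of file 1 and search for it in col X of file 2. If any row contains _TXT, then for that row compare the col B value in file 1 with the col Y value in file 2.

If the col B and col Y values match, take no action. If they do not match, update col Y in file 2 with the value from col B in file 1.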


Answer 1

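A simple (though probably not the most efficient) way is to follow this algorithm:

Load both Excel files into Python memory;
For each row of the input file, take inputString from colA, look it up in the output DataFrame, and update it as needed;
Write the second file back to disk.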

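Pandas provides a function to read Excel files (read_excel). Its documentation lists many options; the ones I think will be useful to you are:

sheet_name: defaults to 0 (the first sheet); you can also pass the sheet's full name (as a string) or any integer (N selects sheet N+1). None means all sheets.
usecols: defaults to None (all columns); if you only need colA and colB you may want to specify that here (with something like "A:B");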

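The function that writes your pandas DataFrame back to Excel is to_excel, a method of the DataFrame class (its documentation covers the full list of options). Useful options include:

excel_writer: either a file path or an ExcelWriter object, the pandas class for writing DataFrames to Excel. Using an ExcelWriter gives you finer control over what you write, which helps if you have to write several sheets or modify your file in place;
sheet_name: defaults to "Sheet1"; it does not seem to accept an integer sheet number;
index: defaults to True, which writes the row index along with the data. We do not want that, so it needs to be set to False.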
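The read and write options described above can be tried on a throwaway file. This is only a sketch: the data, the file name demo.xlsx and the sheet name are made up for illustration, and it assumes openpyxl is installed as the .xlsx engine.

```python
import os
import tempfile

import pandas as pd

# hypothetical data, only for illustration
df = pd.DataFrame({"colA": ["_TXT"], "colB": ["v1"], "colC": ["unused"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xlsx")  # throwaway file name

    # index=False keeps the row index out of the sheet
    df.to_excel(path, sheet_name="data_sheet", index=False)

    # read back only Excel columns A and B from the named sheet
    by_name = pd.read_excel(path, sheet_name="data_sheet", usecols="A:B")
    # sheet_name=0 selects the first sheet by position
    by_position = pd.read_excel(path, sheet_name=0, usecols="A:B")
    # sheet_name=None returns a dict of every sheet, keyed by sheet name
    all_sheets = pd.read_excel(path, sheet_name=None)

print(list(by_name.columns))
print(list(all_sheets))
```

Reading by name and by position gives the same frame here because the file holds a single sheet.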

Your final code might look like this:

import pandas as pd

# read both excel files; assume only one sheet in the file to modify
# taking only the useful columns in the reference file according to your example
dfIn = pd.read_excel("path/to/refFile.xlsx", usecols="A:B")
dfOut = pd.read_excel("path/to/outFile.xlsx", usecols=None, sheet_name="data_sheet")

for _, rowIn in dfIn.iterrows():
    inputString = str(rowIn['colA'])
    for idxOut, rowOut in dfOut.iterrows():
        # using python string endswith as matching rule
        # replace with anything that suits your needs
        if str(rowOut['colX']).endswith(inputString):
            # write through dfOut.at: assigning to rowOut may only change a copy
            dfOut.at[idxOut, 'colY'] = rowIn['colB']
# write dfOut to disk
with pd.ExcelWriter("path/to/outFile.xlsx", mode="a", if_sheet_exists="replace") as writer:
    dfOut.to_excel(writer, sheet_name="data_sheet", index=False)

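Admittedly, the pandas documentation warns against modifying something you are iterating over: iterrows may return a copy of the data rather than a view, and changes made to a copy have no effect on the DataFrame. Assigning to the row object can appear to work when every column holds strings, but that behaviour is not guaranteed; assigning through dfOut.at[...] or dfOut.loc[...] writes to the DataFrame itself.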
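One way to sidestep the iterrows caveat is to replace the inner row-by-row loop with a boolean mask. The sketch below is only an illustration on small in-memory frames: the column headers colA, colB, colX and colY follow the example above, the helper name update_col_y and the toy data are made up here, and substring matching via str.contains is an assumption about the matching rule (the loop above uses endswith).

```python
import pandas as pd


def update_col_y(df_in: pd.DataFrame, df_out: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df_out whose colY is set from df_in's colB
    on every row whose colX contains the corresponding colA string."""
    result = df_out.copy()
    col_x = result["colX"].astype(str)
    for search, new_value in zip(df_in["colA"].astype(str), df_in["colB"]):
        # regex=False treats the search string as plain text
        mask = col_x.str.contains(search, regex=False)
        # .loc writes into the DataFrame itself, not into a row copy
        result.loc[mask, "colY"] = new_value
    return result


# toy stand-ins for the two workbooks (hypothetical data)
df_in = pd.DataFrame({"colA": ["_TXT", "_CSV"], "colB": ["new1", "new2"]})
df_out = pd.DataFrame({
    "colX": ["report_TXT", "data_CSV", "other"],
    "colY": ["old", "new2", "keep"],
})
updated = update_col_y(df_in, df_out)
print(updated)
```

Rows whose colY already equals colB are simply overwritten with the same value, which has the same effect as the "take no action" branch in the question.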

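Depending on your Excel engine and its version, replacing the sheet may fail (it worked with Python 3.8.10 and openpyxl 3.0.9, but failed for the OP). If that is the case, this related question suggests removing the old sheet entirely and creating a new one, as follows: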

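import pandas as pd

with pd.ExcelWriter('/path/to/file.xlsx', engine="openpyxl", mode='a') as writer:
    workBook = writer.book
    # data_sheet exists for sure, since we read data from it at beginning of script
    # openpyxl's remove() expects the worksheet object, not a list of names
    workBook.remove(workBook['data_sheet'])
    # dfOut is the updated DataFrame from the script above
    dfOut.to_excel(writer, sheet_name='data_sheet', index=False)
    # no explicit save needed: leaving the with block writes the file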

Solution 1
0 2022-05-06 08:18:31
