Python/pandas for loop with Excel: merging multiple workbooks (single column) into a search list

Question

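I can get this running in Excel VBA, but not in Python...

I'd appreciate it if someone could help! This is what I have so far.

The column named "Search" is the common index I want to merge on.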

import pandas as pd
import os

l = []
for root, dirs, files in os.walk(r"D:/"):
    for file in files:
        if file.endswith(".xlsx"):
             l.append(os.path.join(root, file))


search = 'Search List.xlsx'
source = pd.read_excel(open(search,'rb'), sheet_name=0)
source.set_index("Search", inplace = True)


for i in range(0, len(l)):
    path = l[i]
    df = pd.read_excel(open(path,'rb'), sheet_name=0)
    df.rename(columns={ df.columns[3]: "Search" }, inplace = True)
    df.set_index("Search",inplace = True)

final = pd.merge(source, df, on = ['Search'], how = 'left')

os.walk gives me the paths of the files ending in .xlsx, which I collect into a list:

['D:/Search\\Find List 1.xlsx', 'D:/Search\\Find List 2.xlsx', 'D:/Search\\Find List 3.xlsx', 'D:/Search\\Find List 4.xlsx']
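As a minimal sketch, the same list can be built with `pathlib.Path.rglob`, which searches subfolders recursively just like the `os.walk` loop in the question. The function name `find_xlsx` is hypothetical, and skipping `~$` lock files (which Excel creates while a workbook is open) is an addition the question does not mention.

```python
from pathlib import Path


def find_xlsx(folder):
    """Hypothetical helper: sorted paths of every .xlsx file under `folder`, subfolders included."""
    return sorted(
        p
        for p in Path(folder).rglob("*.xlsx")
        # Skip Excel's temporary lock files such as "~$Find List 1.xlsx"
        if not p.name.startswith("~$")
    )
```

Usage would be `l = find_xlsx(r"D:/Search")`. The items are `Path` objects, which `pd.read_excel` accepts directly, so there is no need to wrap them in `open(..., 'rb')`.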

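Once I have the list of paths, I need to open the files one at a time and merge each one into the "source" list on the Search column, then do the same for the remaining Excel files, one by one. Does that make sense?

How do I read the Excel files in a loop, merge on the matching column, and then move on to the next item in the list?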
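One way to do exactly what is asked, sketched under a few assumptions: merge inside the loop and carry the result forward, so each iteration builds on the previous one instead of overwriting `df`. The helper names `read_search_frames` and `merge_one_by_one` are hypothetical, and the key being in the 4th column (index 3) is taken from the question's `df.columns[3]`.

```python
import pandas as pd


def read_search_frames(paths, key_position=3, key="Search"):
    """Hypothetical helper: yield one DataFrame per workbook, read lazily.

    Assumes, as in the question, that the 4th column holds the search key.
    """
    for path in paths:
        df = pd.read_excel(path, sheet_name=0)
        yield df.rename(columns={df.columns[key_position]: key})


def merge_one_by_one(source, frames, key="Search"):
    """Left-merge each frame onto `source` in turn; every row of `source` is kept."""
    result = source
    for i, df in enumerate(frames, start=1):
        # A column that already exists in `result` keeps its name; the clashing
        # column from file number i is renamed with the suffix "_i".
        result = result.merge(df, on=key, how="left", suffixes=("", f"_{i}"))
    return result
```

Usage, assuming `source` keeps "Search" as an ordinary column (i.e. without the `set_index` call): `final = merge_one_by_one(source, read_search_frames(l))`. Each merge adds columns, so with many files the result gets wide; stacking the files first and merging once, as the answer below does, avoids that.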

I'm so damn confused.

Thanks for your help!

Answer 1

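Found a solution thanks to Sammy's suggestion. I concatenate all the Excel files in the list, then trim the combined data as needed before merging it with the original search list.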

import os
import pandas as pd

l = []

# Read every .xlsx file under D:/Search into a list of DataFrames
for root, dirs, files in os.walk(r"D:/Search"):
    for file in files:
        if file.endswith(".xlsx"):
            path = os.path.join(root, file)  # full path, not just the file name
            df = pd.read_excel(path, sheet_name=0, header=0)
            # The 4th column holds the value to look up; name it "Search"
            df.rename(columns={df.columns[3]: "Search"}, inplace=True)
            df["Path"] = file  # remember which workbook each row came from
            l.append(df)

# Stack all workbooks, keep only "Search" and "Path", and index on "Search"
frame = pd.concat(l, axis=0, ignore_index=True)
frame = frame.drop([frame.columns[0], frame.columns[1], frame.columns[2], frame.columns[4]], axis='columns')
frame.set_index("Search", inplace=True)

search = 'Search List.xlsx'
source = pd.read_excel(search, sheet_name=0)
source.set_index("Search", inplace=True)

# Left merge: every row of the search list is kept
final = pd.merge(source, frame, on=['Search'], how='left')
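The answer's code ends up keeping only the "Search" column and the file name from each workbook. A minimal in-memory sketch of that concat-then-merge idea follows; the function name `concat_then_merge` and the dict-of-DataFrames input are assumptions for illustration, and `validate="m:1"` is an optional addition (not in the answer) that makes pandas raise `MergeError` if a key appears in more than one file instead of silently duplicating search-list rows.

```python
import pandas as pd


def concat_then_merge(source, frames_by_path, key="Search"):
    """Hypothetical helper: stack every file's keys, then do one left merge.

    `frames_by_path` maps a file name to a DataFrame that already has a `key` column.
    """
    stacked = pd.concat(
        # Keep only the key, and record which file each key came from
        [df[[key]].assign(Path=path) for path, df in frames_by_path.items()],
        ignore_index=True,
    )
    # validate="m:1": each key may appear at most once across all files
    return source.merge(stacked, on=key, how="left", validate="m:1")
```

Because all files are stacked before a single merge, the result has one "Path" column no matter how many workbooks there are; rows of the search list that no file contains get `NaN` there.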

Solution 1
1 2020-01-28 13:50:47

Python/pandas For 循環與 Excel，將多個工作簿（單列）合並到搜索列表

問題描述

列名“搜索”是我要合並的公共索引。

1 個解決方案

解決方案1 1 2020-01-28 13:50:47

解決方案1
1 2020-01-28 13:50:47