pd.merge "TypeError: string indices must be integers"

Question

I have three files, and my code is essentially a series of merges that populates data from the files "lookup" and "NonPO" into the file "supplier" and creates a new DataFrame called "final2". The code runs fine and produces the output I expect until the very last merge.

The issue occurs on the very last merge, which joins a new column in "supplier" called "Unique" (vendor number + vendor site code) against a column of the same name in the file "NonPO". The only difference from the earlier merges is that this one is based on a column created by concatenation (previous merges used columns that were already in the files). The concatenation joins columns that may contain letters and/or numbers, e.g. "260549" + "EXPENSE" = "260549EXPENSE".

The error I am getting is:

    runfile('//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing/file.py', wdir='//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing')
Traceback (most recent call last):

  File "\\eu.ad.hertz.com\userdocs\irac920\Desktop\My Files\Python\Supplier cat testing\file.py", line 33, in <module>
    final2 = pd.merge(final2, NonPO[['Unique','Category']], on='Unique', how='left')

TypeError: string indices must be integers

My files:

"supplier" - (File link)
"lookup" - (File link)
"NonPO" - (File link)

Any help resolving this would be greatly appreciated. Thank you!

My code:

import pandas as pd
import numpy as np
pd.set_option('display.expand_frame_repr', False)


supplier = r'//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing/Suppliers.xlsx'
lookup = r'//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing/Lookup.xlsx'
NonPO = r'//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing/Non-PO Suppliers.xlsx'

sr = pd.read_excel(supplier)
lp_type = pd.read_excel(lookup, sheet_name=0)
lp_paygroup = pd.read_excel(lookup, sheet_name=1)
NonPO_Suppliers = pd.read_excel(NonPO)

results_type = pd.merge(sr, lp_type[['Type','L1']], on='Type', how='left')
results_type.sort_values(by='Supplier', inplace=True)

results_paygroup = pd.merge(results_type, lp_paygroup[['Paygroup','L2']], on='Paygroup', how='left')
results_paygroup.sort_values(by='Supplier', inplace=True)

type_from_paygroup = results_paygroup.copy()
type_from_paygroup['L1'] = results_paygroup.merge(lp_paygroup, on='Paygroup', how='left').apply(lambda r: r.L1_x if (r.L1_y is np.nan or r.L2_y == 'Vendor Level') else r.L1_y, axis=1)
type_from_paygroup.sort_values(by='Supplier', inplace=True)

paygroup_from_type = type_from_paygroup.copy()
paygroup_from_type['L2'] = type_from_paygroup.merge(lp_type, on='Type', how='left').apply(lambda r: r.L2_x if (r.L2_y is np.nan or r.L2_y == 'Vendor Level') else r.L2_y, axis=1)
paygroup_from_type.sort_values(by='Supplier', inplace=True)
final = paygroup_from_type.replace(np.nan,'Missing')


final['Unique']=final['Vendor Number'].astype(str) + final['Vendor Site Code'].astype(str)
final2 = final.copy()
final2 = pd.merge(final2, NonPO[['Unique','Category']], on='Unique', how='left')
print(final2)

Answer 1

You are trying to use NonPO as your DataFrame, but it is actually the variable that holds the file name, which is a string. Indexing a string with a list of column names is what raises the TypeError. The DataFrame itself is created on this line:

NonPO_Suppliers = pd.read_excel(NonPO)

Just change NonPO to NonPO_Suppliers in the final merge and you should be fine.

final2 = pd.merge(final2, NonPO_Suppliers[['Unique','Category']], on='Unique', how='left')
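To see why the message mentions string indices at all, here is a minimal sketch that reproduces the error without pandas. The variable name non_po_path and the shortened file name are assumptions for illustration; the point is only that subscripting a str with a list fails before pd.merge is ever called.

```python
# A plain string standing in for the NonPO path variable from the question.
# The shortened file name is hypothetical.
non_po_path = r"Non-PO Suppliers.xlsx"

try:
    # This is what NonPO[['Unique', 'Category']] does when NonPO is a str.
    non_po_path[["Unique", "Category"]]
    message = None
except TypeError as exc:
    message = str(exc)

# Newer Python versions append the offending type to the message.
print(message)
```

Because the subscript is evaluated first, the traceback points at the pd.merge line even though the merge itself never runs.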

Answer 2

Consider this:

NonPO = r'//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing/Non-PO Suppliers.xlsx'
NonPO_Suppliers = pd.read_excel(NonPO) # this is the name of the DataFrame, not NonPO.

Consequently, you need to change this line:

final2 = pd.merge(final2, NonPO[['Unique','Category']], on='Unique', how='left')

to this:

final2 = pd.merge(final2, NonPO_Suppliers[['Unique','Category']], on='Unique', how='left')

Hopefully this will work.
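As a rough end-to-end check, the corrected merge can be sketched with small in-memory DataFrames in place of the Excel files. All values below are made up, and the name non_po_suppliers and the isinstance guard are illustrative additions, not part of the original code.

```python
import pandas as pd

# Hypothetical stand-ins for the data read from the Excel files.
final = pd.DataFrame({
    "Vendor Number": [260549, 260550],
    "Vendor Site Code": ["EXPENSE", "TRAVEL"],
})
non_po_suppliers = pd.DataFrame({
    "Unique": ["260549EXPENSE"],
    "Category": ["Non-PO"],
})

# Build the concatenated key the same way the question does.
final["Unique"] = (
    final["Vendor Number"].astype(str) + final["Vendor Site Code"].astype(str)
)

# Optional guard: fail early if a path string is passed instead of a DataFrame.
assert isinstance(non_po_suppliers, pd.DataFrame)

# Left merge keeps every row of final; unmatched rows get NaN in Category.
final2 = pd.merge(
    final, non_po_suppliers[["Unique", "Category"]], on="Unique", how="left"
)
print(final2)
```

Giving path variables and DataFrames clearly different names (for example a _path suffix for file names) makes this kind of mix-up harder to repeat.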
