PANDAS - How do I replace the values in a dataframe based on a lookup table in another dataframe
I have two CSV files:
live_file.csv
Supplier SKU, Manufacturer SKU, Price
ABCD, 900000, 10
EFGH, 800000, 10
old_file.csv
Supplier SKU, Manufacturer SKU, Price
ABCD, 91234, 10
EFGHX, 85332, 10
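I want to find matching values in the Supplier SKU column. When there is a match, I want to take the Manufacturer SKU value from old_file.csv and put it into live_file.csv, so my result would be: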
Supplier SKU, Manufacturer SKU, Price
ABCD, 91234, 10
EFGH, 800000, 10
Here is what I tried:
import pandas as pd
live_file = pd.read_csv("live.csv")
old_file = pd.read_csv("old.csv")
old_file = old_file.set_index('Supplier SKU')['Manufacturer SKU'].dropna()
live_file['Manufacturer SKU'] = live_file['Supplier SKU'].replace(old_file)
live_file.to_csv(r'final.csv')
But this doesn't work: the final file is identical to the live file I started with. Any help?
Use set_index to make 'Supplier SKU' the index of both dataframes, then call update on the live dataframe.
import pandas as pd
df1 = pd.DataFrame({
'Supplier SKU': ['ABCD','EFGH'],
'Manufacturer SKU': [900000, 800000],
'Price': [10, 10]
}).set_index('Supplier SKU')
df2 = pd.DataFrame({
'Supplier SKU': ['ABCD','EFGHX'],
'Manufacturer SKU': [91234, 85332],
'Price': [10, 10]
}).set_index('Supplier SKU')
df1.update(df2)
print(df1)
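Result: the Manufacturer SKU for ABCD becomes 91234, while EFGH keeps 800000 because df2 has no EFGH row (EFGHX does not match).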
To prevent 'Price' from being updated as well, you can drop Price from df2:
df1.update(df2.drop(columns='Price'))
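The effect of dropping Price is easier to see when the two frames disagree on it. The sketch below reuses the frames from this answer but sets the old prices to 99, a made-up value used only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Supplier SKU": ["ABCD", "EFGH"],
    "Manufacturer SKU": [900000, 800000],
    "Price": [10, 10],
}).set_index("Supplier SKU")

# Hypothetical old data whose Price (99) differs from the live Price (10).
df2 = pd.DataFrame({
    "Supplier SKU": ["ABCD", "EFGHX"],
    "Manufacturer SKU": [91234, 85332],
    "Price": [99, 99],
}).set_index("Supplier SKU")

# update() works in place, so run each variant on its own copy of df1.
with_price = df1.copy()
with_price.update(df2)  # overwrites both columns for the matching row ABCD

without_price = df1.copy()
without_price.update(df2.drop(columns="Price"))  # overwrites only Manufacturer SKU

print(with_price)
print(without_price)
```

In the first variant the Price for ABCD is taken from df2; in the second it keeps the live value.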
PS: call df1.reset_index() to turn 'Supplier SKU' back into a regular column.
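Applied to the CSV data from the question, the same idea might look like the sketch below. It reads the two files from inline strings so that it runs on its own; with real files you would pass the file paths to read_csv instead. The skipinitialspace=True argument is an assumption about the file layout: it is only needed if the files really contain a space after each comma, as shown in the question.

```python
import io
import pandas as pd

# Inline stand-ins for live_file.csv and old_file.csv (same rows as in the question).
LIVE_CSV = """Supplier SKU, Manufacturer SKU, Price
ABCD, 900000, 10
EFGH, 800000, 10
"""
OLD_CSV = """Supplier SKU, Manufacturer SKU, Price
ABCD, 91234, 10
EFGHX, 85332, 10
"""

# skipinitialspace=True drops the blank after each comma, so the header is
# 'Manufacturer SKU' rather than ' Manufacturer SKU'.
live = pd.read_csv(io.StringIO(LIVE_CSV), skipinitialspace=True).set_index("Supplier SKU")
old = pd.read_csv(io.StringIO(OLD_CSV), skipinitialspace=True).set_index("Supplier SKU")

# Overwrite only 'Manufacturer SKU', and only for Supplier SKUs present in both frames.
live.update(old[["Manufacturer SKU"]])

# Restore 'Supplier SKU' as an ordinary column before writing the result out.
result = live.reset_index()
csv_text = result.to_csv(index=False)  # pass a path such as 'final.csv' to write a file
print(csv_text)
```

Passing index=False to to_csv keeps the default row numbers out of the output file.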
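You can basically do a left merge of the two files on the Supplier SKU column, then keep the Manufacturer SKU value from old_file where the merge found a match, and otherwise keep the value from live_file.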
live_file["Manufacturer SKU"] = pd.merge(
    live_file[["Supplier SKU", "Manufacturer SKU"]],
    old_file[["Supplier SKU", "Manufacturer SKU"]],
    how="left",
    on="Supplier SKU",
    suffixes=(None, "__right"),
    indicator="merge_flag",
).apply(
    lambda row: row["Manufacturer SKU"]
    if row["merge_flag"] == "left_only"
    else row["Manufacturer SKU__right"],
    axis=1,
)
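A possible variation on this merge, not part of the answer above, replaces the row-wise apply with fillna, which is usually faster on large frames. The sketch assumes that Supplier SKU is unique in old_file (otherwise the left merge would duplicate rows) and that live_file has the default integer index, so the merged rows line up with it. One difference in behaviour: if a matching row in old_file has a missing Manufacturer SKU, this version keeps the live value, whereas the apply version would copy the missing value.

```python
import pandas as pd

# Same sample data as in the question, built inline so the sketch is self-contained.
live_file = pd.DataFrame({
    "Supplier SKU": ["ABCD", "EFGH"],
    "Manufacturer SKU": [900000, 800000],
    "Price": [10, 10],
})
old_file = pd.DataFrame({
    "Supplier SKU": ["ABCD", "EFGHX"],
    "Manufacturer SKU": [91234, 85332],
    "Price": [10, 10],
})

# Left merge keeps every live row in its original order; unmatched rows get
# NaN in the column that comes from old_file.
merged = live_file[["Supplier SKU", "Manufacturer SKU"]].merge(
    old_file[["Supplier SKU", "Manufacturer SKU"]],
    how="left",
    on="Supplier SKU",
    suffixes=(None, "__right"),
)

# Prefer the old_file value and fall back to the live value where it is missing.
# The NaNs introduced by the merge make the column float, so cast back to int.
live_file["Manufacturer SKU"] = (
    merged["Manufacturer SKU__right"]
    .fillna(merged["Manufacturer SKU"])
    .astype("int64")
)
print(live_file)
```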