[英]Subtract fixed row value in reference to column value in pandas dataframe
[英]How do you lookup a particular pandas dataframe column value in a reference table and copy a reference table value to the dataframe?
我有一個從 .csv 導入數據框(df2)的參考表。 它是 3 列和大約 400 行。 我有另一個具有許多列和行的數據框(df)。 我正在尋找從參考表中查找一個值並將其添加到 df 中的相應列。
參考表的數據格式:
MANUF PRODTYPE PRODCODE
ALPHA 1 ALPHA1
ALPHA 2 ALPHA2
BETA 1 BETA1
BETA 2 BETA2
DELTA 1 DELTA1
DELTA 2 DELTA2
數據框(df)的設置如下:
MANUF PRODTYPE SERIALNO PRODCODE INVENTORY
ALPHA 1 00001 5
ALPHA 2 00001 3
BETA 1 00001 4
DELTA 1 00001 8
ALPHA 2 00002 3
BETA 1 00002 4
DELTA 2 00001 9
DELTA 2 00002 9
DELTA 1 00002 8
BETA 2 00001 12
ALPHA 2 00003 3
我正在嘗試使用基於參考表中的 MANUF 和 PRODTYPE 的適當值填充 df 中的 PRODCODE。
我試過了:
df3 = df.merge(df2, how='left')
和
df3 = df2.merge(df, how='left')
但兩者都給了我不准確或不完整的合並。
我希望這適用於您的情況:
import sys
import pandas as pd
from io import StringIO
from datetime import datetime
data1 = StringIO("""MANUF;PRODTYPE;PRODCODE
ALPHA;1;ACME1
ALPHA;2;ACME2
BETA;1;BETA1
BETA;2;BETA2
DELTA;1;DELTA1
DELTA;2;DELTA2
""")
df1 = pd.read_csv(data1, sep=";")
print(df1)
data2 = StringIO("""MANUF;PRODTYPE;SERIALNO;PRODCODE;INVENTORY
ALPHA;1;00001;5
ALPHA;2;00001;3
BETA;1;00001;4
DELTA;1;00001;8
ALPHA;2;00002;3
BETA;1;00002;4
DELTA;2;00001;9
DELTA;2;00002;9
DELTA;1;00002;8
BETA;2;00001;12
ALPHA;2;00003;3
""")
df2 = pd.read_csv(data2, sep=";")
print(df2)
df3 = df2.merge(df1, on=['MANUF', 'PRODTYPE'], how='left')
print(df3)
結果:
MANUF PRODTYPE SERIALNO PRODCODE_x INVENTORY PRODCODE_y
0 ALPHA 1 1 5 NaN ACME1
1 ALPHA 2 1 3 NaN ACME2
2 BETA 1 1 4 NaN BETA1
3 DELTA 1 1 8 NaN DELTA1
4 ALPHA 2 2 3 NaN ACME2
5 BETA 1 2 4 NaN BETA1
6 DELTA 2 1 9 NaN DELTA2
7 DELTA 2 2 9 NaN DELTA2
8 DELTA 1 2 8 NaN DELTA1
9 BETA 2 1 12 NaN BETA2
10 ALPHA 2 3 3 NaN ACME2
沒有merge
的另一種方法是:
df2 = df2.set_index(['MANUF', 'PRODTYPE'])
output = df2.combine_first(df1.set_index(['MANUF', 'PRODTYPE'])).reset_index()
print(output)
MANUF PRODTYPE INVENTORY PRODCODE SERIALNO
0 ALPHA 1 5 ALPHA1 1
1 ALPHA 2 3 ALPHA2 1
2 ALPHA 2 3 ALPHA2 2
3 ALPHA 2 3 ALPHA2 3
4 BETA 1 4 BETA1 1
5 BETA 1 4 BETA1 2
6 BETA 2 12 BETA2 1
7 DELTA 1 8 DELTA1 1
8 DELTA 1 8 DELTA1 2
9 DELTA 2 9 DELTA2 1
10 DELTA 2 9 DELTA2 2
使用的輸入:
df1 = pd.DataFrame({'MANUF': {0: 'ALPHA',
1: 'ALPHA',
2: 'BETA',
3: 'BETA',
4: 'DELTA',
5: 'DELTA'},
'PRODTYPE': {0: 1, 1: 2, 2: 1, 3: 2, 4: 1, 5: 2},
'PRODCODE': {0: 'ALPHA1',
1: 'ALPHA2',
2: 'BETA1',
3: 'BETA2',
4: 'DELTA1',
5: 'DELTA2'}})
df2 = pd.DataFrame({'MANUF': {0: 'ALPHA',
1: 'ALPHA',
2: 'BETA',
3: 'DELTA',
4: 'ALPHA',
5: 'BETA',
6: 'DELTA',
7: 'DELTA',
8: 'DELTA',
9: 'BETA',
10: 'ALPHA'},
'PRODTYPE': {0: 1,
1: 2,
2: 1,
3: 1,
4: 2,
5: 1,
6: 2,
7: 2,
8: 1,
9: 2,
10: 2},
'SERIALNO': {0: 1,
1: 1,
2: 1,
3: 1,
4: 2,
5: 2,
6: 1,
7: 2,
8: 2,
9: 1,
10: 3},
'INVENTORY': {0: 5,
1: 3,
2: 4,
3: 8,
4: 3,
5: 4,
6: 9,
7: 9,
8: 8,
9: 12,
10: 3}})
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.