
Python: Add new rows with new data based on a partial match

Table 1

|Location|Type|Supplier|     ID    |Serial|
|   MAB  |Ant |  A     |    A123   |456/56|
|   MEB  |Ant |  B     |    A123   |456/56|

Table 2

|Location   |Type|Supplier|     ID      |Serial|#####|
|  MAB+MEB  |Ant |  A/B   | A123        |456/56|123-4|
|  MAB+MEB  |Ant |  A/B   | A123/B123   |456/56|432-1|
|  MAB+MEB  |Ant |  A/B   | A123/B123   |456/56|432-1|

Table 3

|Location|Type|Supplier|     ID    |Serial|#####|
|   MAB  |Ant |  A     | A123      |456/56|123-4|
|   MAB  |Ant |  A     | A123      |456/56|432-1|
|   MAB  |Ant |  A     | A123      |456/56|432-1|
|   MEB  |Ant |  B     | A123      |456/56|123-4|
|   MEB  |Ant |  B     | A123      |456/56|432-1|
|   MEB  |Ant |  B     | A123      |456/56|432-1|

As illustrated above, if the cell contents of Table 1's 'Location', 'Supplier', 'ID' and 'Serial' columns are contained in the same columns of Table 2, Table 3 should be generated.

*Note that Table 1 is used as the core template: if the relevant column cells are contained in Table 2, we merely replicate the rows of Table 1 and add the '#####' column to each of them.*
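"Contained in" can be read in two ways, and the choice matters for values like IDs. Below is a minimal sketch comparing a plain substring test with a stricter token test; `substring_match` and `token_match` are hypothetical helper names, and splitting on '+' and '/' is an assumption based only on the separators visible in Table 2 (`MAB+MEB`, `A/B`, `A123/B123`).

```python
import re


def substring_match(value: str, cell: str) -> bool:
    # Plain substring test: also accepts partial IDs such as "A12" in "A123/B123"
    return value in cell


def token_match(value: str, cell: str) -> bool:
    # Hypothetical stricter rule: value must equal the whole cell or one of its
    # "+"- or "/"-separated parts, e.g. "MAB+MEB" -> ["MAB", "MEB"].
    # The exact-equality branch keeps Serial "456/56" matching, because
    # splitting it on "/" would give "456" and "56".
    return value == cell or value in re.split(r"[+/]", cell)


pairs = [("MAB", "MAB+MEB"), ("B", "A/B"), ("A123", "A123/B123"),
         ("456/56", "456/56"), ("A12", "A123/B123")]
for value, cell in pairs:
    print(f"{value!r} in {cell!r}: substring={substring_match(value, cell)}, "
          f"token={token_match(value, cell)}")
```

Both rules agree on the question's data; they differ only for inputs like `"A12"`, which the substring test accepts and the token test rejects.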

Please advise how we can produce Table 3.

My logic: for a, b, c, d in Table 1, if a, b, c, d are contained in Table 2, append 'Subcon Part #' from Table 2 to Table 1 as a column, concatenate all 'Subcon Part #' values with ',', then explode the concatenated 'Subcon Part #' to generate one row per 'Subcon Part #' value.

where a, b, c, d are the columns of interest, i.e. the links between Table 1 and Table 2.
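The logic above can be sketched directly in pandas. This is a minimal sketch, not the answer's method: `is_contained` and `matching_parts` are hypothetical helpers, and treating "contained in" as "equal to the cell or to one of its '+'/'/'-separated parts" is an assumption. Instead of concatenating with ',' and splitting again, it collects the matches into a list per Table 1 row, which `DataFrame.explode` (pandas 0.25+) accepts directly.

```python
import re

import pandas as pd

# Table 1 and Table 2 from the question
table1 = pd.DataFrame({
    "Location": ["MAB", "MEB"],
    "Type": ["Ant", "Ant"],
    "Supplier": ["A", "B"],
    "ID": ["A123", "A123"],
    "Serial": ["456/56", "456/56"],
})
table2 = pd.DataFrame({
    "Location": ["MAB+MEB"] * 3,
    "Type": ["Ant"] * 3,
    "Supplier": ["A/B"] * 3,
    "ID": ["A123", "A123/B123", "A123/B123"],
    "Serial": ["456/56"] * 3,
    "#####": ["123-4", "432-1", "432-1"],
})

KEYS = ["Location", "Supplier", "ID", "Serial"]  # a, b, c, d in the question


def is_contained(value, cell):
    # Hypothetical rule: exact match, or one of the "+"/"/"-separated parts
    return value == cell or value in re.split(r"[+/]", cell)


def matching_parts(row):
    # '#####' values of every Table 2 row that contains all key cells of `row`
    mask = [all(is_contained(row[k], other[k]) for k in KEYS)
            for _, other in table2.iterrows()]
    return table2.loc[mask, "#####"].tolist()


table3 = table1.copy()
# One list of matching part numbers per Table 1 row...
table3["#####"] = pd.Series([matching_parts(row) for _, row in table1.iterrows()],
                            index=table3.index)
# ...then one output row per part number (Table 1 is the template)
table3 = table3.explode("#####").reset_index(drop=True)
print(table3)
```

A Table 1 row with no match gets an empty list, which `explode` turns into a single row with NaN in '#####'.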

Here is what I would suggest: first extract the values from Table 2, then merge this transformed DataFrame with Table 1 on the variables of interest.

First, I reproduce your example:

import re

import pandas as pd

# Reproducing Table 1
df1 = pd.DataFrame({"Location": ["MAB", "MEB"],
                    "Type": ["Ant", "Ant"],
                    "Supplier": ["A", "B"],
                    "ID": ["A123", "A123"],
                    "Serial": ["456/56", "456/56"]})
# Then Table 2; "values_rand" is a placeholder for the '#####' column
df = pd.DataFrame({"Location": ["MAB+MEB", "MAB+MEB", "MAB+MEB"],
                   "Type": ["Ant", "Ant", "Ant"],
                   "Supplier": ["A/B", "A/B", "A/B"],
                   "ID": ["A123", "A123/B123", "A123/B123"],
                   "Serial": ["456/56", "456/56", "456/56"],
                   "values_rand": [1, 2, 3]})
# First, split the columns of interest into lists of alphanumeric tokens
# ("MAB+MEB" -> ["MAB", "MEB"], "A/B" -> ["A", "B"]); tweak the regexp as needed.
# "Serial" is not split because "456/56" must stay whole.
r = re.compile(r"[a-zA-Z0-9]+")
df["Supplier"], df["ID"], df["Location"] = (df["Supplier"].str.findall(r),
                                            df["ID"].str.findall(r),
                                            df["Location"].str.findall(r))
# Explode each list column and merge the pieces back on the original row
# number ("index"), giving every Supplier x ID x Location combination per row
table2 = pd.merge(df["Supplier"].explode().reset_index(),
                  df["ID"].explode().reset_index(), on="index", how="outer")
table2 = pd.merge(table2, df["Location"].explode().reset_index(),
                  on="index", how="outer")
# Bring back the columns that were not split
table2 = pd.merge(table2, df.loc[:, ["Type", "Serial", "values_rand"]].reset_index(),
                  on="index", how="left")
# An inner merge with Table 1 keeps only the combinations that exist in Table 1
result = (pd.merge(table2, df1, on=["Location", "Supplier", "ID", "Serial", "Type"])
          .drop(columns="index"))
print(result)

The result is:

  Supplier    ID Location Type  Serial  values_rand
0        A  A123      MAB  Ant  456/56            1
1        A  A123      MAB  Ant  456/56            2
2        A  A123      MAB  Ant  456/56            3
3        B  A123      MEB  Ant  456/56            1
4        B  A123      MEB  Ant  456/56            2
5        B  A123      MEB  Ant  456/56            3
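The three merges on "index" exist only to build, for each Table 2 row, every combination of the split Supplier, ID and Location values. A minimal alternative sketch (not the answer's code) gets the same rows, possibly in a different order, by chaining single-column `DataFrame.explode` calls. One call per column is deliberate: passing a list of columns to `explode` (pandas 1.3+) pairs the elements position-wise and requires equal lengths, which is not the product wanted here.

```python
import pandas as pd

# Table 1, and Table 2 with "values_rand" as a placeholder for '#####'
df1 = pd.DataFrame({"Location": ["MAB", "MEB"], "Type": ["Ant", "Ant"],
                    "Supplier": ["A", "B"], "ID": ["A123", "A123"],
                    "Serial": ["456/56", "456/56"]})
df = pd.DataFrame({"Location": ["MAB+MEB"] * 3, "Type": ["Ant"] * 3,
                   "Supplier": ["A/B"] * 3,
                   "ID": ["A123", "A123/B123", "A123/B123"],
                   "Serial": ["456/56"] * 3, "values_rand": [1, 2, 3]})

split_cols = ["Supplier", "ID", "Location"]
table2 = df.copy()
for col in split_cols:
    # "A/B" -> ["A", "B"], "MAB+MEB" -> ["MAB", "MEB"]; Serial stays whole
    table2[col] = table2[col].str.findall(r"[a-zA-Z0-9]+")
# Exploding one column at a time yields the Supplier x ID x Location product
for col in split_cols:
    table2 = table2.explode(col)

# Inner merge keeps only the combinations that exist in Table 1
result = table2.merge(df1, on=["Location", "Supplier", "ID", "Serial", "Type"])
print(result)
```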

Hope it helps.
