How to conditionally (based on a value in one column) search for substrings in another column in a Python loop
I need to search for substrings in a string column, where the substrings to look for depend on the value in a second column. I have 2 dataframes:
import pandas as pd

df1 = {'Descr': ["VALVE, PRESSURE", "pump ttf", "Valve, electrical", "Geeku, electrical", "VALVE, OVERBOARD, BUTTERFLY"],
       'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU", "VALVE"],
       }
df2 = {'N_Product': ["VALVE", "VALVE", "VALVE", "PUMP", "GEEKU"],
       'M_Product': ["PRESSURE", "qwerty", "", "", "ELECTRICAL"],
       }
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)
(Step 1) For the first row in df1, the N_Product column is VALVE.
(Step 2) We look for VALVE in the N_Product column of every row of df2 and find 3 matches with the following (N_Product, M_Product) pairs: row 0 has VALVE, PRESSURE; row 1 has VALVE, qwerty; row 2 has VALVE, "".
(Step 3) Then we check whether the M_Product of any of these pairs is contained in df1['Descr']. If it is, we write N_Product + ":" + M_Product + ";". For VALVE we only need to search for "PRESSURE", "qwerty" and ""; the other values are not required. For N_Product 'GEEKU' we only search for 'ELECTRICAL', and so on, depending on which pairs are in df2.
My attempt so far:
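import numpy as np

c = df2['M_Product'].astype(str).to_list()

def matcher(x):
    # Return the first M_Product value found in the description (case-insensitive)
    for i in c:
        if i.lower() in x.lower():
            return i
    # Only give up after every candidate has been checked
    return np.nan

df1['Res'] = df1['Descr'].apply(matcher)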
However, I don't know how to loop over only the M_Product values that correspond to the row's N_Product.
Desired result:
df1 = {'Descr': ["VALVE, PRESSURE", "pump ttf", "Valve, electrical", "Geeku, electrical", "VALVE, OVERBOARD, BUTTERFLY"],
       'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU", "VALVE"],
       'Result': ["VALVE: PRESSURE;", "PUMP", "VALVE;", "GEEKU: ELECTRICAL;", "VALVE;"],
       }
I would be grateful for any help or suggestions.
(UPDATED)
Based on the updated question, my understanding of what's being asked is this:
- Create a new Result column.
- If the N_Product columns in df1 and df2 match for a given row, then append to the N_Product value from df1 the first value in the M_Product column of df2 that is found in that row's Descr column in df1 (with an intervening : character).
- Otherwise, put N_Product from df1 in the Result column.
- Also append a ; character to what is put in Result.
Here is a way to do that:
def foo(x):
    descr = x['Descr'].upper()
    match = None
    for mStr in df2['M_Product'].str.upper():
        # Skip empty strings: '' is a substring of every description
        if mStr and mStr in descr:
            match = mStr
            break
    if match is None:
        return x['N_Product'] + ';'
    else:
        return x['N_Product'] + ': ' + match + ';'

# Row-by-row comparison; df1 and df2 must have the same length and index
mask = df1['N_Product'] == df2['N_Product']
df1.loc[mask, 'Result'] = df1.apply(foo, axis = 1)
df1.loc[~mask, 'Result'] = df1['N_Product'] + ';'
Explanation:
- Create a boolean Series mask that is True for rows of df1 with N_Product matching the corresponding value in df2.
- For rows in df1 where mask is True, use apply to call foo, which identifies the first value (if any) in the M_Product column of df2 that is found in the given row's Descr column and packages it in a string of the form N_Product: M_Product; if found, otherwise just N_Product;.
- For rows in df1 where mask is False (namely: ~mask), set the Result column to be N_Product;.
Input:
df1:
   Descr                        N_Product
0  VALVE, PRESSURE              VALVE
1  pump ttf                     PUMP
2  Valve, electrical            VALVE
3  Geeku, electrical            GEEKU
4  VALVE, OVERBOARD, BUTTERFLY  VALVE
df2:
   N_Product  M_Product
0  VALVE      PRESSURE
1  VALVE      ELECTRICAL
2  VALVE
3  PUMP
4  GEEKU      MBA
Output:
   Descr                        N_Product  Result
0  VALVE, PRESSURE              VALVE      VALVE: PRESSURE;
1  pump ttf                     PUMP       PUMP;
2  Valve, electrical            VALVE      VALVE: ELECTRICAL;
3  Geeku, electrical            GEEKU      GEEKU;
4  VALVE, OVERBOARD, BUTTERFLY  VALVE      VALVE;
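The mask used above compares the two N_Product columns position by position, which is why row 3 (GEEKU in df1, PUMP in df2) is treated as a non-match. A minimal sketch of that behaviour, using only the N_Product columns from the example; the variable name `relaxed` is an illustrative assumption, shown only to contrast with a membership test via `Series.isin`:

```python
import pandas as pd

df1 = pd.DataFrame({"N_Product": ["VALVE", "PUMP", "VALVE", "GEEKU", "VALVE"]})
df2 = pd.DataFrame({"N_Product": ["VALVE", "VALVE", "VALVE", "PUMP", "GEEKU"]})

# Element-wise comparison: row i of df1 is compared only with row i of df2.
# Both frames need the same index for this to work.
mask = df1["N_Product"] == df2["N_Product"]
print(mask.tolist())  # [True, False, True, False, False]

# Membership test: is the df1 value present anywhere in df2's column?
relaxed = df1["N_Product"].isin(df2["N_Product"])
print(relaxed.tolist())  # [True, True, True, True, True]
```

Which of the two is appropriate depends on whether df2 is meant to line up with df1 row by row or to act as a lookup table.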
UPDATE #2:
Here's a solution based on a relaxation of the matching criteria for N_Product:
- Create a new Result column.
- For each row in df1, if the N_Product value is found in the N_Product column of df2, then append to this value the first value in the M_Product column of df2 that is found in that row's Descr column in df1 (with an intervening : character).
- Otherwise, put N_Product from df1 in the Result column.
- Also append a ; character to what is put in Result.
def foo(x):
    descr = x['Descr'].upper()
    match = None
    # Only search when this row's N_Product appears somewhere in df2
    if x['N_Product'].upper() in list(df2['N_Product']):
        for mStr in df2['M_Product'].str.upper():
            # Skip empty strings: '' is a substring of every description
            if mStr and mStr in descr:
                match = mStr
                break
    if match is None:
        return x['N_Product'] + ';'
    else:
        return x['N_Product'] + ': ' + match + ';'

df1['Result'] = df1.apply(foo, axis = 1)
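The question also asks to search only the M_Product values paired with the row's own N_Product in df2. One possible way to do that, as a sketch rather than a definitive implementation, is to group df2 by N_Product first and look up each row's candidates. The names `candidates` and `build_result` are illustrative assumptions, empty M_Product values are dropped because an empty string matches every description, and the data is the df1/df2 from the question:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Descr": ["VALVE, PRESSURE", "pump ttf", "Valve, electrical",
              "Geeku, electrical", "VALVE, OVERBOARD, BUTTERFLY"],
    "N_Product": ["VALVE", "PUMP", "VALVE", "GEEKU", "VALVE"],
})
df2 = pd.DataFrame({
    "N_Product": ["VALVE", "VALVE", "VALVE", "PUMP", "GEEKU"],
    "M_Product": ["PRESSURE", "qwerty", "", "", "ELECTRICAL"],
})

# Map each N_Product to the list of its own non-empty M_Product values,
# upper-cased so the substring test is case-insensitive.
candidates = (
    df2[df2["M_Product"].str.strip() != ""]
    .assign(
        N_Product=lambda d: d["N_Product"].str.upper(),
        M_Product=lambda d: d["M_Product"].str.upper(),
    )
    .groupby("N_Product")["M_Product"]
    .agg(list)
    .to_dict()
)

def build_result(row):
    descr = row["Descr"].upper()
    # Only the M_Product values listed for this row's N_Product are checked.
    for m_product in candidates.get(row["N_Product"].upper(), []):
        if m_product in descr:
            return f"{row['N_Product']}: {m_product};"
    return f"{row['N_Product']};"

df1["Result"] = df1.apply(build_result, axis=1)
print(df1["Result"].tolist())
# ['VALVE: PRESSURE;', 'PUMP;', 'VALVE;', 'GEEKU: ELECTRICAL;', 'VALVE;']
```

With the question's df2 this matches the desired result, except that the PUMP row gets a trailing semicolon ("PUMP;"), in line with the rule that a ; is always appended.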