
Find partial matches between a dataframe (list of strings) and dictionary keys (tuples)

I have a dictionary where each key is a tuple of product IDs and each value is the number of times that combination occurs, like so:

[in]
product_combos = dict()
for i in training_df['product_id']:
    key = tuple(i)
    if key in product_combos:
        product_combos[key] += 1
    else:
        product_combos[key] = 1

print(product_combos)

[out]
{('P06', 'P09'): 36340, 
('P01', 'P05', 'P06', 'P09'): 10085, 
('P01', 'P06'): 36337, 
('P01', 'P09'): 49897, 
('P02', 'P09'): 11573
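The counting loop above can also be condensed with collections.Counter from the standard library. This is only a minimal sketch under assumptions: the short baskets list is made-up illustration data standing in for training_df['product_id'], and each entry is assumed to already be a list of product ID strings.

```python
from collections import Counter

# Hypothetical stand-in for training_df['product_id'] (not the real data).
baskets = [["P06", "P09"], ["P01", "P06"], ["P06", "P09"]]

# Lists are unhashable, so each basket is converted to a tuple before counting.
product_combos = Counter(tuple(ids) for ids in baskets)

print(dict(product_combos))
```

Because Counter is a dict subclass, it can be used anywhere the original product_combos dictionary is used.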

How can I find partial matches between these keys and a dataframe column organized like this, where each row of the product_id column is a list of strings?

[in]
import pandas as pd

# Use the arrays to create a dataframe indexed by transaction_id
testing_df = pd.DataFrame(test_array, columns=['transaction_id', 'product_id'])
testing_df.set_index(['transaction_id'], inplace=True)

# Split the comma-separated product_id strings into lists
testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))
print(testing_df.head(n=10))

[out]
                     product_id
transaction_id                 
001                       [P01]
002                  [P01, P02]
003             [P01, P02, P09]
004                  [P01, P03]
005             [P01, P03, P05]
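The split step can also be written with the vectorised Series.str.split accessor instead of apply with a lambda. The sketch below is self-contained, but the contents of test_array are invented for illustration, since the original array is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for test_array, which is not shown in the question.
test_array = [
    ["001", "P01"],
    ["002", "P01,P02"],
    ["003", "P01,P02,P09"],
]

testing_df = pd.DataFrame(test_array, columns=["transaction_id", "product_id"])
testing_df = testing_df.set_index("transaction_id")

# Split each comma-separated string into a list of product IDs.
testing_df["product_id"] = testing_df["product_id"].str.split(",")

print(testing_df)
```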

I want to do something like the question "partial match of dictionary keys", but the comparison should be between the keys of the dictionary and the rows of the dataframe.

You can use apply on the product_id column.

  transaction_id product_id            
0  001                          ['P01']
1  002                   ['P01', 'P02']
2  003            ['P01', 'P02', 'P09']
3  004                   ['P01', 'P03']
4  005            ['P01', 'P03', 'P05']

keys = {('P06', 'P09'): 36340,
        ('P01', 'P05', 'P06', 'P09'): 10085,
        ('P01', 'P06'): 36337,
        ('P01', 'P09'): 49897,
        ('P02', 'P09'): 11573}

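# Python 2 syntax: on Python 3, replace keys.iteritems() with keys.items().
# A key matches when any of its products appears in the row;
# the trailing slice keeps at most len(x) + 1 counts per row.
df.product_id.apply(lambda x: [v for k, v in keys.iteritems() if any(i in x for i in k)][:len(x) + 1])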

0                  [10085, 36337]
1           [10085, 11573, 36337]
2    [10085, 11573, 36337, 36340]
3           [10085, 36337, 49897]
4           [10085, 36337, 49897]
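On Python 3, dict.iteritems() no longer exists, so the expression above needs keys.items(). The sketch below is one possible Python 3 reading of the answer, under two assumptions that are not in the original: it treats "partial match" as "the row shares at least one product with the key", and it drops the [:len(x)+1] slice so that every matching count is returned. The helper name matching_counts is made up for illustration.

```python
# Counts copied from the dictionary printed in the question.
keys = {
    ("P06", "P09"): 36340,
    ("P01", "P05", "P06", "P09"): 10085,
    ("P01", "P06"): 36337,
    ("P01", "P09"): 49897,
    ("P02", "P09"): 11573,
}


def matching_counts(products, combos=keys):
    """Return the count of every key sharing at least one product with `products`."""
    wanted = set(products)
    # set.isdisjoint is False exactly when the row and the key overlap.
    return [count for combo, count in combos.items() if not wanted.isdisjoint(combo)]


# Rows copied from the example dataframe's product_id column.
rows = [["P01"], ["P01", "P02"], ["P01", "P03", "P05"]]
for row in rows:
    print(row, matching_counts(row))
```

With a dataframe, the same function could be passed directly, as in df.product_id.apply(matching_counts). If a match should instead require the whole row to be contained in the key, the overlap test could be swapped for wanted.issubset(combo).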
