I have a dictionary where each key is a tuple of product IDs and each value counts how often that combination occurs, like so:
[in]
product_combos = dict()
for i in training_df['product_id']:
    key = tuple(i)
    if key in product_combos:
        product_combos[key] += 1
    else:
        product_combos[key] = 1
print(product_combos)
[out]
{('P06', 'P09'): 36340,
 ('P01', 'P05', 'P06', 'P09'): 10085,
 ('P01', 'P06'): 36337,
 ('P01', 'P09'): 49897,
 ('P02', 'P09'): 11573}
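For reference, the counting loop can probably be written more compactly with collections.Counter. This is only a minimal sketch: it assumes each entry of training_df['product_id'] is a list of product-ID strings, and the `rows` list below is made-up sample data standing in for that column.

```python
from collections import Counter

# Hypothetical stand-in for training_df['product_id']: one list of IDs per transaction.
rows = [['P06', 'P09'], ['P01', 'P06'], ['P06', 'P09'], ['P01', 'P09']]

# Lists are unhashable, so each one is converted to a tuple before counting.
product_combos = Counter(tuple(ids) for ids in rows)

print(product_combos[('P06', 'P09')])
```

A Counter is a dict subclass, so it can be used anywhere the plain dictionary was used.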
How can I find partial matches between these keys and a dataframe column organized as follows, where each row of the product_id column is a list of strings:
[in]
# Use the arrays to create a dataframe
testing_df = pd.DataFrame(test_array, columns=['transaction_id', 'product_id'])
# Split the product_id's for the testing data
testing_df.set_index(['transaction_id'], inplace=True)
testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))
print(testing_df.head(n=10))
[out]
                     product_id
transaction_id
001                       [P01]
002                  [P01, P02]
003             [P01, P02, P09]
004                  [P01, P03]
005             [P01, P03, P05]
I want to do something like the approach in "partial match of dictionary keys", but the comparison should be between the keys of the dictionary and the rows of the dataframe.
You can use apply:
  transaction_id             product_id
0            001                ['P01']
1            002         ['P01', 'P02']
2            003  ['P01', 'P02', 'P09']
3            004         ['P01', 'P03']
4            005  ['P01', 'P03', 'P05']
keys = {('P06', 'P09'): 36340,
        ('P01', 'P05', 'P06', 'P09'): 10085,
        ('P01', 'P06'): 36337,
        ('P01', 'P09'): 49897,
        ('P02', 'P09'): 11573}
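# Python 3: dict.items() replaces Python 2's iteritems(); values come back in the dict's insertion order.
# any(...) keeps a key if it shares at least one product with the row; the slice caps the list at len(x) + 1 values.
df.product_id.apply(lambda x: [v for k, v in keys.items() if any(i in x for i in k)][:len(x) + 1])
0 [10085, 36337]
1 [10085, 36337, 49897]
2 [36340, 10085, 36337, 49897]
3 [10085, 36337, 49897]
4 [10085, 36337, 49897]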
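If it helps to see which keys matched, and not only their counts, a set-based variant might be easier to inspect. This is a sketch under the assumption that "partial match" means a key shares at least one product ID with the row; partial_matches is a hypothetical helper name, not part of pandas or of the code above.

```python
def partial_matches(products, combos):
    """Return {key: count} for every key sharing at least one ID with products."""
    wanted = set(products)
    # set.intersection accepts any iterable, so the tuple key can be passed directly.
    return {key: count for key, count in combos.items() if wanted.intersection(key)}


keys = {('P06', 'P09'): 36340,
        ('P01', 'P05', 'P06', 'P09'): 10085,
        ('P01', 'P06'): 36337,
        ('P01', 'P09'): 49897,
        ('P02', 'P09'): 11573}

print(partial_matches(['P01', 'P02'], keys))
```

On the dataframe shown earlier this could be applied per row with df.product_id.apply(lambda x: partial_matches(x, keys)). Requiring a full subset instead of any overlap would be a matter of swapping the condition for set(key) <= wanted.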