Calculate the percentage of words in array a that are found in array b, without considering the NaN value.
Single Array:
a=['Black', 'Pen','NaN'] #product
b=['Black', 'Book', 'Big'] #catalog
c=[]
for i in a:
    if i != "NaN":
        c.append(i)
matched_count=0
for i in c:
    if i in b:
        matched_count +=1
matched_count
score = float(matched_count) / len(c)
print(score)
Output:
0.5
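The same single-array computation can be wrapped in a small function so that it can be reused for several products. This is a minimal sketch. It assumes the missing value is always the literal string "NaN" and that a match means an exact string match. The name `match_score` is hypothetical.

```python
def match_score(product, catalog, missing="NaN"):
    """Fraction of the non-missing words of `product` that appear in `catalog`."""
    words = [w for w in product if w != missing]  # drop the placeholder value
    if not words:  # avoid dividing by zero when everything is missing
        return 0.0
    catalog_words = set(catalog)  # set membership test instead of a list scan
    matched = sum(1 for w in words if w in catalog_words)
    return matched / len(words)


a = ['Black', 'Pen', 'NaN']  # product
b = ['Black', 'Book', 'Big']  # catalog
print(match_score(a, b))  # 0.5
```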
I would like to replicate the same process for multiple products like the following. Kindly let me know how to tackle this.
Input - Multiple Array:
products_bag =([['mai', 'dubai', '200ml', 'NaN'],
['mai', 'dubai', 'cup'],
['mai', 'dubai', '1.5l']]) #multiple products
catalogs_bag =([['natural','mineral','water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'],
['2-piece', 'glitzi', 'power', 'inox', 'power', 'dish'],
['15-piece', 'bones', 'for', 'dog', 'multicolour', 'rich']]) #bigger catalog
Expected Output:
['mai', 'dubai', '200ml', 'NaN'] -> ['natural','mineral','water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'] -> 67%
['mai', 'dubai', 'cup'] -> ['natural','mineral','water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'] -> 67%
['mai', 'dubai', '1.5l'] -> ['natural','mineral','water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'] -> 67%
You can do it like this, without numpy or pandas, with a second for loop; nothing fancy:
products_bag =([['Black', 'Pen','NaN'],
['Yellow', 'Pen','Small']]) #multiple products
catalogs_bag =([['Black', 'Pen', 'Big'],
['Black', 'Pen', 'Small']]) #bigger catalog
def find_distribution(products, catalog):
    item_counter = 0
    matched_count = 0
    for product in products:
        if product.lower() != "nan":  # skip only the exact NaN placeholder, not words containing "nan"
            item_counter += 1
            if product in catalog:
                matched_count += 1
    if item_counter == 0:  # in case products is empty or only has NaN values
        return 0
    return matched_count / item_counter
for products, catalog in zip(products_bag, catalogs_bag):
    print("{} \t-> {} \t-> \t {}%".format(products, catalog, round(100 * find_distribution(products, catalog), 2)))
Output:
['Black', 'Pen', 'NaN'] -> ['Black', 'Pen', 'Big'] -> 100.0%
['Yellow', 'Pen', 'Small'] -> ['Black', 'Pen', 'Small'] -> 66.67%
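This loop pairs each product with the catalog at the same index. The question's expected output uses a different rule: every product is scored against the first catalog. The sketch below runs a copy of `find_distribution` on the question's own data to check this, using exact matching on "NaN" (an assumption).

```python
def find_distribution(products, catalog):
    # copy of the function above: share of non-NaN words found in catalog
    item_counter = 0
    matched_count = 0
    for product in products:
        if product.lower() == "nan":
            continue
        item_counter += 1
        if product in catalog:
            matched_count += 1
    return matched_count / item_counter if item_counter else 0


products_bag = [['mai', 'dubai', '200ml', 'NaN'],
                ['mai', 'dubai', 'cup'],
                ['mai', 'dubai', '1.5l']]
catalogs_bag = [['natural', 'mineral', 'water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'],
                ['2-piece', 'glitzi', 'power', 'inox', 'power', 'dish'],
                ['15-piece', 'bones', 'for', 'dog', 'multicolour', 'rich']]

# Pairwise by index: product i is only compared with catalog i.
pairwise = [round(100 * find_distribution(p, c), 2)
            for p, c in zip(products_bag, catalogs_bag)]
print(pairwise)  # [66.67, 0.0, 0.0]
```

With index-wise pairing, the last two products score 0%. Reproducing the expected 67% for every row requires comparing each product against every catalog.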
Edit: in case you want to use a DataFrame:
import pandas as pd
# products_bag, catalogs_bag (e.g. read from a CSV) and find_distribution are defined as above
df = pd.DataFrame(list(zip(products_bag, catalogs_bag)), columns=["products", "catalogs"])
# multiply first, then round, so the result matches the plain-Python version (e.g. 66.67)
df["distribution"] = df.apply(lambda row: round(100 * find_distribution(row["products"], row["catalogs"]), 2), axis=1)
for index, row in df.iterrows():
    print("{} \t-> {} \t-> \t {}%".format(row["products"], row["catalogs"], row["distribution"]))
Important: use sensible names for your variables.
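Version for one product, which returns the product's words (without the NaN value), the best-matching catalog item and the associated score. It works in three steps:
- A list comprehension filters out NaN.
- A single catalog such as ["pen"] is wrapped into a list of catalogs: ["pen"] -> [["pen"]].
- max() is applied to a tuple (score, catalog_item), so both the score and the corresponding catalog item are available at the end.
def available(product, catalog):
    items = [_ for _ in product if _ != "NaN"]
    if not items:  # product was empty or only NaN
        return items, [], 0.0
    if isinstance(catalog[0], str):  # a single catalog: ["pen"] -> [["pen"]]
        catalog = [catalog]
    max_match = (0, [])
    for catalog_item in catalog:
        matched_count = 0
        for item in items:
            if item in catalog_item:
                matched_count += 1
        max_match = max(max_match, (matched_count, catalog_item))  # tuple score + catalog_item
    return items, max_match[1], max_match[0] / len(items)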
# USE
a = ['Black', 'Pen', 'NaN']
b = ['Black', 'Book', 'Big']
print(available(a, b)) # (['Black', 'Pen'], ['Black', 'Book', 'Big'], 0.5)
# Shorter version, using built-in functions and a generator expression
def available(product, catalog):
    items = [_ for _ in product if _ != "NaN"]
    if isinstance(catalog[0], str):
        catalog = [catalog]
    max_match = max((sum(item in catalog_item for item in items), catalog_item) for catalog_item in catalog)
    return items, max_match[1], max_match[0] / len(items)
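When `max` compares `(score, catalog)` tuples and two catalogs tie on score, it falls back to comparing the catalog lists themselves. The lexicographically larger list therefore wins, not the first one. If the first best catalog should be kept instead (an assumption about the desired behaviour), `max` with a `key` function does that, because `max` returns the first maximal item it encounters. A sketch follows; `best_catalog` is a hypothetical name.

```python
def best_catalog(product, catalogs, missing="NaN"):
    """Return (words, best catalog, score), keeping the first catalog on ties."""
    words = [w for w in product if w != missing]
    if catalogs and isinstance(catalogs[0], str):  # a single catalog: wrap it
        catalogs = [catalogs]
    if not words or not catalogs:
        return words, [], 0.0

    def hits(cat):
        # number of words of the product found in this catalog
        return sum(w in cat for w in words)

    best = max(catalogs, key=hits)  # first catalog with the highest count
    return words, best, hits(best) / len(words)


catalogs = [['Black', 'Pen', 'Big'], ['Black', 'Pen', 'Small']]
print(best_catalog(['Black', 'Pen', 'NaN'], catalogs))
# (['Black', 'Pen'], ['Black', 'Pen', 'Big'], 1.0)
```

For the same input, the tuple-based `max` picks `['Black', 'Pen', 'Small']`, because 'Small' sorts after 'Big'.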
Multi-product version: apply the one-product version to each product:
def availables(products, catalog):
    return [available(product, catalog) for product in products]
# USE, with products_bag and catalogs_bag from the question
for res in availables(products_bag, catalogs_bag):
    print(res)
# (['mai', 'dubai', '200ml'], ['natural', 'mineral', 'water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'], 0.6666666666666666)
# (['mai', 'dubai', 'cup'], ['natural', 'mineral', 'water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'], 0.6666666666666666)
# (['mai', 'dubai', '1.5l'], ['natural', 'mineral', 'water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'], 0.6666666666666666)
To get your formatting with arrows, just join the fields:
for res in availables(products_bag, catalogs_bag):
    print(" -> ".join(map(str, res)))
['mai', 'dubai', '200ml'] -> ['natural', 'mineral', 'water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'] -> 0.6666666666666666
['mai', 'dubai', 'cup'] -> ['natural', 'mineral', 'water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'] -> 0.6666666666666666
['mai', 'dubai', '1.5l'] -> ['natural', 'mineral', 'water', 'cups', '200', 'ml', 'pack', 'of', '24', 'mai', 'dubai'] -> 0.6666666666666666
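The question's expected output shows the score as a rounded percentage (67%). Python's format specification supports this directly with the `%` presentation type, which multiplies by 100 and appends a percent sign. This is a sketch, assuming each result is a `(words, catalog, score)` tuple like those printed above. `format_result` is a hypothetical helper.

```python
def format_result(words, catalog, score):
    # ".0%" multiplies by 100, rounds to 0 decimals and appends "%"
    return f"{words} -> {catalog} -> {score:.0%}"


print(format_result(['mai', 'dubai', 'cup'], ['mai', 'dubai'], 2 / 3))
# ['mai', 'dubai', 'cup'] -> ['mai', 'dubai'] -> 67%
```

In the loop above, `print(format_result(*res))` would replace the `" -> ".join(...)` line.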