
Return the matching word in Python

I wrote a script that checks whether values in the 'TITLE' column of the 'Product content' sheet match values in the 'KEYWORD' column of the 'Keyword list' sheet (same workbook). The compare_title function returns True or False, which is fine, but I also need to know which keywords matched: not only the True/False output, but also the word that counts as a match.

The Python script is below.

import pandas as pd
import re


file_path = 'C:/Users/User/Desktop/data.xlsx'


def get_keyword(file_path):
    """
    Read the 'KEYWORD' column from the 'Keyword list' sheet and return it as a list
    (no filtering on the 'ACTIVE?' column is applied here)
    """
    df = pd.read_excel(file_path, sheet_name='Keyword list')
    keywords = df['KEYWORD'].to_list()

    return keywords


keyword_list = get_keyword(file_path)


def words(phrase: str) -> list[str]:
    """
    Splits string to words by all characters that are not letters or digits (spaces, commas etc.)
    """

    return list(map(lambda x: x.lower(), filter(len, re.split(r'\W', phrase))))


def compare_title(file_path):
    """
    Get title from 'Product content' sheet and compare the values with keyword_list values
    """

    df = pd.read_excel(file_path, sheet_name='Product content')
    df = df.fillna('-')
    title = df['TITLE'].apply(lambda find_kw: any([keyword in words(find_kw) for keyword in keyword_list]))

    return title

Thanks in advance for your help.

I think this is what you're looking for:

title = df['TITLE'].apply(lambda find_kw: [keyword for keyword in keyword_list if keyword in words(find_kw)])

This means each value in the Series returned by compare_title will be a list[str] instead of a bool. A truthiness check on an individual value (for example in an if statement) still works as before, because an empty list is falsy and a non-empty list is truthy.
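
To see the idea without the Excel file, here is a minimal sketch on an in-memory DataFrame. The sample titles, the keywords, and the MATCHES and HAS_MATCH column names are made up for illustration; only words() and the list comprehension come from the code above. It assumes Python 3.9+ for the list[str] annotation.

```python
import re

import pandas as pd


def words(phrase: str) -> list[str]:
    """Split a string into lowercase words on any non-word character."""
    return [w.lower() for w in re.split(r'\W', phrase) if w]


# Hypothetical stand-ins for the 'Keyword list' and 'Product content' sheets.
keyword_list = ['red', 'cotton', 'xl']
df = pd.DataFrame({'TITLE': ['Red cotton shirt', 'Blue jeans', None]}).fillna('-')

# One list of matched keywords per title; an empty list means no match.
df['MATCHES'] = df['TITLE'].apply(
    lambda title: [kw for kw in keyword_list if kw in words(title)]
)

# The original True/False flag can still be derived from the list.
df['HAS_MATCH'] = df['MATCHES'].apply(bool)

print(df)
```

Note that words() lowercases the title, so a keyword only matches if it is stored in lowercase and is a single word; whether that holds for the real 'KEYWORD' column is an assumption.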

