Python "in" operator not finding substring in text

I am trying to check whether any substring from a list of substrings occurs in a given string. To do so, I loop over the list and test each item against the string with Python's in operator. I get False even though I am sure one of the substrings is present in the string. I have tried everything I could think of to normalize the text and the substrings: replacing every " " with "", calling casefold() and strip(), and even running both through unidecode. The substring is still not found.

My code:

from unidecode import unidecode

example_string = '''available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/nanotoday
REVIEW
Synthesis, properties and applications of Janus
nanoparticles
Marco Lattuada a, T. Alan Hatton b,''' # as extracted from PDF file using fitz's `doc.load_page(0)` and then `.get_text()` 

list_of_titles = ["Synthesis, properties and applications of Janus nanoparticles", "another_title", "another_title"]

example_string = example_string.casefold()
example_string = example_string.replace(" ", "")

for title in list_of_titles:
    title = title.replace(" ", "")
    title = title.casefold()
    if unidecode(title) in unidecode(example_string):
        print("Yes")

# Outputs nothing

Try this:

example_string = example_string.replace("\n", " ")
example_string = example_string.casefold()

for title in list_of_titles:
    if title.casefold() in example_string: # here casefold() again!
        print("Yes")

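The \n characters are the problem. The text extracted from the PDF breaks the title across two lines, so after removing the spaces your string contains "janus\nnanoparticles" while the title you search for contains "janusnanoparticles". replace(" ", "") only removes spaces, not newlines, so the two never match. Replacing each "\n" with a space before the comparison fixes it.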
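If the extracted text may also contain tabs, double spaces or other stray whitespace, a slightly more general variant is to collapse every run of whitespace to a single space on both sides before comparing. This is only a sketch: normalize and find_titles are helper names made up for this example, and page_text stands in for whatever .get_text() returned.

```python
import re

def normalize(text):
    # Collapse any run of whitespace (spaces, tabs, newlines) to one space,
    # trim the ends, then casefold for a case-insensitive comparison.
    return re.sub(r"\s+", " ", text).strip().casefold()

def find_titles(page_text, titles):
    # Return the titles that occur in page_text after normalization.
    haystack = normalize(page_text)
    return [title for title in titles if normalize(title) in haystack]

# Stand-in for the text extracted from the PDF page.
page_text = '''available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/nanotoday
REVIEW
Synthesis, properties and applications of Janus
nanoparticles
Marco Lattuada a, T. Alan Hatton b,'''

titles = ["Synthesis, properties and applications of Janus nanoparticles", "another_title"]

print(find_titles(page_text, titles))
# ['Synthesis, properties and applications of Janus nanoparticles']
```

When a match unexpectedly fails, printing repr(example_string) is a quick way to see hidden characters such as \n that a plain print() does not show.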
