
How can I make this search function work?

I am trying to match text against a query in Python, where matches are given the label 0 and non-matches the label 1. However, the program only appends zeros to the list, even though there are also non-matches among the texts in the file. What is going wrong?

def read_docs(filename):
    '''
    Return X,Y where X is the list of documents and Y the list of their
    labels.
    '''
    X = []
    Y = []
    q= '(nae OR Nae) OR (nea OR Nea) OR (sjaon OR Sjaon) OR (vasteloavend OR Vasteloavend) OR (zoervleisj OR Zoervleisj) OR (noe OR Noe)'
    escaped = [re.escape(query) for query in q]
    regex="|".join(escaped)
    with open(filename) as f:
        r = Reader(f, delimiter=";", dialect="excel", encoding="utf-8")
        for row in r:
            text = row[5]
            if re.search(regex, text) in row:
                Y.append(0)
            else:
                Y.append(1)
            X.append(text)
    return X,Y

There are a lot of very strange things in this code that make no sense at all.

Firstly, you seem to be using some kind of query syntax with "OR", but then you're running it via regex. Regex doesn't know what "OR" is.
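One way to bridge the two, as a rough sketch: pull the individual terms out of the query string and join them with "|", which is the regex spelling of "or". This assumes the query only ever uses parentheses and the literal word OR as a separator, as in the question; the names terms and regex are just illustrative.

```python
import re

q = ('(nae OR Nae) OR (nea OR Nea) OR (sjaon OR Sjaon) OR '
     '(vasteloavend OR Vasteloavend) OR (zoervleisj OR Zoervleisj) OR (noe OR Noe)')

# Assumption: parentheses are only used for grouping, so they can be dropped,
# and " OR " is the only operator in the query.
flat = q.replace("(", "").replace(")", "")
terms = [t for t in re.split(r"\s+OR\s+", flat) if t]

# Escape each whole term (not each character) and join with the regex alternation.
regex = "|".join(re.escape(t) for t in terms)

print(terms)
print(regex)
```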

Secondly, you iterate through the elements of q, but q is a string, and its elements are characters. You should print the values of escaped and regex before your loop: they are certainly not what you are expecting.
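To see this for yourself, here is a small sketch using a shortened version of the query. Iterating over the string yields one entry per character, so the joined pattern is an alternation of single characters rather than of words.

```python
import re

q = "(nae OR Nae)"  # shortened query, for illustration only

# Iterating over a string yields its characters, not its words.
escaped = [re.escape(ch) for ch in q]
regex = "|".join(escaped)

print(escaped)  # one (possibly escaped) character per element
print(regex)    # single characters separated by "|"

# Any text containing just one of those characters now counts as a match.
print(bool(re.search(regex, "banana")))
```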

Thirdly, your if statement is nonsense. re.search returns either a Match object or None. So it makes no sense to say if re.search(regex, text) in row, because that simply checks whether the Match object, or None, is in the row, which will always be false.
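A minimal sketch of the difference, assuming each row is a list of strings with the text in column 5 as in the question. The pattern and the helper name label_for are made up for the example. Since a Match object is truthy and None is falsy, the result of re.search can be tested directly.

```python
import re

pattern = re.compile(r"nae|Nae")  # stand-in pattern, for illustration only

row = ["a", "b", "c", "d", "e", "Nae, danke"]  # hypothetical row of strings
match = pattern.search(row[5])

# The row holds strings, so a Match object (or None) is never one of its elements.
print(match in row)

def label_for(text):
    """Return 0 for a match and 1 for a non-match, as in the question."""
    return 0 if pattern.search(text) else 1

print(label_for(row[5]))
print(label_for("hello world"))
```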

