
Extract specific text from another CSV using Python in Jupyter Notebook

Problem: I have a list of long texts in one column of a CSV file (proposal.csv) under the header "proposal". The texts are sentences that include addresses (such as a building name and postal code). I have another CSV file (building.csv) with building names under the "building" column.

I would like to extract all the building names from the sentences in the proposal column. Is there a way to do this? I spent nearly a whole day trying to figure this out but cannot get it to work. I used the df.isin(keywords) method, but every value comes out as False even though the building names are present in the proposal column.

Example of a row in the proposal column: "i live in taj mahal and it is a very pretty place". I would like to extract the term "taj mahal" because it is a building (and taj mahal is listed in my building CSV).

Can anyone help, please? Thanks!

"df" will be the data frame that contains the sentence.

"df1" will contain the building name.

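import pandas as pd

df = pd.read_csv("proposal.csv")   # sentences, under the "proposal" column
df1 = pd.read_csv("building.csv")  # building names, under the "building" column

building = df1['building'].tolist()
for sentence in df['proposal'].tolist():
    # Test each name as a substring: splitting the sentence into single
    # words would never match a multi-word name such as "taj mahal".
    for name in building:
        if name in sentence:
            print("'{}' found from the sentence '{}'".format(name, sentence))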
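As for why df.isin(keywords) returned False everywhere: isin() compares each whole cell with the keywords, so a full sentence never equals a building name. For larger files, the lookup can also be done without explicit Python loops. The following is only a sketch under some assumptions: the two small data frames are made-up stand-ins for proposal.csv and building.csv, the result column name buildings_found is hypothetical, matching is taken to be case-insensitive, and building names are assumed to start and end with a letter or digit so that the \b word boundaries behave as expected.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for proposal.csv and building.csv.
df = pd.DataFrame({"proposal": [
    "i live in taj mahal and it is a very pretty place",
    "my office is near Marina Bay Sands, postal code 018956",
    "no building is mentioned here",
]})
df1 = pd.DataFrame({"building": ["taj mahal", "marina bay sands", "raffles hotel"]})

# isin() tests whole-cell equality, so no sentence equals a building name.
print(df["proposal"].isin(df1["building"]).any())

# Clean the names and put the longest first, so that a longer name wins
# over a shorter name it happens to contain.
names = df1["building"].dropna().astype(str).str.strip().unique()
names = sorted((n for n in names if n), key=len, reverse=True)

# One alternation pattern; re.escape keeps characters like "." or "(" literal.
pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"

# Each row gets a list of every building name found in its sentence.
df["buildings_found"] = df["proposal"].str.findall(pattern, flags=re.IGNORECASE)
print(df[["proposal", "buildings_found"]])
```

Each row of buildings_found holds a list, so a sentence that mentions several buildings keeps all of them and a sentence with none gets an empty list. The matched text is returned as it appears in the sentence, not as it is spelled in the building list.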
