
Python Comparison of Two lists

I am working on an NLP project. I have extracted keywords from a resume and stored them in a list. A second list holds all the technical keywords I extracted from a JSON file. Both lists contain many keywords; the ones below are only a sample for reference.

list_of_keys=['azure', 'job', 'matlab', 'javascript', 'http', 'android', 'amazon', 'apache spark']

result=['apache http server', 'angularjs', 'azure bot service', 'amazon s3', 'android sdk', 'android studio', 'amazon cloudfront']

Code:

import json

# Load the technical keywords and normalise them to lower case
with open('rawtext.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
result = [x["name"].replace("@", " ").lower() for x in data]
print(result)

print ("List of Matched Keywords are:\n")
# Comparing Lists

for item in list_of_keys: 
    for item1 in result: 
        if item == item1: 
            print("Word from Resume: ", item, ", Word from JSON data: ", item1)
print ("****************\n")

Current Output

Word from Resume: box, Word from JSON data: box
Word from Resume: arduino, Word from JSON data: arduino
Word from Resume: arduino, Word from JSON data: arduino
Word from Resume: browser, Word from JSON data: browser
Word from Resume: black, Word from JSON data: black
Word from Resume: address, Word from JSON data: address
Word from Resume: address, Word from JSON data: address

The code above is a very simple approach: it compares the two lists and prints only exact matches. What I want instead is partial matching. For example, if 'apache spark' from the resume matches 'apache http server' in the result list, it should print: Word from Resume: apache spark, Word from JSON data: apache http server. Similarly, if 'amazon' is matched, it should print: Word from Resume: amazon, Word from JSON data: amazon s3, amazon cloudfront

Required Output:

Word from Resume: apache spark, Word from JSON data: apache http server
Word from Resume: amazon, Word from JSON data: amazon s3, amazon cloudfront
Word from Resume: http, Word from JSON data: apache http server

Can someone please help me out? Thank you.

Maybe try a set intersection, which returns the keywords that appear verbatim in both lists:

common = list(set(list_of_keys) & set(result))

For instance:

list_of_keys = ['one','two','three','some more']
result = ['two','some more']

common = list(set(list_of_keys) & set(result))

print (common)

Output (the order of the elements may vary, since sets are unordered):

['two', 'some more']

I think what you're trying to achieve is a bit different from a simple equality check, i.e. 'azure' == 'azure bot service' will always return False.

The comparison check can be more sophisticated, but from your expected behaviour, I believe you're looking for this:

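from collections import defaultdict

# Map each resume keyword to every JSON keyword that contains it
res_dict = defaultdict(list)
for item in list_of_keys:
    for item1 in result:
        if item in item1:  # substring test instead of equality
            res_dict[item].append(item1)

# Print one line per resume keyword, with its matches separated by ", "
for k, v in res_dict.items():
    print("Word from Resume: ", k, ", Word from JSON data: ", ", ".join(v))
print("****************\n")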

I've replaced the == check with an in check, which means the comparison is true when azure occurs inside azure bot service and false for every string in the result list that does not contain azure.
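To make the difference concrete, here is a small sketch using the sample lists from the question. The names exact and partial are only illustrative. Note that in tests for a substring, not a whole word, so 'apache spark' still finds nothing in 'apache http server', and a key such as 'java' would match 'javascript'.

```python
list_of_keys = ['azure', 'job', 'matlab', 'javascript', 'http',
                'android', 'amazon', 'apache spark']
result = ['apache http server', 'angularjs', 'azure bot service', 'amazon s3',
          'android sdk', 'android studio', 'amazon cloudfront']

# Equality: a pair is kept only when both strings are identical
exact = [(k, r) for k in list_of_keys for r in result if k == r]

# Containment: a pair is kept when the resume keyword occurs inside the JSON keyword
partial = [(k, r) for k in list_of_keys for r in result if k in r]

print(exact)    # no pair in the sample data is identical
for k, r in partial:
    print(k, "->", r)

# Substring matching ignores word boundaries
print('java' in 'javascript')                   # True
print('apache spark' in 'apache http server')   # False
```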

I would also suggest looking at the question "Does Python have a string 'contains' substring method?" for more complex substring matches, since you're probably looking to check whether words co-occur between your list_of_keys and result lists.
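One possible way to check for co-occurring words is to split each keyword on whitespace and test whether the two word sets overlap. This is only a sketch: match_by_shared_words is a hypothetical helper, and it assumes that sharing any single whole word counts as a match, which is what the required output in the question suggests ('apache spark' paired with 'apache http server').

```python
from collections import defaultdict

def match_by_shared_words(keys, candidates):
    """Map each key to the candidates that share at least one whole word with it."""
    matches = defaultdict(list)
    for key in keys:
        key_words = set(key.split())
        for cand in candidates:
            # A non-empty intersection means at least one word is common to both
            if key_words & set(cand.split()):
                matches[key].append(cand)
    return dict(matches)

list_of_keys = ['azure', 'job', 'matlab', 'javascript', 'http',
                'android', 'amazon', 'apache spark']
result = ['apache http server', 'angularjs', 'azure bot service', 'amazon s3',
          'android sdk', 'android studio', 'amazon cloudfront']

matches = match_by_shared_words(list_of_keys, result)
for key, found in matches.items():
    print(f"Word from Resume: {key}, Word from JSON data: {', '.join(found)}")
```

With the sample lists this pairs 'apache spark' with 'apache http server' and 'amazon' with both 'amazon s3' and 'amazon cloudfront'. Generic words shared by many keywords (for example 'server' or 'service') would also trigger matches, so a stop-word filter may be needed on real data.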

Alternatively, you can look at fuzzy search, since it seems very close to your intended behaviour: https://pypi.org/project/fuzzysearch/
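As a rough illustration of approximate matching, the sketch below uses difflib from the standard library rather than the fuzzysearch package linked above, so it is a stand-in for the idea and not that package's API. The helper closest_keywords, the misspelled input and the cutoff of 0.6 are all assumptions made for the example.

```python
import difflib

result = ['apache http server', 'angularjs', 'azure bot service', 'amazon s3',
          'android sdk', 'android studio', 'amazon cloudfront']

def closest_keywords(word, candidates, cutoff=0.6):
    """Return up to three candidates whose similarity to word is at least cutoff, best first."""
    return difflib.get_close_matches(word, candidates, n=3, cutoff=cutoff)

# A misspelled resume keyword can still be paired with the intended JSON keyword
typo_matches = closest_keywords('andriod studio', result)
print(typo_matches)  # 'android studio' is the closest candidate, so it comes first
```

Similarity here is computed over characters of the whole string, so a short key such as 'amazon' may score below the cutoff against a long entry like 'amazon cloudfront'. It complements the substring or shared-word checks and does not replace them.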
