
Python - Search for substring in list from a list of substrings

I'm trying to use a list of keywords to search another list of strings. Some of the strings are formatted a bit oddly.

results_list = ['user 1 \n    date of birth', '11 Jan 1990','user 1 age', '29','user 1 income', '60 000',
'user 2 \n    username', 'guest_user2','user 2 age', '25','user 2 income', '45 000']
keywords = ['date of birth','age','income','username']

I tried this code:

final_dict = {}
for r in range(len(results_list)):
    for word in range(len(keywords)):
        # the loop variable is `word` (`words` would raise a NameError)
        if keywords[word] in results_list[r]:
            print(keywords[word])
            print(results_list[r])
            # r already is the index of the match; list.index() would return
            # the first equal item, which is wrong if the list has duplicates
            r_key_idx = r
            r_val_idx = r_key_idx + 1
            dictionary = {results_list[r_key_idx]: results_list[r_val_idx]}
            final_dict.update(dictionary)

This results in an output dictionary of:

{'user 1 age': '29', 'user 1 income': '60 000', 'user 2 age': '25', 'user 2 income': '45 000'}

*Note: in this example the code does find the substrings, but in my working dataset it does not. I tested it in repl.it and it worked there.

It doesn't seem to pick up the strings that have a \n in them. I don't want to just make a bunch of different keywords, because they change quite often based on the values in the table. It's quite a large table, and making hundreds of different keywords containing the \n seems self-defeating.

Also, note that the examples are not the same as my actual dataset: the actual dataset has about 12 spaces after the \n, though I'm not sure whether that would change anything.

Try sanitizing your data list first and then run your code. Sanitize your data as shown below; your keywords should match after that.

results_list = ['user 1 \n    date of birth', '11 Jan 1990','user 1 age', '29','user 1 income', '60 000',
'user 2 \n    username', 'guest_user2','user 2 age', '25','user 2 income', '45 000']

for index, res in enumerate(results_list):
    if '\n' in res:
        new_res = res.split('\n')
        #remove empty space to the left
        new_res[1] = new_res[1].lstrip(" ")
        results_list[index] = "".join(new_res)

print(results_list)  # place your code after this line


#['user 1 date of birth', '11 Jan 1990', 'user 1 age', '29', 'user 1 income', '60 000', 'user 2 username', 'guest_user2', 'user 2 age', '25', 'user 2 income', '45 000'] 
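The split/lstrip approach above handles a single newline per string and relies on the space that precedes it. If some strings contain more than one newline, a regular-expression substitution is one possible alternative. This is a sketch, not part of the original answer; it assumes every newline, together with the whitespace around it, should become a single space. The extra sample string with two newlines is made up for illustration.

```python
import re

results_list = ['user 1 \n    date of birth', '11 Jan 1990', 'user 1 age', '29',
                'user 1 income', '60 000', 'user 2 \n    username', 'guest_user2',
                'user 2 age', '25', 'user 2 income', '45 000']

# Replace each newline plus any whitespace on either side of it with one space.
cleaned = [re.sub(r"\s*\n\s*", " ", res) for res in results_list]
print(cleaned)

# Hypothetical string with two newlines, to show that every one is handled.
multi = re.sub(r"\s*\n\s*", " ", "user 3 \n    date of\n    birth")
print(multi)  # → user 3 date of birth
```

Strings without a newline, such as '11 Jan 1990', pass through unchanged because the pattern requires a \n to match.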

You need to clean your strings before comparing them.
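Since the sample data works but the real data does not, it may help to check what the real strings actually contain. Printing repr() of a failing string distinguishes a real newline character (shown as \n) from a literal backslash followed by n (shown as \\n). The sketch below assumes either form could occur in the real data; `normalize` is a hypothetical helper name, and the two sample strings are made up to mimic the roughly 12 spaces mentioned in the question.

```python
def normalize(text):
    """Hypothetical helper: collapse newlines and runs of spaces into single spaces.

    Assumption: the data may hold a real newline or a literal backslash + 'n'.
    """
    # Turn a literal two-character backslash-n sequence into a real newline.
    text = text.replace("\\n", "\n")
    # str.split() with no argument splits on any run of whitespace.
    return " ".join(text.split())

real_newline = "user 1 \n            date of birth"
literal_backslash_n = "user 1 \\n            date of birth"

# repr() reveals which of the two forms a string actually contains.
print(repr(real_newline))
print(repr(literal_backslash_n))

print(normalize(real_newline))         # → user 1 date of birth
print(normalize(literal_backslash_n))  # → user 1 date of birth
```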

One more thing: if your results_list always has a key followed by its value at the next index, you can use range() with a step argument (its third parameter):

final_dict = {}
for i in range(0, len(results_list), 2):
    # Collapse runs of whitespace (including \n) into single spaces
    key = " ".join(results_list[i].split())
    print(key)
    if any(keyword in key for keyword in keywords):
        final_dict[key] = results_list[i+1]
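Put together with the sample data from the question, the same key/value pairing can also be written with slicing and zip() instead of a stepped range(). This is a sketch under the same assumption as above (items strictly alternate key, value, key, value, ...); `build_dict` is a hypothetical name. One difference from indexing with i+1: zip() silently drops a trailing key that has no value instead of raising an IndexError.

```python
results_list = ['user 1 \n    date of birth', '11 Jan 1990', 'user 1 age', '29',
                'user 1 income', '60 000', 'user 2 \n    username', 'guest_user2',
                'user 2 age', '25', 'user 2 income', '45 000']
keywords = ['date of birth', 'age', 'income', 'username']

def build_dict(items, keywords):
    """Hypothetical helper: pair each key string with the value after it."""
    result = {}
    # items[::2] are the keys, items[1::2] the values that follow them.
    for raw_key, value in zip(items[::2], items[1::2]):
        # Collapse newlines and runs of spaces into single spaces.
        key = " ".join(raw_key.split())
        if any(keyword in key for keyword in keywords):
            result[key] = value
    return result

final_dict = build_dict(results_list, keywords)
print(final_dict)
# → {'user 1 date of birth': '11 Jan 1990', 'user 1 age': '29', 'user 1 income': '60 000',
#    'user 2 username': 'guest_user2', 'user 2 age': '25', 'user 2 income': '45 000'}
```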
