Python - Search a list of strings for substrings from a list of keywords

Question

I'm trying to use a list of keywords to search another list of strings. Some of the strings are formatted a bit oddly.

results_list = ['user 1 \n    date of birth', '11 Jan 1990','user 1 age', '29','user 1 income', '60 000',
'user 2 \n    username', 'guest_user2','user 2 age', '25','user 2 income', '45 000']
keywords = ['date of birth','age','income','username']

I tried this code:

final_dict = {}
for r in range(len(results_list)):
   for word in range(len(keywords)):
       if keywords[word] in results_list[r]:
           print(keywords[word])
           print(results_list[r])
           r_key_idx = results_list.index(results_list[r])
           r_val_idx = r_key_idx + 1
           dictionary = {results_list[r_key_idx]:results_list[r_val_idx]}
           final_dict.update(dictionary)

This results in the following output dictionary:

{'user 1 age':'29', 'user 1 income':'60 000', 'user 2 age':'25', 'user 2 income':'45 000'}

Note: in this example it does find the substrings, but in my working dataset it does not. I tested the example on repl.it and it worked.

It doesn't seem to pick up the entries that contain \n. I don't want to create a bunch of different keywords, because they change quite often depending on the values in the table. It's a large table, and making hundreds of keyword variants that include \n seems self-defeating.

Also note that the example is not identical to my actual dataset: the actual dataset has about 12 spaces after the \n, though I'm not sure whether that changes anything.

Answer 1

Try sanitizing your data list first, as shown below, and then run your code. Your keywords should match after that.

results_list = ['user 1 \n    date of birth', '11 Jan 1990','user 1 age', '29','user 1 income', '60 000',
'user 2 \n    username', 'guest_user2','user 2 age', '25','user 2 income', '45 000']

for index, res in enumerate(results_list):
    if '\n' in res:
        new_res = res.split('\n')
        # strip the leading spaces from the part after the newline
        new_res[1] = new_res[1].lstrip(" ")
        results_list[index] = "".join(new_res)

print(results_list)  # place your code after this line

# ['user 1 date of birth', '11 Jan 1990', 'user 1 age', '29', 'user 1 income', '60 000', 'user 2 username', 'guest_user2', 'user 2 age', '25', 'user 2 income', '45 000']
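
If the real data contains tabs, a different number of spaces, or more than one line break, the same cleanup can be written with a regular expression. This is only a sketch: `clean` is a hypothetical helper name, and the sample strings (including the one with 12 spaces after the newline) are assumptions modelled on the question, not the actual dataset.

```python
import re


def clean(text):
    """Replace each newline, plus any whitespace around it, with one space."""
    return re.sub(r'\s*\n\s*', ' ', text)


# Sample keys; the second one imitates the "about 12 spaces" case.
raw = ['user 1 \n    date of birth',
       'user 2 \n            username',
       'user 1 age']

cleaned = [clean(item) for item in raw]
print(cleaned)
# ['user 1 date of birth', 'user 2 username', 'user 1 age']
```

Building a new list instead of assigning into `results_list` leaves the original data untouched, which can help when comparing before and after.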

Answer 2

You need to clean your strings before comparing them.

One more thing: if your results_list always has a key followed by its value at the next index, you can use range with a step (its third parameter) to visit only the keys.

final_dict = {}
for i in range(0, len(results_list), 2):
    # Collapse every run of whitespace, including \n, into a single space
    key = " ".join(results_list[i].split())
    print(key)
    # Keep the pair if any keyword occurs in the cleaned key
    if any(keyword in key for keyword in keywords):
        final_dict[key] = results_list[i+1]
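
For reference, here is the same idea run end to end on the sample data. It is a sketch that assumes the list strictly alternates key, value. It pairs the items with `zip` over two slices rather than `range` with a step, which is a swapped-in but equivalent way to walk the pairs. `normalize` is a hypothetical helper name, and the 12 spaces in the `user 2` key are an assumption taken from the description in the question.

```python
results_list = ['user 1 \n    date of birth', '11 Jan 1990',
                'user 1 age', '29',
                'user 1 income', '60 000',
                'user 2 \n            username', 'guest_user2',
                'user 2 age', '25',
                'user 2 income', '45 000']
keywords = ['date of birth', 'age', 'income', 'username']


def normalize(text):
    """Collapse all whitespace runs (spaces, tabs, newlines) into one space."""
    return " ".join(text.split())


final_dict = {}
# Even indexes hold the keys, odd indexes hold the matching values.
for raw_key, value in zip(results_list[::2], results_list[1::2]):
    key = normalize(raw_key)
    if any(keyword in key for keyword in keywords):
        final_dict[key] = value

print(final_dict)
```

Because only the even positions are tested, a value that happens to contain a keyword is never mistaken for a key.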

Answer 1
1, accepted, 2019-12-13 11:48:12

Answer 2
0, 2019-12-13 12:00:06
