
Exhaustive search over a list of complex strings without modifying original input

I am trying to write a minimal algorithm that exhaustively searches a list of strings for duplicates and removes them by index, so that the case of the remaining words (and therefore their meaning) is not changed.

The caveat is that the list contains words such as Blood, blood, DNA, ACTN4, 34-methyl-O-carboxy, Brain, brain-facing-mouse, BLOOD, and so on.

I only want to remove the duplicate 'blood' entries, keep the first occurrence with its first letter capitalized, and leave the case of every other word unchanged. Any suggestions on how I should proceed?

Here is my code:

def remove_duplicates(list_of_strings):
    """ function that takes input of a list of strings,
    uses index to iterate over each string lowers each string
    and returns a list of strings with no duplicates, does not modify the original strings
    an exhaustive search to remove duplicates using index of list and list of string"""

    list_of_strings_copy = list_of_strings
    try:
        for i in range(len(list_of_strings)):
            list_of_strings_copy[i] = list_of_strings_copy[i].lower()
            word = list_of_strings_copy[i]
            for j in range(len(list_of_strings_copy)):
                if word == list_of_strings_copy[j]:
                    list_of_strings.pop(i)
                    j+=1
    except Exception as e:
        print(e)
    return list_of_strings

Make a dictionary, {text.lower(): text, ...}, use the keys for comparison, and save the first instance of each text in the values.

d={}
for item in list_of_strings:
    if item.lower() not in d:
        d[item.lower()] = item

d.values() should be what you want.
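As a rough sketch, the dictionary idea can be wrapped in a function and run on the list from the question. The function name `dedupe_case_insensitive` and the variable names are made up for illustration. It relies on dictionaries preserving insertion order (Python 3.7 and later), so the values come back in first-seen order.

```python
def dedupe_case_insensitive(strings):
    """Return a new list without case-insensitive duplicates.

    The input list is only read, never modified.
    """
    first_seen = {}  # lowercased text -> first original spelling
    for item in strings:
        key = item.lower()
        if key not in first_seen:
            first_seen[key] = item
    return list(first_seen.values())


words = ["Blood", "blood", "DNA", "ACTN4", "34-methyl-O-carboxy",
         "Brain", "brain-facing-mouse", "BLOOD"]
result = dedupe_case_insensitive(words)
print(result)
# ['Blood', 'DNA', 'ACTN4', '34-methyl-O-carboxy', 'Brain', 'brain-facing-mouse']
```

Because only the keys are lowercased, "DNA" and "ACTN4" keep their original spelling, and `words` still has all eight entries afterwards.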

I think something like the following would do what you need:

def remove_duplicates(list_of_strings):
    new_list = [] # create empty return list
    for string in list_of_strings: # iterate through list of strings
        string = string[0].capitalize() + string[1:].lower() # ensure first letter is capitalized and the rest are lowercase
        if string not in new_list: # check string is not a duplicate in the returned list
            new_list.append(string) # if string is not in the list, append it to the returned list
    return new_list # return end list
    
strings = ["Blood", "blood", "DNA", "ACTN4", "34-methyl-O-carboxy", "Brain", "brain-facing-mouse", "BLOOD"]
returned_strings = remove_duplicates(strings)
print(returned_strings)
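One thing to watch with the function above: it normalizes every string before storing it, so on this input "DNA" comes back as "Dna" and "ACTN4" as "Actn4". If the original spelling has to survive, a possible variant is to compare on a lowercased key and append the untouched string. The sketch below assumes the first occurrence is the spelling to keep; the name `remove_duplicates_keep_case` is hypothetical.

```python
def remove_duplicates_keep_case(list_of_strings):
    seen = set()   # lowercased forms already encountered
    new_list = []  # original spellings, in first-seen order
    for string in list_of_strings:
        key = string.lower()  # used for comparison only
        if key not in seen:
            seen.add(key)
            new_list.append(string)  # store the string unchanged
    return new_list


strings = ["Blood", "blood", "DNA", "ACTN4", "34-methyl-O-carboxy",
           "Brain", "brain-facing-mouse", "BLOOD"]
kept = remove_duplicates_keep_case(strings)
print(kept)
# ['Blood', 'DNA', 'ACTN4', '34-methyl-O-carboxy', 'Brain', 'brain-facing-mouse']
```

Using a set for the membership test also avoids rescanning the result list for every element, which may matter for long lists.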

(For reference, this was written in Python 3.10.)

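Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.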

 