Manipulating a string, in a list of lists

I am writing a function that takes a list as its parameter. The parameter is a list of lists of strings, and each string contains a first and a last name separated by a space. For each sublist I am supposed to check whether a first name is repeated and, if so, add it to a new list of repeated names. A name counts as repeated only if it is repeated within its own sublist. For example:

 >>> findAgents([["John Knight", "John Doe", "Erik Peterson"], ["Fred Douglas", "John Stephans", "Mike Dud", "Mike Samuels"]])

would yield

 ['John', 'Mike']

So far I have been able to iterate through the list and access the first names. But I don't know how to organize them so that each sublist's names stay in their own area, which would let me check whether something is repeated only within that area. This is my code:

def findAgents(listOlists):
    newlist = []  # intended to collect the repeated first names
    x = 0
    for alist in listOlists:
        for name in alist:
            # the first name is everything before the first space
            space = name.find(" ")
            firstname = name[0:space]
            print(firstname)

I'd rewrite that using collections.Counter in a flattened list comprehension: count the first names in each sublist (extracted with str.partition) and keep those that occur more than once:

import collections

l = [["John Knight", "John Doe", "Erik Peterson"],["Fred Douglas", "John Stephans", "Mike Dud", "Mike Samuels"]]

# for each sublist, count the first names and keep those seen more than once
x = [k for sl in l for k, v in collections.Counter(name.partition(" ")[0] for name in sl).items() if v > 1]
print(x)

Result:

['John', 'Mike']
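
For readers who find the one-liner dense, the same logic can be unrolled into an explicit loop. This is only a sketch of the idea described above, not the answerer's code; the function name `find_agents` and the variable names are chosen here for illustration, and the ordering of the result assumes Python 3.7+, where `Counter` keeps keys in first-seen order.

```python
from collections import Counter

def find_agents(list_of_lists):
    """Return first names that occur more than once within a single sublist."""
    repeated = []
    for sublist in list_of_lists:
        # partition(" ")[0] is the text before the first space, i.e. the first name
        counts = Counter(full_name.partition(" ")[0] for full_name in sublist)
        # a new Counter is built per sublist, so counts never leak between sublists
        repeated.extend(name for name, n in counts.items() if n > 1)
    return repeated

agents = [["John Knight", "John Doe", "Erik Peterson"],
          ["Fred Douglas", "John Stephans", "Mike Dud", "Mike Samuels"]]
print(find_agents(agents))  # ['John', 'Mike']
```

Building a fresh `Counter` inside the outer loop is what keeps each sublist "in its own area", which was the sticking point in the question.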

You can try this:

def func(temp):
    dic = {}
    for i in temp:
        for j in i:
            # count each first name (the text before the first space)
            dic[j.split(" ")[0]] = dic.get(j.split(" ")[0], 0) + 1
    return dic

Now we need all names whose count is greater than or equal to 2. This can be done with a single iteration over the dictionary returned by func:

# dic is the dictionary returned by func
temp = []
for i in dic:
    if dic[i] >= 2:
        temp.append(i)  # append the name itself, not its count

The list temp will contain the desired result.
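
Note that `func` above tallies first names across all sublists together, whereas the question counts a name as repeated only within its own sublist. A minimal sketch of a per-sublist variant that keeps the same `dict.get` counting idea follows; `find_repeated` is a hypothetical name introduced for this example.

```python
def find_repeated(list_of_lists):
    result = []
    for sublist in list_of_lists:
        counts = {}  # fresh dictionary for every sublist
        for full_name in sublist:
            first = full_name.split(" ")[0]
            counts[first] = counts.get(first, 0) + 1
        # keep only the names seen at least twice in this sublist
        for first, n in counts.items():
            if n >= 2:
                result.append(first)
    return result

# "John" appears once in each sublist, so only "Mike" is reported
print(find_repeated([["John Knight", "Erik Peterson"],
                     ["John Stephans", "Mike Dud", "Mike Samuels"]]))  # ['Mike']
```

With a single shared dictionary, "John" would reach a count of 2 on this input and be reported even though he is never repeated inside one sublist.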

I'd use a regex to pluck the duplicated name out of each list:

import re

names = [["John Knight", "John Doe", "Erik Peterson"],["Fred Douglas", "John Stephans", "Mike Dud", "Mike Samuels"]]

def extractDups(names):
    res = []
    for eachlist in names:
        # capture a word, then require the same whole word again later in the joined string
        res.extend(re.findall(r'\b(\w+)\b.*\b\1\b', ' '.join(eachlist)))
    return res

Example:

    >>> extractDups(names)
    ['John', 'Mike']
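
The pattern above scans the full names, so a repeated last name (for instance "John Doe" and "Jane Doe") would be reported as well. A sketch of a variant restricted to first names is given below. It assumes first names consist only of word characters (`\w`), joins just the first names of each sublist, and anchors the back-reference with `\b` so that it matches a whole word. The name `extract_dup_first_names` is made up for this example, and the lookahead plus de-duplication is a swapped-in technique rather than the answerer's method.

```python
import re

def extract_dup_first_names(names):
    res = []
    for each_list in names:
        # keep only the first name of every entry in this sublist
        firsts = " ".join(full.split(" ")[0] for full in each_list)
        # the lookahead checks for a later whole-word repeat without consuming text,
        # so every first name in the sublist gets examined
        found = re.findall(r'\b(\w+)\b(?=.*\b\1\b)', firsts)
        # a name occurring three or more times is found more than once; drop repeats, keep order
        res.extend(dict.fromkeys(found))
    return res

names = [["John Knight", "John Doe", "Erik Peterson"],
         ["Fred Douglas", "John Stephans", "Mike Dud", "Mike Samuels"]]
print(extract_dup_first_names(names))  # ['John', 'Mike']
```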
