Remove a specific phrase from a list of lists

I have stored data in a list of lists (I couldn't use a dict because I need to have duplicate keys). The list looks like this:

data = [[1, "name email@email.com address"], [2, "name2 email@@email2.com address"], ...]

My goal is to remove the email addresses from the data list (i.e. the list of lists). Unfortunately, the email addresses are all different. They share only one common trait: they all contain the symbol "@".

I tried to use a list comprehension. However, I can only get it to remove the entire element, i.e. "name email@email.com address" gets removed entirely:

newlist = [element for element in data if "@" not in element[1]]

I thought of splitting "name email@email.com address" into sublists using " " as the delimiter. However, that presents a problem as well: it ruins the format. It would be difficult for me to regroup the lists into the initial format, because sometimes "name email@email.com address" contains more than three words. For example, it could be "name1 name2 name3 email@email.com email2 email3 address1 address2 address3".

What is the best way of doing this?

EDIT:

To answer Adam Smith's question, I'm looking for

data = [[1, "name address"], [2, "name2 address"], ...]

as my output. In other words, the original format (a list of lists, where each sublist contains two elements, one being the number and the other being "name, address, address1, etc.") is preserved, but without the email addresses.

data = [[1, "name email@email.com address"], [2, "name2 email@@email2.com address"], [3, "name1 name2 name3 email@email.com email2 email3 address1 address2 address3"]]

for ind, d in enumerate(data):
    # Keep the number, then rebuild the string from the words that do not contain "@".
    data[ind] = [d[0], " ".join(x for x in d[1].split() if "@" not in x)]
print(data)

[[1, 'name address'], [2, 'name2 address'], [3, 'name1 name2 name3 email2 email3 address1 address2 address3']]
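
The same idea can also be written without modifying data in place. This is only a minimal sketch, assuming that any whitespace-separated word containing "@" counts as an email address; the helper name strip_emails is hypothetical and not part of the answer above.

```python
def strip_emails(rows):
    """Return new [number, text] rows with every word containing "@" dropped."""
    return [
        [number, " ".join(word for word in text.split() if "@" not in word)]
        for number, text in rows
    ]


data = [
    [1, "name email@email.com address"],
    [2, "name2 email@@email2.com address"],
]
cleaned = strip_emails(data)
print(cleaned)  # [[1, 'name address'], [2, 'name2 address']]
```

Note that split() followed by " ".join() collapses runs of whitespace into single spaces, so this only preserves the original text exactly when the words were separated by single spaces to begin with.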

I think you should split on the '@' character and then iterate over the resulting list of strings in adjacent pairs: in the first element of each pair, use rfind to search backward from the end for a space character, and in the second element, search forward from the start for the first space. The substrings between those spaces and the '@' are the two halves of the email address, so remove them. If there may be more than one email address, do the same for all remaining elements (pairing the second and third elements, the third and fourth, and so on) to see whether there are other substrings to remove.
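
One possible reading of this split-and-rfind approach is sketched below. It is an interpretation rather than the answerer's own code: the function name remove_emails is hypothetical, and the final whitespace cleanup is an added assumption so that the gap left by a removed address does not become a double space.

```python
def remove_emails(text):
    """Remove every space-delimited chunk of text that contains "@"."""
    pieces = text.split("@")
    # Walk adjacent pairs: each "@" sits between pieces[i] and pieces[i + 1].
    for i in range(len(pieces) - 1):
        left, right = pieces[i], pieces[i + 1]
        # Drop the part of the address before "@": everything after the last space.
        cut = left.rfind(" ")
        pieces[i] = left[:cut + 1] if cut != -1 else ""
        # Drop the part of the address after "@": everything up to the first space.
        space = right.find(" ")
        pieces[i + 1] = right[space:] if space != -1 else ""
    # Assumption: collapse leftover whitespace into single spaces.
    return " ".join("".join(pieces).split())


data = [
    [1, "name email@email.com address"],
    [2, "name2 email@@email2.com address"],
]
cleaned = [[number, remove_emails(text)] for number, text in data]
print(cleaned)  # [[1, 'name address'], [2, 'name2 address']]
```

Because each trimmed piece is written back before the next pair is examined, a string with several addresses, or an address with more than one "@", is handled in the same pass.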
