
Python - Create list with multiple sub-lists

What I need to do is quite simple, but I can't figure out how to do it.

I have a lot of strings organized in a list:

# note: naming the variable list shadows the built-in list type
list = ["my name is Marco and i'm 24 years old", "my name is Jhon and i'm 30 years old"]

I use a regex to extract information from each element of the list:

import re
for element in list:
  # findall() returns every match; [0] keeps the first (and only) one
  name = re.findall('my name is (.*?) and i\'m', element, re.DOTALL)[0]
  age = re.findall('and i\'m (.*?) years old', element, re.DOTALL)[0]

Now I want to build a new list whose elements are sub-lists, each made up of a name and an age.

Example:

for element in newlist:
  name = element[0]
  age = element[1]

Is it possible to do something like this?

Here is a solution that does exactly what you want. It creates a new list made of sub-lists, each holding a name and an age:

new_list = []
for element in list:
   name = re.findall('my name is (.*?) and i\'m', element, re.DOTALL)[0]
   age = re.findall('and i\'m (.*?) years old', element, re.DOTALL)[0]
   new_list.append([name, age])
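A self-contained sketch of this loop, for trying it out directly. It assumes the sample strings from the question, stored under the name data (a name chosen here so the built-in list type is not shadowed), keeps findall(...)[0] as in the answer, and then iterates the result the way the question asks.

```python
import re

# Sample strings from the question; "data" is an assumed name that avoids
# shadowing the built-in list type.
data = ["my name is Marco and i'm 24 years old",
        "my name is Jhon and i'm 30 years old"]

new_list = []
for element in data:
    # findall() returns a list of matches; [0] takes the first one.
    name = re.findall(r"my name is (.*?) and i'm", element, re.DOTALL)[0]
    age = re.findall(r"and i'm (.*?) years old", element, re.DOTALL)[0]
    new_list.append([name, age])

# Consume the result as in the question: index 0 is the name, index 1 the age.
for element in new_list:
    name = element[0]
    age = element[1]
    print(name, age)
```

This prints "Marco 24" and then "Jhon 30". Note that the ages are captured as strings.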

You can do what you want with a simple list comprehension:

name_pat = re.compile('my name is (.*?) and i\'m', re.DOTALL)
age_pat = re.compile('and i\'m (.*?) years old', re.DOTALL)

new_list = [[name_pat.findall(elem)[0], age_pat.findall(elem)[0]] for elem in your_list]
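A runnable sketch of the comprehension, assuming your_list holds the sample strings from the question. The two compiled pattern objects are reused for every element. Keep in mind that if a string does not match, findall() returns an empty list and [0] raises an IndexError.

```python
import re

# Assumed input: the sample strings from the question.
your_list = ["my name is Marco and i'm 24 years old",
             "my name is Jhon and i'm 30 years old"]

# Compile each pattern once; the pattern objects are reused below.
name_pat = re.compile(r"my name is (.*?) and i'm", re.DOTALL)
age_pat = re.compile(r"and i'm (.*?) years old", re.DOTALL)

# One [name, age] sub-list per input string.
new_list = [[name_pat.findall(elem)[0], age_pat.findall(elem)[0]] for elem in your_list]
print(new_list)  # [['Marco', '24'], ['Jhon', '30']]
```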

First of all, you don't need two regular expressions to extract the name and the age; a single pattern with two capture groups gets both at once:

>>> import re
>>> s = "my name is Marco and i'm 24 years old"
>>> pattern = r"my name is\s+(.+)\s+and i'm\s+(\d+)\s+years old"
>>> m = re.match(pattern, s)
>>> print(m.groups())
('Marco', '24')

You can then use a list comprehension to build the new list:

>>> data = ["my name is Marco and i'm 24 years old", "my name is Jhon and i'm 30 years old"]
>>> new_list = [re.match(pattern, s).groups() for s in data]
>>> print(new_list)
[('Marco', '24'), ('Jhon', '30')]

The result is a list of tuples. If you really need a list of lists, you can do this:

new_list = [list(re.match(pattern, s).groups()) for s in data]
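A self-contained sketch of the list-of-lists version. One practical reason to want lists rather than tuples is that lists are mutable: as a hypothetical follow-up step (not part of the answer), each age string can be replaced by an int in place, which would raise a TypeError on a tuple.

```python
import re

pattern = r"my name is\s+(.+)\s+and i'm\s+(\d+)\s+years old"
data = ["my name is Marco and i'm 24 years old",
        "my name is Jhon and i'm 30 years old"]

# list() turns each groups() tuple into a mutable sub-list.
new_list = [list(re.match(pattern, s).groups()) for s in data]
print(new_list)  # [['Marco', '24'], ['Jhon', '30']]

# Hypothetical follow-up: convert each age to an int in place.
for row in new_list:
    row[1] = int(row[1])
print(new_list)  # [['Marco', 24], ['Jhon', 30]]
```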

The list comprehension is roughly shorthand for this loop:

new_list = []
for s in data:
    m = re.match(pattern, s)
    if m:
        new_list.append(m.groups())

The main difference between this loop and the list comprehension is that the loop can handle strings that do not match the pattern, whereas the list comprehension assumes the pattern always matches: if it doesn't, re.match() returns None, and calling .groups() on None raises an AttributeError. You can handle this inside the list comprehension, but it starts to get ugly, because you need to perform the regex match twice: once to check whether the pattern matched, and again to extract the actual values. In this case I think the explicit for loop is cleaner.
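A small sketch of the difference described above, assuming a hypothetical third string, "hello world", that does not match the pattern. It shows the plain comprehension failing, the explicit loop skipping the bad string, and the filtered comprehension that works but calls re.match() twice for every string that matches.

```python
import re

pattern = r"my name is\s+(.+)\s+and i'm\s+(\d+)\s+years old"
# "hello world" is a hypothetical entry that does not match the pattern.
data = ["my name is Marco and i'm 24 years old",
        "my name is Jhon and i'm 30 years old",
        "hello world"]

# Plain comprehension: re.match() returns None for "hello world",
# and None has no .groups() method, so this raises AttributeError.
try:
    [re.match(pattern, s).groups() for s in data]
except AttributeError as exc:
    print("comprehension failed:", exc)

# Explicit loop: non-matching strings are simply skipped.
loop_result = []
for s in data:
    m = re.match(pattern, s)
    if m:
        loop_result.append(m.groups())

# Filtered comprehension: also skips them, but matches each string twice.
comp_result = [re.match(pattern, s).groups() for s in data if re.match(pattern, s)]

print(loop_result)
print(comp_result)
```

Both loop_result and comp_result end up as [('Marco', '24'), ('Jhon', '30')].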

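Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.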

 
© 2020-2024 STACKOOM.COM