Regex split the string at \n but skip the first one if it is \n\n
How to split a nested list by \n but skip the first one? Python
my_list = [
'Rob Kardashian\n 00052369 1987-03-17 Reality Star\nBrooke Barry 00213658 2001-03-30 TikTok Star',
'John Lennon\n 02578913 1940-10-09 Singer',
'Bae De Leon\n 00896351 1997-08-02 Volleyball Player\nJonas Blue 02369785 1990-08-02 Music Producer\nAlbert Einstein 65231478 1879-03-14',
'Robert Downey\n Jr 23897410 1965-04-04 Actor'
]
I have the list above, and I split it at the digits with the following:
import re
my_list_new = [re.split(r'\s(?=\d)|(?<=\d)\s', i) for i in my_list]
Output:
[
[ 'Rob Kardashian\n', '00052369','1987-03-17', 'Reality Star\nBrooke Barry', '00213658', '2001-03-30', 'TikTok Star'],
['John Lennon\n', '02578913', '1940-10-09', 'Singer'],
['Bae De Leon\n', '00896351', '1997-08-02', 'Volleyball Player\nJonas Blue', '02369785', '1990-08-02', 'Music Producer\nAlbert Einstein', '65231478', '1879-03-14'],
['Robert Downey\n Jr', '23897410', '1965-04-04', 'Actor']
]
Next step: I want to split my_list_new by '\n' but skip the first one.
How can I do this with a list comprehension?
Expected output:
[
['Rob Kardashian\n', '00052369', '1987-03-17', 'Reality Star', 'Brooke Barry', '00213658', '2001-03-30','TikTok Star'],
['John Lennon\n', '02578913', '1940-10-09', 'Singer'],
['Bae De Leon\n', '00896351', '1997-08-02', 'Volleyball Player', 'Jonas Blue','02369785', '1990-08-02', 'Music Producer', 'Albert Einstein', '65231478', '1879-03-14'],
['Robert Downey\n Jr', '23897410', '1965-04-04', 'Actor']
]
Thanks for your help!
Without itertools:
lst = [['Rob Kardashian\n','00052369','1987-03-17','Reality Star\nBrooke Barry','00213658','2001-03-30','TikTok Star'],
['John Lennon\n', '02578913', '1940-10-09', 'Singer'],
['Bae De Leon\n','00896351','1997-08-02','Volleyball Player\nJonas Blue','02369785','1990-08-02','Music Producer\nAlbert Einstein','65231478','1879-03-14'],['Robert Downey\n Jr', '23897410', '1965-04-04', 'Actor']]
lst = [sum(row, []) for row in [[l[:1], *[i.split('\n') for i in l[1:]]] for l in lst]]
from pprint import pprint
pprint(lst, width=250)
Prints:
[['Rob Kardashian\n', '00052369', '1987-03-17', 'Reality Star', 'Brooke Barry', '00213658', '2001-03-30', 'TikTok Star'],
['John Lennon\n', '02578913', '1940-10-09', 'Singer'],
['Bae De Leon\n', '00896351', '1997-08-02', 'Volleyball Player', 'Jonas Blue', '02369785', '1990-08-02', 'Music Producer', 'Albert Einstein', '65231478', '1879-03-14'],
['Robert Downey\n Jr', '23897410', '1965-04-04', 'Actor']]
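You can iterate over the list elements, split each one on \n, flatten the nested lists with itertools.chain, and concatenate the result with the first element (this needs import itertools):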
[l[:1] + list(itertools.chain(*[i.split('\n') for i in l[1:]])) for l in lst]
Example:
In [295]: lst = [['Rob Kardashian\n','00052369','1987-03-17','Reality Star\nBrooke Barry','00213658','2001-03-30','TikTok Star'],
...: ['John Lennon\n', '02578913', '1940-10-09', 'Singer'],
...: ['Bae De Leon\n','00896351','1997-08-02','Volleyball Player\nJonas Blue','02369785','1990-08-02','Music Producer\nAlbert Einstein','65231478','1879-03-14'],
...: ['Robert Downey\n Jr', '23897410', '1965-04-04', 'Actor']]
In [296]: [l[:1] + list(itertools.chain(*[i.split('\n') for i in l[1:]])) for l in lst]
Out[296]:
[['Rob Kardashian\n',
'00052369',
'1987-03-17',
'Reality Star',
'Brooke Barry',
'00213658',
'2001-03-30',
'TikTok Star'],
['John Lennon\n', '02578913', '1940-10-09', 'Singer'],
['Bae De Leon\n',
'00896351',
'1997-08-02',
'Volleyball Player',
'Jonas Blue',
'02369785',
'1990-08-02',
'Music Producer',
'Albert Einstein',
'65231478',
'1879-03-14'],
['Robert Downey\n Jr', '23897410', '1965-04-04', 'Actor']]
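The same idea can also be written with itertools.chain.from_iterable, which takes the split results directly instead of unpacking them with *. The sketch below is only an illustration: the helper name split_after_first and the shortened sample rows are made up for this example.

```python
import itertools

def split_after_first(row):
    """Keep row[0] as is; split every later element on newlines and flatten."""
    pieces = (item.split('\n') for item in row[1:])
    return row[:1] + list(itertools.chain.from_iterable(pieces))

# Shortened, hypothetical sample rows in the same shape as the question's data.
rows = [
    ['Rob Kardashian\n', '00052369', '1987-03-17', 'Reality Star\nBrooke Barry', '00213658'],
    ['Robert Downey\n Jr', '23897410', '1965-04-04', 'Actor'],
]
result = [split_after_first(row) for row in rows]
print(result)
```

Wrapping the logic in a small function keeps the outer list comprehension short and makes the special treatment of the first element easy to see.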
Edit: here it is as a single list comprehension, with no imports needed:
[[inner_list[0]] + [split for item in inner_list[1:] for split in item.split("\n")] for inner_list in my_list]
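This uses the idea from @heemayl that we can special-case the first element and then split every other element, whether or not it contains "\n". That turns each of those elements into a list, so we flatten them with another nested for loop inside the list comprehension. That is more comprehension than you would probably want in a single expression, though...
Original:
If you are willing to use a for loop, you can do it like this: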
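Splitting every element after the first is safe because str.split returns a one-element list when the separator is absent, so items without "\n" pass through unchanged. A small check, using sample values picked for illustration:

```python
# str.split always returns a list, even when there is nothing to split on.
no_newline = '00052369'.split('\n')
with_newline = 'Reality Star\nBrooke Barry'.split('\n')

# Flattening both results with a nested comprehension yields the plain items.
flattened = [piece
             for item in ['00052369', 'Reality Star\nBrooke Barry']
             for piece in item.split('\n')]
print(no_newline, with_newline, flattened)
```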
my_list = [
[ 'Rob Kardashian\n', '00052369','1987-03-17', 'Reality Star\nBrooke Barry', '00213658', '2001-03-30', 'TikTok Star'],
['John Lennon\n', '02578913', '1940-10-09', 'Singer'],
['Bae De Leon\n', '00896351', '1997-08-02', 'Volleyball Player\nJonas Blue', '02369785', '1990-08-02', 'Music Producer\nAlbert Einstein', '65231478', '1879-03-14'],
['Robert Downey\n Jr', '23897410', '1965-04-04', 'Actor']
]
for i, inner_list in enumerate(my_list):
    new_inner_list = []
    for j, item in enumerate(inner_list):
        if j > 0 and "\n" in item:
            new_inner_list.extend(item.split("\n"))
        else:
            new_inner_list.append(item)
    my_list[i] = new_inner_list
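I don't know whether this can be done with a list comprehension, though; the problem is that you need to unpack the list you get back from the split. Even if you can fit it into a comprehension, I wouldn't recommend using one once the logic gets this complex.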
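Until a more general approach is found, you can use the following. Note that the lookbehind hard-codes the first name, so it only protects the "\n" after Rob Kardashian:
[re.split(r'\s(?=\d)|(?<=\d)\s|(?<!Rob Kardashian)\n', i) for i in my_list]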
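One possible way to avoid hard-coding a name, offered as a sketch rather than a definitive answer: cut each string at its first "\n" with str.partition, run the regex (extended with a "\n" alternative) only on the remainder, then glue the first piece back onto the head. The function name split_record is hypothetical, and the sketch assumes every string has its first "\n" right after the leading name; a string with no newline at all is returned unsplit.

```python
import re

# Same digit-boundary pattern as the question, plus a plain "\n" alternative.
PATTERN = re.compile(r'\s(?=\d)|(?<=\d)\s|\n')

def split_record(text):
    """Split on digit boundaries and newlines, but keep the first newline."""
    # head is everything before the first "\n"; sep is that "\n" ('' if absent).
    head, sep, tail = text.partition('\n')
    parts = PATTERN.split(tail)
    # parts[0] is whatever followed the first newline before the first split
    # point ('' or, for example, ' Jr'), so it is glued back onto the head.
    return [head + sep + parts[0]] + parts[1:]

my_list = [
    'Rob Kardashian\n 00052369 1987-03-17 Reality Star\nBrooke Barry 00213658 2001-03-30 TikTok Star',
    'John Lennon\n 02578913 1940-10-09 Singer',
    'Bae De Leon\n 00896351 1997-08-02 Volleyball Player\nJonas Blue 02369785 1990-08-02 Music Producer\nAlbert Einstein 65231478 1879-03-14',
    'Robert Downey\n Jr 23897410 1965-04-04 Actor',
]
my_list_new = [split_record(s) for s in my_list]
print(my_list_new)
```

This works on the original strings in one pass, so the intermediate list from the first re.split is not needed.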