
Python: split a list of strings into sublists using text in the list

I have a list named 'exemptions' that contains several fields (string values).

exemptions = ['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534', 'S-1/A', '20021114', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806', '\t\t\t\tCONSTAR FOREIGN HOLDINGS INC', '\t\t0001178543', '\t\t\t\tCONSTAR PLASTICS LLC', '\t\t0001178541', '\t\t\t\tDT INC', '\t\t0001178539', '\t\t\t\tBFF INC', '\t\t0001178538', '\t\t\t\tCONSTAR INC', '\t\t0001178537', 'S-1', '20020523', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806', 'S-1', '20051123', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300', 'S-1', '20061221', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300', 'S-1/A', '20140327', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729', 'S-1', '20110331', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729', 'S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388', 'S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761', 'S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761', 'S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015', 'S-1', '20140512', '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI', '\t\t0000759944']

I would like to start a new sublist at every 'S-1' or 'S-1/A'. The desired output would be:

exemptions = [['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534'], ['S-1/A', '20021114', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806', '\t\t\t\tCONSTAR FOREIGN HOLDINGS INC', '\t\t0001178543', '\t\t\t\tCONSTAR PLASTICS LLC', '\t\t0001178541', '\t\t\t\tDT INC', '\t\t0001178539', '\t\t\t\tBFF INC', '\t\t0001178538', '\t\t\t\tCONSTAR INC', '\t\t0001178537'], ['S-1', '20020523', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806'], ['S-1', '20051123', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'], ['S-1', '20061221', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'], ['S-1/A', '20140327', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'], ['S-1', '20110331', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'], ['S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388'], ['S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'], ['S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'], ['S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015'], ['S-1', '20140512', '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI', '\t\t0000759944']]

I tried _list = [i.split('S-1') for i in exemptions], but it does not give me what I need...

Any suggestions? Thank you so much.

Does this work?

exemptions = ['S-1', '20090820', .... , '\t\t0000759944']
result = []
for e in exemptions:
    if e in ("S-1", "S-1/A"):
        result.append([])
    result[-1].append(e)

Note that this relies on your input list starting with an 'S-1' or 'S-1/A' value. Each time the loop encounters one of those values, it adds a new sublist to the end of result. After that, all you need to do is keep appending values to the last sublist. If the list started with anything else, result[-1] would raise an IndexError, because result would still be empty.
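
If the input might not start with a marker, the same loop can be wrapped in a function that opens a first group on demand. This is only a minimal sketch: the name split_at_markers and the markers parameter are made up for illustration, and the sample list is a shortened version of the data in the question.

```python
def split_at_markers(items, markers=("S-1", "S-1/A")):
    """Group items into sublists, starting a new sublist at each marker."""
    result = []
    for item in items:
        # Open a new group at a marker, or when no group exists yet
        # (that is, when the input did not start with a marker).
        if item in markers or not result:
            result.append([])
        result[-1].append(item)
    return result


sample = ["S-1", "20090820", "\t\t0000029534", "S-1/A", "20021114", "\t\t0000029806"]
print(split_at_markers(sample))
# [['S-1', '20090820', '\t\t0000029534'], ['S-1/A', '20021114', '\t\t0000029806']]
```

With this variant, any items that come before the first marker end up in a group of their own instead of raising an error.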

Join the list into a single string with a custom delimiter, say |. Then use re.split to split before every occurrence of S-1, and split each piece back into a list on the | delimiter. This assumes that | never appears in the data and that S-1 never occurs inside another field. Splitting on a zero-width lookahead like this requires Python 3.7 or later.

>>> import re
>>> from pprint import pprint
>>> res = [s.strip('|').split('|') for s in re.split(r'(?=S-1)', '|'.join(exemptions)) if s]
>>> pprint(res)
[['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534'],
 ['S-1/A',
  '20021114',
  '\t\t\t\tCONSTAR INTERNATIONAL INC',
  '\t\t0000029806',
  '\t\t\t\tCONSTAR FOREIGN HOLDINGS INC',
  '\t\t0001178543',
  '\t\t\t\tCONSTAR PLASTICS LLC',
  '\t\t0001178541',
  '\t\t\t\tDT INC',
  '\t\t0001178539',
  '\t\t\t\tBFF INC',
  '\t\t0001178538',
  '\t\t\t\tCONSTAR INC',
  '\t\t0001178537'],
 ['S-1', '20020523', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806'],
 ['S-1', '20051123', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'],
 ['S-1', '20061221', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'],
 ['S-1/A', '20140327', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'],
 ['S-1', '20110331', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'],
 ['S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388'],
 ['S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'],
 ['S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'],
 ['S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015'],
 ['S-1',
  '20140512',
  '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI',
  '\t\t0000759944']]
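
As an alternative sketch (not part of the answer above), the same grouping can be done by comparing whole list elements and slicing between marker positions, so the result does not depend on the choice of delimiter or on where the text 'S-1' happens to appear. The function name split_by_index and the sample data are hypothetical.

```python
def split_by_index(items, markers=("S-1", "S-1/A")):
    """Slice items into groups that each begin at a marker element."""
    # Positions where a new group begins.
    starts = [i for i, item in enumerate(items) if item in markers]
    # Each group ends where the next one starts; the last runs to the end.
    ends = starts[1:] + [len(items)]
    return [items[s:e] for s, e in zip(starts, ends)]


# A field containing both '|' and 'S-1' stays in one piece.
sample = ["S-1", "20090820", "ACME S-1 | CORP", "S-1/A", "20021114"]
print(split_by_index(sample))
# [['S-1', '20090820', 'ACME S-1 | CORP'], ['S-1/A', '20021114']]
```

Note that this version silently drops any items that come before the first marker.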
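
# exemptions is the input list
finalList = []
temporaryList = []
for eachItem in exemptions:
    # Substring test: matches both 'S-1' and 'S-1/A'
    if 'S-1' in eachItem:
        # A marker starts a new group, so store the previous group first
        if temporaryList:
            finalList.append(temporaryList)
        temporaryList = [eachItem]
    else:
        temporaryList.append(eachItem)
# Store the last group, which has no marker after it
if temporaryList:
    finalList.append(temporaryList)

print(finalList)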
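
Whichever approach is used, a quick sanity check can catch lost or misplaced items, for example a result that keeps only the last group. The helper below is a hypothetical sketch, not taken from any of the answers: it checks that flattening the groups gives back the original list and that every group starts with a marker.

```python
from itertools import chain


def is_valid_split(original, groups, markers=("S-1", "S-1/A")):
    """Return True if groups is a lossless split of original at the markers."""
    # No element lost, added or reordered.
    same_items = list(chain.from_iterable(groups)) == list(original)
    # Each group begins with a marker and has no other marker inside.
    well_formed = all(
        g and g[0] in markers and not any(x in markers for x in g[1:])
        for g in groups
    )
    return same_items and well_formed


original = ["S-1", "20090820", "S-1/A", "20021114"]
print(is_valid_split(original, [["S-1", "20090820"], ["S-1/A", "20021114"]]))  # True
print(is_valid_split(original, [["S-1/A", "20021114"]]))  # False
```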
