
Extract a delimited list from a string inside brackets RegEx Python

I've seen a lot of similar regex questions, but none of them seems to handle my odd case correctly. I have a list of strings that looks like this:

['[Business Layer~Project Owning Org~Proj Owning Dept ID]', '[Business Layer~Project Owning Org~Proj Owning Org Name]', '[Business Layer~Project~Proj No]', '[Business Layer~Project~Proj Name]', "([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Business Layer~Project~Proj Name])), ' - ')", '[Project Assignment Fact~Task~Task No]', '[Project Assignment Fact~Task~Task Name]', "([Project Assignment Fact~Task~Task No]) || COALESCE((' - ' || ([Project Assignment Fact~Task~Task Name])), ' - ')", "([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Project Assignment Fact~Task~Task No])), ' - ') || COALESCE((' - ' || ([Project Assignment Fact~Task~Task Name])), ' - ')", '[Business Layer~Project Cost~Short Code Alias]', '[Business Layer~Expenditure Type~Expenditure Category Name]', '[Business Layer~Expenditure Type~Expenditure Type Parent Code]', '[Business Layer~Expenditure Type~Expend Type Desc]', '[Business Layer~Expenditure Owning Org~Exp Owning Org Name]', '[Business Layer~Transaction Source~Trans Source]', '[Business Layer~Employee~Employee Name]', '[Business Layer~Project Cost~Expend Comment]', '[Business Layer~Project Cost~PO No]', '[Business Layer~Project Cost~PV Invoice No]', '[Business Layer~Vendor~Vendor Name]', '[Business Layer~Scenario~Scenario Name]', '[Business Layer~ERS Employee~ERS Employee Name]', '[Business Layer~ERS Employee~ERS Employee Number]', '[Business Layer~Project Cost~Vehicle Tag No]', '[Business Layer~Project Cost~Vehicle Make]', '[Business Layer~Project Cost~Vehicle Model]', '[Business Layer~Project Cost~Vehicle Mileage]', '[Business Layer~Project Type~Proj Type Code]', '[Business Layer~GL Period~GL Period Start Date]', '[Business Layer~Project Cost~Burdened Cost Amt]']

As you can see, some of the strings are quite messy. For example:

([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Business Layer~Project~Proj Name])), ' - ')

I want to extract the contents of the square brackets as lists. For the messy string above, the ideal output would be a nested list like:

[['Business Layer', 'Project', 'Proj No'], ['Business Layer', 'Project', 'Proj Name']]

I have tried several different regex solutions from various similar questions, without success. Some unsuccessful examples:

# This one is close, but only accounts for 1 list
for i in test:
    result = re.findall(r"([^(~)]+)(?!.*\()+", i)
    print(result)


# Yields a blank list AND more importantly, some of these are longer than 3.
for i in test:
    result = re.findall(r"(\[.*?\]\~\[.*?\]\~\[.*?\])", i)
    print(result)


# This captures the beginning but not the end

^\[([^~]+)

# This essentially captures everything but what I want

[^~]+(?=\[.*?\]*$)

Please let me know what you think. I'm stumped by regex.

My 2 cents:

list(map(lambda y: [x.split('~') for x in re.findall(r'\[([^\].*\[]*)\]', y)], all_strings))

where all_strings is the list of strings from the question plus '["if ([Business Layer~Scenario~Scenario Name] = ''Budget 2013'' and [Business Layer~GL Period~GL Year Number] = 2013) Then ([Business Layer~GL Balances~Period Net DR Amt]-[Business Layer~GL Balances~Period Net CR Amt]) else (0)", ''if'']'.

Here is the result for each string in all_strings:

[Business Layer~Project Owning Org~Proj Owning Dept ID] --> [['Business Layer', 'Project Owning Org', 'Proj Owning Dept ID']]
[Business Layer~Project Owning Org~Proj Owning Org Name] --> [['Business Layer', 'Project Owning Org', 'Proj Owning Org Name']]
[Business Layer~Project~Proj No] --> [['Business Layer', 'Project', 'Proj No']]
[Business Layer~Project~Proj Name] --> [['Business Layer', 'Project', 'Proj Name']]
([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Business Layer~Project~Proj Name])), ' - ') --> [['Business Layer', 'Project', 'Proj No'], ['Business Layer', 'Project', 'Proj Name']]
[Project Assignment Fact~Task~Task No] --> [['Project Assignment Fact', 'Task', 'Task No']]
[Project Assignment Fact~Task~Task Name] --> [['Project Assignment Fact', 'Task', 'Task Name']]
([Project Assignment Fact~Task~Task No]) || COALESCE((' - ' || ([Project Assignment Fact~Task~Task Name])), ' - ') --> [['Project Assignment Fact', 'Task', 'Task No'], ['Project Assignment Fact', 'Task', 'Task Name']]
([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Project Assignment Fact~Task~Task No])), ' - ') || COALESCE((' - ' || ([Project Assignment Fact~Task~Task Name])), ' - ') --> [['Business Layer', 'Project', 'Proj No'], ['Project Assignment Fact', 'Task', 'Task No'], ['Project Assignment Fact', 'Task', 'Task Name']]
[Business Layer~Project Cost~Short Code Alias] --> [['Business Layer', 'Project Cost', 'Short Code Alias']]
[Business Layer~Expenditure Type~Expenditure Category Name] --> [['Business Layer', 'Expenditure Type', 'Expenditure Category Name']]
[Business Layer~Expenditure Type~Expenditure Type Parent Code] --> [['Business Layer', 'Expenditure Type', 'Expenditure Type Parent Code']]
[Business Layer~Expenditure Type~Expend Type Desc] --> [['Business Layer', 'Expenditure Type', 'Expend Type Desc']]
[Business Layer~Expenditure Owning Org~Exp Owning Org Name] --> [['Business Layer', 'Expenditure Owning Org', 'Exp Owning Org Name']]
[Business Layer~Transaction Source~Trans Source] --> [['Business Layer', 'Transaction Source', 'Trans Source']]
[Business Layer~Employee~Employee Name] --> [['Business Layer', 'Employee', 'Employee Name']]
[Business Layer~Project Cost~Expend Comment] --> [['Business Layer', 'Project Cost', 'Expend Comment']]
[Business Layer~Project Cost~PO No] --> [['Business Layer', 'Project Cost', 'PO No']]
[Business Layer~Project Cost~PV Invoice No] --> [['Business Layer', 'Project Cost', 'PV Invoice No']]
[Business Layer~Vendor~Vendor Name] --> [['Business Layer', 'Vendor', 'Vendor Name']]
[Business Layer~Scenario~Scenario Name] --> [['Business Layer', 'Scenario', 'Scenario Name']]
[Business Layer~ERS Employee~ERS Employee Name] --> [['Business Layer', 'ERS Employee', 'ERS Employee Name']]
[Business Layer~ERS Employee~ERS Employee Number] --> [['Business Layer', 'ERS Employee', 'ERS Employee Number']]
[Business Layer~Project Cost~Vehicle Tag No] --> [['Business Layer', 'Project Cost', 'Vehicle Tag No']]
[Business Layer~Project Cost~Vehicle Make] --> [['Business Layer', 'Project Cost', 'Vehicle Make']]
[Business Layer~Project Cost~Vehicle Model] --> [['Business Layer', 'Project Cost', 'Vehicle Model']]
[Business Layer~Project Cost~Vehicle Mileage] --> [['Business Layer', 'Project Cost', 'Vehicle Mileage']]
[Business Layer~Project Type~Proj Type Code] --> [['Business Layer', 'Project Type', 'Proj Type Code']]
[Business Layer~GL Period~GL Period Start Date] --> [['Business Layer', 'GL Period', 'GL Period Start Date']]
[Business Layer~Project Cost~Burdened Cost Amt] --> [['Business Layer', 'Project Cost', 'Burdened Cost Amt']]
["if ([Business Layer~Scenario~Scenario Name] = Budget 2013 and [Business Layer~GL Period~GL Year Number] = 2013) Then ([Business Layer~GL Balances~Period Net DR Amt]-[Business Layer~GL Balances~Period Net CR Amt]) else (0)", if] --> [['Business Layer', 'Scenario', 'Scenario Name'], ['Business Layer', 'GL Period', 'GL Year Number'], ['Business Layer', 'GL Balances', 'Period Net DR Amt'], ['Business Layer', 'GL Balances', 'Period Net CR Amt']]
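For readability, the one-liner above can be unrolled into a small function. This is only a sketch: the name extract_fields and the simplified pattern (anything between a [ and the next ] that contains no other bracket) are my own choices rather than part of the original one-liner, and it assumes the field names themselves never contain [ or ].

```python
import re

# Matches "[...]" where the inside contains no other square bracket.
# For nested input such as '["if ([A~B~C] ...)", if]' only the innermost
# bracketed chunks are captured.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def extract_fields(s):
    """Return one list of '~'-separated parts per bracketed chunk in s."""
    return [chunk.split("~") for chunk in BRACKETED.findall(s)]


messy = ("([Business Layer~Project~Proj No]) || COALESCE((' - ' || "
         "([Business Layer~Project~Proj Name])), ' - ')")
print(extract_fields(messy))
# [['Business Layer', 'Project', 'Proj No'], ['Business Layer', 'Project', 'Proj Name']]
```

Because the split happens on '~' only, bracketed chunks with more or fewer than three parts come back at whatever length they actually have.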

I would try something different: just search for the letters a-z and A-Z plus whitespace.

If string_list is the list from your question:

import re

for s in string_list:
    print(re.findall(r"[A-Z][\sa-zA-Z]*", s))

Prints:

['Business Layer', 'Project Owning Org', 'Proj Owning Dept ID']
['Business Layer', 'Project Owning Org', 'Proj Owning Org Name']
['Business Layer', 'Project', 'Proj No']
['Business Layer', 'Project', 'Proj Name']
['Business Layer', 'Project', 'Proj No', 'COALESCE', 'Business Layer', 'Project', 'Proj Name']
['Project Assignment Fact', 'Task', 'Task No']
['Project Assignment Fact', 'Task', 'Task Name']
['Project Assignment Fact', 'Task', 'Task No', 'COALESCE', 'Project Assignment Fact', 'Task', 'Task Name']
['Business Layer', 'Project', 'Proj No', 'COALESCE', 'Project Assignment Fact', 'Task', 'Task No', 'COALESCE', 'Project Assignment Fact', 'Task', 'Task Name']
['Business Layer', 'Project Cost', 'Short Code Alias']
['Business Layer', 'Expenditure Type', 'Expenditure Category Name']
['Business Layer', 'Expenditure Type', 'Expenditure Type Parent Code']
['Business Layer', 'Expenditure Type', 'Expend Type Desc']
['Business Layer', 'Expenditure Owning Org', 'Exp Owning Org Name']
['Business Layer', 'Transaction Source', 'Trans Source']
['Business Layer', 'Employee', 'Employee Name']
['Business Layer', 'Project Cost', 'Expend Comment']
['Business Layer', 'Project Cost', 'PO No']
['Business Layer', 'Project Cost', 'PV Invoice No']
['Business Layer', 'Vendor', 'Vendor Name']
['Business Layer', 'Scenario', 'Scenario Name']
['Business Layer', 'ERS Employee', 'ERS Employee Name']
['Business Layer', 'ERS Employee', 'ERS Employee Number']
['Business Layer', 'Project Cost', 'Vehicle Tag No']
['Business Layer', 'Project Cost', 'Vehicle Make']
['Business Layer', 'Project Cost', 'Vehicle Model']
['Business Layer', 'Project Cost', 'Vehicle Mileage']
['Business Layer', 'Project Type', 'Proj Type Code']
['Business Layer', 'GL Period', 'GL Period Start Date']
['Business Layer', 'Project Cost', 'Burdened Cost Amt']
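Note that in the output above the messy strings come back as one flat list, with COALESCE mixed in among the field names. A possible variation, sketched under the assumption that only text inside square brackets is wanted, is to apply the same letters-and-whitespace pattern to each bracketed chunk separately. The helper name names_per_bracket and the bracket pattern are hypothetical additions of mine, and like the pattern above it will not match names that contain digits or punctuation.

```python
import re

# Innermost "[...]" chunks (no nested square brackets inside).
BRACKETED = re.compile(r"\[([^\[\]]*)\]")
# Same idea as above: an uppercase letter followed by letters/whitespace.
NAME = re.compile(r"[A-Z][\sa-zA-Z]*")


def names_per_bracket(s):
    """Return one list of names for each bracketed chunk found in s."""
    return [NAME.findall(chunk) for chunk in BRACKETED.findall(s)]


s = ("([Business Layer~Project~Proj No]) || COALESCE((' - ' || "
     "([Business Layer~Project~Proj Name])), ' - ')")
print(names_per_bracket(s))
# [['Business Layer', 'Project', 'Proj No'], ['Business Layer', 'Project', 'Proj Name']]
```

Text outside the brackets, such as COALESCE, is never examined, so it cannot leak into the result.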
