How to make a Python regex match the first occurrence of the expression - it extends to the second now
I have the following text:
Signatures 35 2 Table of Contents Part I. Financial Information Item 1. Financial
Statements Noble Midstream Partners LP Consolidated Statements of Operations and Comprehensive Income (in thousands except per unit amounts unaudited) Three Months Ended March 31 2018 2017 Revenues Midstream Services - Affiliate 64263
50314 Midstream Services - Net Income Attributable
to Limited Partners Per Limited Partner Unit - Basic and Diluted Common Units
0.97 0.77 Subordinated Units 0.97 0.77 Weighted Average Limited Partner Units
Outstanding - Basic Common Units 23683 15903 Subordinated Units 15903 15903
Weighted Average Limited Partner Units Outstanding - Diluted Common Units 23698
15909 Subordinated Units 15903 15903 The accompanying notes are an integral part
of these financial statements. 3 Table of Contents Noble Midstream Partners LP
758 The accompanying notes are an integral part of these financial statements.
4 Table of Contents Noble Midstream Partners LP Consolidated Statements of Cash
Flows (in thousands unaudited) Three Months Ended March 31 2018 2017 Cash Flows
From Operating Activities Net Income 39136 34520 Adjustments to Reconcile Net
Income to Net Cash Provided by Operating Activities Depreciation and
Amortization 11329 2449 Dividends from Equity Method Investee Net of Income 393 0
Unit-Based Compensation 321 127 Other Adjustments for Noncash Items Included in
Income 167 95 Changes in Operating Assets and Liabilities Net of Assets Acquired
and Liabilities Assumed Increase in Accounts Receivable (2520) (3322) Decrease
in Accounts Payable (836) (2518) Other Operating Assets and Liabilities Net
(2387) 874 Net Cash Provided by Operating Activities 45603 32225 Cash Flows
From Investing Activities Additions to Property Plant and Equipment (161509)
(32298) Black Diamond Acquisition Net of Cash Acquired (650131) 0 Additions to
Investments 0 (414) Distributions from Cost Method Investee 419 123 Net Cash
Used in Investing Activities (811221) (32589) Cash Flows From Financing
Activities Distributions to Noncontrolling Interests (3007) (11267) Contributions
from Noncontrolling Interests 409865 7087 Borrowings Under Revolving Credit
Facility 405000 0 Repayment of Revolving Credit Facility (55000) 0 Distributions
to Unitholders (19860) (13782) Revolving Credit Facility Amendment Fees and
Other (1987) (236) Net Cash Provided by (Used in) Financing Activities 735011
(18198) Decrease in Cash Cash Equivalents and Restricted Cash (30607) (18562)
Cash Cash Equivalents and Restricted Cash at Beginning of Period 55531 57421
Cash Cash Equivalents and Restricted Cash at End of Period 24924 38859 The
accompanying notes are an integral part of these financial statements. 5 Table
of Contents Noble Midstream Partners LP Consolidated Statement of Changes in
Equity (in thousands unaudited) Partnership Common Units Subordinated Units
General Partner Noncontrolling Interests
I need to extract the text after the words Subordinated units, including the four numbers that follow them, up to the first Cash Flow. I have constructed the following regex:
CONSOLIDATED STATEMENTS? OF OPERATIONS?.+?\sSubordinated units.+?\s(\(?\d*[.]?(\d+)?\)?\s\(?\d*[.]?(\d+)?\)?\s\(?\d*[.]?(\d+)?\)?\s\(?\d*[.]?(\d+)?\)?)
This regex should not find any match, because only two numbers follow the expression Subordinated units. However, it matches all the way to the end of "Noble Midstream Partners LP Consolidated Statements of Cash Flows (in thousands unaudited) Three Months Ended March 31 2018 2017", which has three numbers and is the second occurrence of Cash Flow. How do I make sure that it captures exactly four numbers and does not extend to the second Cash Flow?
I think this regex might solve your problem. It searches until the first Cash Flows.
It uses the (?s) modifier to let the dot (.) match newlines. Think of the s in this case as meaning that the whole text is treated as a single string rather than matched line by line.
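As a quick illustration of that flag, here is a minimal sketch on a made-up two-line string (the sample text and the variable names are mine, not from the filing). It compares the same lazy pattern with and without (?s), and shows that passing re.DOTALL as an argument is equivalent to the inline form.

```python
import re

# Hypothetical two-line sample; the real filing is much longer.
sample = "Statements of Operations\nNet Income 39136"

# Without the flag, "." refuses to match "\n", so the pattern cannot span lines.
without_flag = re.search(r"Operations.+?Income", sample)
# With the inline (?s) flag, "." also matches "\n".
with_flag = re.search(r"(?s)Operations.+?Income", sample)
# re.DOTALL is the same flag passed as an argument instead of inline.
with_const = re.search(r"Operations.+?Income", sample, re.DOTALL)

print(without_flag)
print(repr(with_flag.group()))
print(repr(with_const.group()))
```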
At first, I was capturing up to the second Cash Flows, but then I noticed that the first occurrence has a newline between Cash and Flows. To correct for this, I wrote Cash\s+Flows, so that the two words can be separated by any whitespace (a regular space or a newline, which is also a whitespace character).
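A small check of that point, using a shortened stand-in for the wrapped heading (the string and variable names here are only for illustration): a literal space in the pattern misses the heading when it is split across two lines, while \s+ finds it.

```python
import re

# Shortened stand-in for the heading that wraps across two lines in the text.
wrapped = "Consolidated Statements of Cash\nFlows (in thousands unaudited)"

# A literal space only matches a space, so the wrapped heading is missed.
literal_space = re.search(r"Cash Flows", wrapped)
# \s+ matches any run of whitespace, including the newline.
any_whitespace = re.search(r"Cash\s+Flows", wrapped)

print(literal_space)
print(repr(any_whitespace.group()))
```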
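import re

# Read the whole filing into one string so the pattern can span lines.
with open('cash_flow.txt', 'r') as fin:
    text = fin.read()

# (?s) lets "." match newlines; the lazy .+? stops at the first "Cash Flows".
p = re.compile(r'(?s)(Consolidated Statements of Operations.+?Cash\s+Flows)')
m = p.search(text)
if m:
    print(m.group(1))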
The printout I got was:
Consolidated Statements of Operations and Comprehensive Income (in thousands except per unit amounts unaudited) Three Months Ended March 31 2018 2017 Revenues Midstream Services - Affiliate 64263
50314 Midstream Services - Third Party 11360 0 Crude Oil Sales - Third Party
22110 0 Total Revenues 97733 50314 Costs and Expenses Cost of Crude Oil Sales
21439 0 Direct Operating 17148 11401 Depreciation and Amortization 11329 2449
General and Administrative 10442 2742 Total Operating Expenses 60358 16592
Operating Income 37375 33722 Other (Income) Expense Interest Expense Net of
Amount Capitalized 1033 267 Investment Income (2868) (1065) Total Other Income
(1835) (798) Income Before Income Taxes 39210 34520 Income Tax Provision 74 0
Net Income 39136 34520 Less: Net (Loss) Income Attributable to Noncontrolling
Interests (225) 10178 Net Income Attributable to Noble Midstream Partners LP
39361 24342 Less: Net Income Attributable to Incentive Distribution Rights 819 0
Net Income Attributable to Limited Partners 38542 24342 Net Income Attributable
to Limited Partners Per Limited Partner Unit - Basic and Diluted Common Units
0.97 0.77 Subordinated Units 0.97 0.77 Weighted Average Limited Partner Units
Outstanding - Basic Common Units 23683 15903 Subordinated Units 15903 15903
Weighted Average Limited Partner Units Outstanding - Diluted Common Units 23698
15909 Subordinated Units 15903 15903 The accompanying notes are an integral part
of these financial statements. 3 Table of Contents Noble Midstream Partners LP
758 The accompanying notes are an integral part of these financial statements.
4 Table of Contents Noble Midstream Partners LP Consolidated Statements of Cash
Flows
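On the original question of requiring four numbers: one likely reason the question's regex overshoots is that every part of its number sub-pattern is optional, so a "number" can match an empty string, and three real numbers followed by whitespace (as in "31 2018 2017 ") already satisfy it. The sketch below is one possible way to tighten this, not a tested solution for every filing. It assumes the four numbers directly follow the words Subordinated Units, it requires at least one digit per number, and it uses a negative lookahead so the lazy gap cannot run past the Cash Flows heading. The names LOOSE_NUM, NUM, STOP, GAP and find_four_numbers are made up for this example.

```python
import re

# The question's number sub-pattern: every part is optional, so it can match "".
LOOSE_NUM = r"\(?\d*[.]?(\d+)?\)?"
# Stricter variant (an assumption, not from the thread): at least one digit.
NUM = r"\(?\d+(?:\.\d+)?\)?"
# Section boundary; \s+ tolerates a line break inside the heading.
STOP = r"Cash\s+Flows"
# One character, allowed only if the boundary heading does not start there.
GAP = rf"(?:(?!{STOP}).)"

FOUR_NUMBERS = re.compile(
    rf"Consolidated\s+Statements?\s+of\s+Operations?{GAP}+?"
    rf"\bSubordinated\s+Units\s+({NUM}(?:\s+{NUM}){{3}})(?!\S)",
    re.IGNORECASE | re.DOTALL,
)


def find_four_numbers(text):
    """Return the four numbers after 'Subordinated Units', or None."""
    m = FOUR_NUMBERS.search(text)
    return m.group(1) if m else None


# The loose sub-pattern accepts an empty string; the strict one does not.
print(bool(re.fullmatch(LOOSE_NUM, "")))
print(bool(re.fullmatch(NUM, "")))

# Hypothetical shortened inputs modelled on the filing text.
has_four = ("Consolidated Statements of Operations Subordinated Units "
            "0.97 0.77 15903 15903 The notes Consolidated Statements of Cash\n"
            "Flows Three Months Ended March 31 2018 2017")
only_two = ("Consolidated Statements of Operations Subordinated Units "
            "0.97 0.77 The notes Consolidated Statements of Cash\n"
            "Flows Subordinated Units 1 2 3 4")

print(find_four_numbers(has_four))
print(find_four_numbers(only_two))
```

With this shape, a section in which Subordinated Units is followed by only two numbers gives no match, because the gap is not allowed to cross into the cash flow statement to look for more numbers there.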