How to extract a dynamic string from text using a regular expression

I am working on the sample text below:

  1. text here text here —(1) Without prejudice to any special rights previously conferred on the holders of any existing shares or class of shares but subject to the Act, shares in the company may be issued by the directors. (2) Shares referred to in paragraph (1) may be issued with preferred, deferred, or other special rights or restrictions, whether in regard to dividend, voting, return of capital, or otherwise, as the directors, subject to any ordinary resolution of the company, determine.

I want to split the text in the following manner:

  1. text here text here —

(1) Without prejudice to any special rights previously conferred on the holders of any existing shares or class of shares but subject to the Act, shares in the company may be issued by the directors.

(2) Shares referred to in paragraph (1) may be issued with preferred, deferred, or other special rights or restrictions, whether in regard to dividend, voting, return of capital, or otherwise, as the directors, subject to any ordinary resolution of the company, determine.

The text is dynamic in nature: instead of (1), (2) we could get (a), (b), or a., b., or i, ii, iii. To handle the first case, I have used the regular expression below in VBA:

Pattern = "([(][\d][)])([A-Z,a-z,.,\-,’,(,),_, ,:,\n,“,”,"",:,;,—,-,\—,\t,\r,]*)"

I am looking for a way to split the contents, and the solution does not have to be specific to VBA or regular expressions. Any other approach is also appreciated.

You did not answer the clarification questions, but please try the following VBA function:

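Function SpecSplitText(x As String) As String
  Dim strFin As String, strInt As String, arr, arr1, El
  
  ' Split the heading from the enumerated body at the em dash.
  arr = Split(x, "—")
  ' Split drops the delimiter, so append the em dash again to match the requested output.
  strFin = arr(0) & "—" & vbCrLf
  ' Break the body into pieces at every dot.
  arr1 = Split(arr(1), ".")
  
  For Each El In arr1
    If Len(LTrim(El)) = 1 Then
        ' A one-character piece is a marker such as "a"; hold it for the next piece.
        strInt = LTrim(El) & ". "
    Else
        strFin = strFin & strInt & LTrim(El) & "." & vbCrLf
        strInt = ""
    End If
  Next
  ' Drop the trailing "." & vbCrLf produced by the empty piece after the final dot.
  strFin = Left(strFin, Len(strFin) - 3)
  SpecSplitText = strFin
End Function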

You can test the function using the following simple test Sub:

Sub testSpecSplitText()
    Dim x As String, y As String, z As String
    x = "7. text here text here —(1) Without prejudice to any special rights previously conferred on the holders of any existing shares or class of shares but subject to the Act, shares in the company may be issued by the directors. (2) Shares referred to in paragraph (1) may be issued with preferred, deferred, or other special rights or restrictions, whether in regard to dividend, voting, return of capital, or otherwise, as the directors, subject to any ordinary resolution of the company, determine."
    y = "8. text here text here —a. Without prejudice to any special rights previously conferred on the holders of any existing shares or class of shares but subject to the Act, shares in the company may be issued by the directors. b. Shares referred to in paragraph (1) may be issued with preferred, deferred, or other special rights or restrictions, whether in regard to dividend, voting, return of capital, or otherwise, as the directors, subject to any ordinary resolution of the company, determine."
    z = "9. text here text here —I Without prejudice to any special rights previously conferred on the holders of any existing shares or class of shares but subject to the Act, shares in the company may be issued by the directors. II Shares referred to in paragraph (1) may be issued with preferred, deferred, or other special rights or restrictions, whether in regard to dividend, voting, return of capital, or otherwise, as the directors, subject to any ordinary resolution of the company, determine."
    
    Debug.Print SpecSplitText(x)
    Debug.Print SpecSplitText(y)
    Debug.Print SpecSplitText(z)
End Sub
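
Since the question welcomes approaches outside VBA, here is a minimal Python sketch of a different technique: splitting at the item markers themselves instead of at dots. It is not the method used in the answer above. The names `MARKER`, `ITEM_START` and `split_items` are hypothetical, and the marker pattern only covers the formats listed in the question, namely (1), (a), a. and bare Roman numerals.

```python
import re

# Hypothetical marker pattern (an assumption, not part of the original answer):
#   "(1)" / "(a)"      digits or one lowercase letter in parentheses
#   "a."               one lowercase letter followed by a dot
#   "i" / "II" / "iii" bare Roman numerals built from i, v, x
MARKER = r"(?:\(\d+\)|\([a-z]\)|[a-z]\.|[ivxIVX]+)"

# An item starts where a marker follows either the em dash or a
# sentence-ending dot plus whitespace, and is itself followed by whitespace.
ITEM_START = re.compile(r"(?:(?<=—)|(?<=\.)\s+)(?=" + MARKER + r"\s)")


def split_items(text: str) -> list:
    """Return the heading followed by each enumerated item."""
    return [part.strip() for part in ITEM_START.split(text) if part.strip()]


if __name__ == "__main__":
    sample = "7. text here —(1) First item, see the Act. (2) Refers to paragraph (1) again."
    for line in split_items(sample):
        print(line)
```

Because a split only happens right after the em dash or after a dot, a reference such as "paragraph (1)" in the middle of a sentence is left alone. The sketch would still split wrongly at a sentence that begins with a word such as "I", since that looks like a Roman numeral marker. Splitting on an empty match requires Python 3.7 or later.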

Note that the function splits on dots, so if the text in question uses "etc." or another abbreviation ending in a dot (.), you must enumerate those abbreviations and escape them before splitting.
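
One way to do that escaping is to mask the dots of known abbreviations before splitting and restore them afterwards. The Python sketch below shows only that masking step and is not a port of the whole VBA function. `ABBREVIATIONS`, `PLACEHOLDER` and `split_sentences` are hypothetical names, and the abbreviation list is an assumption to be replaced with whatever the real text contains.

```python
# Hypothetical list; extend it with the abbreviations the real text uses.
ABBREVIATIONS = ("etc.", "e.g.", "i.e.")
# Assumed never to occur in the input text.
PLACEHOLDER = "\x00"


def split_sentences(body: str) -> list:
    """Split on dots while keeping the enumerated abbreviations intact."""
    # Mask the dots inside each known abbreviation (plain substring replacement).
    for abbr in ABBREVIATIONS:
        body = body.replace(abbr, abbr.replace(".", PLACEHOLDER))
    # Split on the remaining dots and discard empty pieces.
    pieces = [p.strip() for p in body.split(".") if p.strip()]
    # Restore the masked dots and re-append the dot that ended each sentence.
    return [p.replace(PLACEHOLDER, ".") + "." for p in pieces]


if __name__ == "__main__":
    for line in split_sentences("Dividends, voting, etc. are covered. Shares may be issued."):
        print(line)
```

A limitation of this approach: when an abbreviation is the last word of a sentence, its dot is masked as well, so no split happens at that point.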
