RegEx to ignore / skip everything in html tags

Looking for a way to combine two regular expressions: one to catch the URLs and the other to ensure it skips text within HTML tags. See the sample text below the functions.

I need to pass in a block of news text and format it by wrapping URLs and email addresses in HTML tags so users don't have to. The code below works great until there are already HTML tags within the text. In that case it doubles the HTML tags.

There are plenty of examples that strip HTML, but I just want to ignore it, since the URL is already linkified. Also, if there is an easier way to accomplish this, with or without Regex, please let me know. None of my attempts to combine the regexes have worked.

Coding in ASP.NET VB, but I will take any workable example/direction.

Thanks!

===== Functions =============

Public Shared Function InsertHyperlinks(ByVal inText As String) As String
    Dim strBuf As String = ""
    'Mid() is 1-based, while Match.Index is 0-based
    Dim iStart As Integer = 1
    Dim iEnd As Integer = 1

    'RegEx to find urls and email addresses
    Dim strRegUrlEmail As String = "\b(www|http|\S+@)\S+\b"
    Dim objRegExp As New Regex(strRegUrlEmail, RegexOptions.IgnoreCase)
    'Match URLs and emails
    Dim MatchList As MatchCollection = objRegExp.Matches(inText)
    If MatchList.Count <> 0 Then

        For Each Match In MatchList
            iEnd = Match.Index
            strBuf = strBuf & Mid(inText, iStart, iEnd - iStart + 1)
            If InStr(1, Match.Value, "@") Then
                strBuf = strBuf & HrefGet(Match.Value, "EMAIL", "_BLANK")
            Else
                strBuf = strBuf & HrefGet(Match.Value, "WEB", "_BLANK")
            End If
            iStart = iEnd + Match.Length + 1
        Next
        strBuf = strBuf & Mid(inText, iStart)
        InsertHyperlinks = strBuf
    Else
        'No hyperlinks to replace
        InsertHyperlinks = inText
    End If

End Function

Shared Function HrefGet(ByVal url As String, ByVal urlType As String, ByVal Target As String) As String
    'Fall back to the unchanged text if urlType is neither WEB nor EMAIL
    Dim strBuf As String = url
    If UCase(urlType) = "WEB" Then
        If LCase(Left(url, 3)) = "www" Then
            strBuf = "<a href=""http://" & url & """ Target=""" & _
                     Target & """>" & url & "</a>"
        Else
            strBuf = "<a href=""" & url & """ Target=""" & _
                    Target & """>" & url & "</a>"
        End If
    ElseIf UCase(urlType) = "EMAIL" Then
        strBuf = "<a href=""mailto:" & url & """ Target=""" & _
                 Target & """>" & url & "</a>"
    End If
    HrefGet = strBuf
End Function

===== Sample Text =============
This would be the inText parameter.

Midway through the ride, we see a <a href="http://www.skipthis.com" target="new">Skip this too</a>. But sometimes we go here [insert normal www dot link dot com]. If you'd like to join us contact Bill Smith at Tester@gmail.com. Thanks!

Sorry, Stack Overflow won't allow multiple hyperlinks to be added.

===== End Sample Text =============

First, check out this link.

Then check out the HTML Agility Pack. You will save yourself years of headaches by not parsing HTML with regular expressions.
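
The parser-based approach recommended above could look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced. The class name `NewsFormatter`, the function name `LinkifyOutsideAnchors`, and the `linkify` delegate parameter are hypothetical names introduced only for illustration. The idea is to parse the text, pick out only the text nodes that are not inside an existing <a> element, and run the linkifier on those, so links that are already present are left untouched.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

' Hypothetical helper class; not part of the original question.
Public Class NewsFormatter

    ' Applies linkify only to text that is not already inside an <a> element.
    ' linkify is any String -> String function, e.g. the question's InsertHyperlinks.
    Public Shared Function LinkifyOutsideAnchors(ByVal inHtml As String, _
                                                 ByVal linkify As Func(Of String, String)) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(inHtml)

        ' Select text nodes that have no <a> ancestor.
        ' SelectNodes returns Nothing when there are no matches.
        Dim found As HtmlNodeCollection = _
            doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]")
        If found Is Nothing Then Return inHtml

        ' Iterate over a copy so that replacing nodes cannot disturb the enumeration.
        For Each textNode As HtmlNode In New List(Of HtmlNode)(found)
            Dim original As String = textNode.InnerHtml
            Dim linked As String = linkify(original)

            ' Only touch nodes where the linkifier actually changed something.
            If linked <> original Then
                ' Wrap the mixed text/markup in a <span> so it parses as a single node.
                Dim wrapper As HtmlNode = HtmlNode.CreateNode("<span>" & linked & "</span>")
                textNode.ParentNode.ReplaceChild(wrapper, textNode)
            End If
        Next

        Return doc.DocumentNode.OuterHtml
    End Function

End Class
```

With the functions from the question, a call would be along the lines of `NewsFormatter.LinkifyOutsideAnchors(inText, AddressOf InsertHyperlinks)`, qualifying `InsertHyperlinks` with its containing class if needed. The extra <span> wrapper is a simplification to keep the sketch short; text inside <script> or <style> elements is not excluded here and may need its own filter.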

