vb.net Function for Percentage of Html

I have some articles saved in a database. On certain pages I want to show a certain percentage of the article based on some settings, e.g. 80% of the article.

The problem is that HTML is not plain text: if I take a percentage of the string length, the formatting gets broken. Can anyone help me with a function where I provide a string and a new length (less than the original string length), and it returns the truncated HTML without breaking the formatting? I have tried this:

Imports System.Text
Imports System.Text.RegularExpressions

Private Function HtmlSubstring(html As String, maxlength As Integer) As String
    'initialize regular expressions
    Dim htmltag As String = "</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>"
    Dim emptytags As String = "<(\w+)((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?></\1>"

    'match all html start and end tags, otherwise get each character one by one..
    Dim expression As Regex = New Regex(String.Format("({0})|(.?)", htmltag))
    Dim matches As MatchCollection = expression.Matches(html)

    Dim i As Integer = 0
    Dim content As New StringBuilder()
    For Each match As Match In matches
        If match.Value.Length = 1 AndAlso i < maxlength Then
            content.Append(match.Value)
            i += 1
        ElseIf match.Value.Length > 1 Then
            'the match contains a tag
            content.Append(match.Value)
        End If
    Next

    Return Regex.Replace(content.ToString(), emptytags, String.Empty)
End Function

But it didn't always work.
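A likely explanation, inferred from the pattern rather than stated in the question: HtmlSubstring only treats text as a tag when htmltag matches it, and attribute names in that pattern must be plain \w+ words. A tag such as <div data-id="1"> therefore falls through to the (.?) branch, is counted character by character, and can be cut in half. Other plausible causes are that entities such as &nbsp; count as six characters and can be split, and that the emptytags replacement runs only once, so a nested empty pair like <p><b></b></p> leaves <p></p> behind. The small module below (TagPatternCheck and IsRecognisedTag are hypothetical names) checks the first point with the pattern copied unchanged:

```vbnet
Imports System.Text.RegularExpressions

Module TagPatternCheck
    ' The htmltag pattern from the question's HtmlSubstring, copied unchanged.
    Const HtmlTag As String = "</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>"

    ' True when the whole input string is recognised as a single tag by that pattern.
    Function IsRecognisedTag(candidate As String) As Boolean
        Return Regex.IsMatch(candidate, "^(?:" & HtmlTag & ")$")
    End Function
End Module
```

IsRecognisedTag("<p class=""intro"">") returns True, while IsRecognisedTag("<div data-id=""1"">") returns False because of the hyphen in data-id.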

I'm pretty sure there is no built-in .NET method that does what you ask. However, consider the following approach:

Your HTML is probably structured, i.e., it has paragraphs, headings, etc.:

<h1>...</h1>
<p>...</p>
<h2>...</h2>
<p>...<more tags>...</more tags></p>
<h2>...</h2>
<p>...</p>
...

What you could do is:

  1. Use an HTML parser (the HTML Agility Pack is often mentioned in this context) and parse your HTML into a data structure.
  2. Take the first 80% of the top-level tags. For example, if the root node of your HTML content has ten children, take the first eight:

     <h1>...</h1>
     <p>...</p>
     <p>...</p>
     <h2>...</h2>
     <p> ... <more tags> ... </more tags> ... </p>
     <p>...</p>
     <p>...<more tags>...</more tags>...</p>
     <p>...</p>
     ---------------
     <h2>...</h2>
     <p>...</p>
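The two steps could look roughly like this in VB.NET. This is a sketch under an assumption the answer does not make: to stay self-contained it uses System.Xml.Linq instead of the HTML Agility Pack, so it only works when the stored article is well-formed XHTML. XElement.Parse throws an XmlException on unclosed tags or HTML-only entities such as &nbsp;, which is exactly the kind of input the Agility Pack is meant to handle. TopLevelTruncation and TakeTopLevelPercent are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module TopLevelTruncation
    ' Hypothetical helper: keeps the first `percent` percent of the top-level elements.
    ' Assumes well-formed XHTML; XElement.Parse throws on anything else.
    Function TakeTopLevelPercent(html As String, percent As Integer) As String
        ' Wrap the fragment in a synthetic root so its top-level tags become siblings.
        Dim root As XElement = XElement.Parse("<root>" & html & "</root>", LoadOptions.PreserveWhitespace)
        Dim children As List(Of XElement) = root.Elements().ToList()
        ' Round up, so any positive percentage keeps at least one element.
        Dim keep As Integer = CInt(Math.Ceiling(children.Count * percent / 100.0))
        ' DisableFormatting serialises each element as parsed, without added indentation.
        Return String.Concat(children.Take(keep).Select(Function(el) el.ToString(SaveOptions.DisableFormatting)))
    End Function
End Module
```

With ten top-level elements and percent = 80 this keeps the first eight. For example, TakeTopLevelPercent("<h1>T</h1><p>a</p><p>b</p><p>c</p><p>d</p>", 80) returns "<h1>T</h1><p>a</p><p>b</p><p>c</p>".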

If the text in your article is distributed approximately evenly (i.e., your long and short paragraphs average out over the course of the article), this will give you approximately 80% of the text without breaking any HTML formatting. As an additional benefit, you won't be splitting the text mid-line or mid-paragraph.
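That estimate depends on the paragraphs averaging out. If they don't, one variation (not part of the original answer) is to weight the cut by text length: keep whole top-level elements until their text reaches the target percentage, so the cut still never falls inside an element. Like any System.Xml.Linq version, this sketch assumes well-formed XHTML; TextWeightedTruncation and TakeTopLevelByText are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text
Imports System.Xml.Linq

Module TextWeightedTruncation
    ' Hypothetical helper: keeps whole top-level elements until their visible text
    ' reaches `percent` percent of the article's total text. Assumes well-formed XHTML.
    Function TakeTopLevelByText(html As String, percent As Integer) As String
        Dim root As XElement = XElement.Parse("<root>" & html & "</root>", LoadOptions.PreserveWhitespace)
        Dim children As List(Of XElement) = root.Elements().ToList()
        ' XElement.Value is the concatenated text of all descendant text nodes.
        Dim total As Integer = children.Sum(Function(el) el.Value.Length)
        Dim target As Double = total * percent / 100.0
        Dim kept As New StringBuilder()
        Dim soFar As Integer = 0
        For Each child As XElement In children
            ' Stop once the kept text has reached the target; the cut falls between elements.
            If soFar >= target Then Exit For
            kept.Append(child.ToString(SaveOptions.DisableFormatting))
            soFar += child.Value.Length
        Next
        Return kept.ToString()
    End Function
End Module
```

For "<p>aaaaaaaaaa</p><p>bb</p><p>cccccccc</p>" at 50%, this keeps only the first paragraph (10 of 20 characters), whereas counting elements and rounding up would keep two of the three.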

Finally, the following has worked quite well for me:

Imports System.Text
Imports System.Text.RegularExpressions

Private Function HtmlSubstring(html As String, maxlength As Integer) As String
    'initialize regular expressions
    Const htmltag As String = "</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>"
    'match all html start and end tags, otherwise get each character one by one.
    '(without RegexOptions.Singleline "." does not match "\n", so newlines produce
    'empty matches and are left out of the result)
    Dim expression As New Regex(String.Format("({0})|(.?)", htmltag))
    Dim matches As MatchCollection = expression.Matches(html)
    Dim i As Integer = 0
    Dim isEndingSet As Boolean = False
    Dim content As New StringBuilder()
    For Each match As Match In matches
        If match.Value.Length = 1 AndAlso i < maxlength Then
            'a single visible character that still fits
            content.Append(match.Value)
            i += 1
        ElseIf match.Value.Length > 1 Then
            'the match contains a tag: every tag is kept so the markup stays balanced
            '(elements after the cut end up empty), but line breaks after the ellipsis are skipped
            Dim tag As String = match.Value.ToLowerInvariant()
            If isEndingSet AndAlso (tag = "<br />" OrElse tag = "<br/>" OrElse tag = "<br>") Then
                Continue For
            End If
            content.Append(match.Value)
        End If
        'write the ellipsis once, directly after the last character that fits
        If i = maxlength AndAlso Not isEndingSet Then
            content.Append("....")
            isEndingSet = True
        End If
    Next

    Return content.ToString()
End Function
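The function above keeps every tag after the cut, so later elements survive as empty pairs and void elements such as <img> are still rendered. A different technique, sketched here rather than taken from the asker, is to track open elements on a stack, stop at the cut, and close only what is still open. Everything in it (TagStackTruncation, TruncateHtml, TruncateHtmlPercent, the VoidTags list, and the widened attribute-name pattern) is hypothetical; it counts an entity as one character, takes a percentage as the question originally asked, and does not handle comments, <script> blocks, or badly nested markup.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Text.RegularExpressions

Module TagStackTruncation
    ' A tag (group 1 = "/" for end tags, group 2 = tag name, group 6 = "/" for self-closing),
    ' an entity such as &amp; or &#160;, or any single character (Singleline: newlines too).
    ' Attribute names may contain ":" and "-", unlike the pattern in HtmlSubstring.
    Private ReadOnly Tokens As New Regex(
        "<(/?)(\w+)((\s+[\w:-]+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)(/?)>|&\w+;|&#\d+;|.",
        RegexOptions.Singleline)

    ' Void elements have no end tag, so they are never pushed on the stack.
    Private ReadOnly VoidTags As New HashSet(Of String)(
        New String() {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "track", "wbr"},
        StringComparer.OrdinalIgnoreCase)

    ' Keeps maxLength visible characters, drops everything after the cut,
    ' and closes the tags that are still open.
    Function TruncateHtml(html As String, maxLength As Integer) As String
        Dim output As New StringBuilder()
        Dim openTags As New Stack(Of String)()
        Dim visible As Integer = 0
        For Each m As Match In Tokens.Matches(html)
            If visible >= maxLength Then Exit For
            If m.Groups(2).Success Then
                Dim name As String = m.Groups(2).Value
                If m.Groups(1).Value = "/" Then
                    ' End tag: pop it if it closes the innermost open element.
                    If openTags.Count > 0 AndAlso String.Equals(openTags.Peek(), name, StringComparison.OrdinalIgnoreCase) Then
                        openTags.Pop()
                    End If
                ElseIf m.Groups(6).Value <> "/" AndAlso Not VoidTags.Contains(name) Then
                    openTags.Push(name)
                End If
                output.Append(m.Value)
            Else
                ' Visible text: a single character or a whole entity.
                output.Append(m.Value)
                visible += 1
            End If
        Next
        ' Close every element still open at the cut, innermost first.
        While openTags.Count > 0
            output.Append("</").Append(openTags.Pop()).Append(">")
        End While
        Return output.ToString()
    End Function

    ' Percentage wrapper: counts visible characters the same way TruncateHtml does.
    Function TruncateHtmlPercent(html As String, percent As Integer) As String
        Dim total As Integer = 0
        For Each m As Match In Tokens.Matches(html)
            If Not m.Groups(2).Success Then total += 1
        Next
        Return TruncateHtml(html, CInt(Math.Floor(total * percent / 100.0)))
    End Function
End Module
```

For example, TruncateHtmlPercent("<p>Hello <b>world</b></p>", 80) keeps 8 of the 11 visible characters and returns "<p>Hello <b>wo</b></p>". If you want the "...." marker from the function above, append it before the closing tags are written.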

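Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM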