VB.NET function to return a percentage of an HTML string
I have some articles saved in a database. On certain pages I want to show only a certain percentage of an article, based on some settings, e.g. 80% of the article.
The problem is that HTML is not plain text: if I take a percentage of the string length, the formatting gets broken. Can anyone help me with a function that takes a string and a new length (less than the original string length) and returns truncated HTML without disturbing the formatting? I have tried this:
Private Function HtmlSubstring(html As String, maxlength As Integer) As String
    'initialize regular expressions
    Dim htmltag As String = "</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>"
    Dim emptytags As String = "<(\w+)((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?></\1>"
    'match all html start and end tags, otherwise get each character one by one..
    Dim expression As Regex = New Regex(String.Format("({0})|(.?)", htmltag))
    Dim matches As MatchCollection = expression.Matches(html)
    Dim i As Integer = 0
    Dim content As New StringBuilder()
    For Each match As Match In matches
        If match.Value.Length = 1 AndAlso i < maxlength Then
            content.Append(match.Value)
            i += 1
        'the match contains a tag
        ElseIf match.Value.Length > 1 Then
            content.Append(match.Value)
        End If
    Next
    Return Regex.Replace(content.ToString(), emptytags, String.Empty)
End Function
But it did not always work.
I'm pretty sure that there is no built-in .NET method to do what you ask. However, consider the following method.
Your HTML page is probably structured, i.e., it has paragraphs, headings, etc.:
<h1>...</h1>
<p>...</p>
<h2>...</h2>
<p>...<more tags>...</more tags></p>
<h2>...</h2>
<p>...</p>
...
What you could do is take the first 80% of the top-level tags. For example, if the root node of your HTML content has ten children, take the first eight:
<h1>...</h1>
<p>...</p>
<p>...</p>
<h2>...</h2>
<p> ... <more tags> ... </more tags> ... </p>
<p>...</p>
<p>...<more tags>...</more tags>...</p>
<p>...</p>
---------------
<h2>...</h2>
<p>...</p>
If the content of your article is spread approximately evenly (i.e., long and short paragraphs average out over the course of the article), this will give you approximately 80% of the text without breaking any HTML formatting. As an additional benefit, you won't be splitting the text mid-line or mid-paragraph.
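The idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a full HTML parser: the names `TopLevelTruncation` and `TakeTopLevelPercent` are made up for this example, the markup is assumed to be reasonably well-formed, and attribute values are assumed not to contain a `>` character. It scans the tags, tracks the nesting depth, remembers where each top-level element ends, and cuts the string after the requested share of those elements.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TopLevelTruncation

    ' Elements that never have a closing tag, so they must not increase the depth.
    Private ReadOnly VoidTags As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase) From {
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}

    ' Hypothetical helper: keeps the first `percent` percent of the top-level elements.
    Public Function TakeTopLevelPercent(html As String, percent As Integer) As String
        ' Group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for a self-closed tag.
        Dim tagPattern As New Regex("<(/?)(\w+)[^>]*?(/?)>")
        Dim ends As New List(Of Integer)()   ' index just past each top-level element
        Dim depth As Integer = 0

        For Each m As Match In tagPattern.Matches(html)
            Dim isClosing As Boolean = (m.Groups(1).Value = "/")
            Dim isSelfClosed As Boolean = (m.Groups(3).Value = "/") OrElse
                                          VoidTags.Contains(m.Groups(2).Value)

            If isClosing Then
                ' Ignore stray closing tags that have no matching opening tag.
                If depth > 0 Then
                    depth -= 1
                    If depth = 0 Then ends.Add(m.Index + m.Length)
                End If
            ElseIf isSelfClosed Then
                ' A void element at the top level counts as one complete element.
                If depth = 0 Then ends.Add(m.Index + m.Length)
            Else
                depth += 1
            End If
        Next

        ' No elements found: nothing sensible to cut, return the input unchanged.
        If ends.Count = 0 Then Return html

        Dim keep As Integer = CInt(Math.Floor(ends.Count * percent / 100.0))
        If keep <= 0 Then Return String.Empty
        If keep >= ends.Count Then Return html
        Return html.Substring(0, ends(keep - 1))
    End Function

End Module
```

For example, `TakeTopLevelPercent("<h1>a</h1><p>b</p><p>c</p><p>d</p><p>e</p>", 80)` keeps four of the five top-level elements and returns `<h1>a</h1><p>b</p><p>c</p><p>d</p>`. For HTML that is not well-formed, a real HTML parser would be a safer choice than a regular expression.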
In the end, the following worked quite well for me:
Private Function HtmlSubstring(ByRef html As String, maxlength As Integer) As String
    'initialize regular expressions
    Const htmltag As String = "</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>"
    'match all html start and end tags, otherwise get each character one by one..
    Dim expression As Regex = New Regex(String.Format("({0})|(.?)", htmltag))
    Dim matches As MatchCollection = expression.Matches(html)
    Dim i As Integer = 0
    Dim isEndingSet As Boolean = False
    Dim content As StringBuilder = New StringBuilder()
    For Each match As Match In matches
        If match.Value.Length = 1 AndAlso i < maxlength Then
            'a single text character: copy it and count it
            content.Append(match.Value)
            i += 1
        'the match contains a tag
        ElseIf match.Value.Length > 1 Then
            'after the cut-off, drop line breaks so the ending does not leave blank lines
            If (isEndingSet AndAlso (match.Value.ToLower() = "<br />" OrElse match.Value.ToLower() = "<br>")) Then
                Continue For
            End If
            content.Append(match.Value)
        End If
        'append the ending marker once, as soon as the limit is reached
        If (i = maxlength AndAlso Not isEndingSet) Then
            content.Append("....")
            isEndingSet = True
        End If
    Next
    Return content.ToString()
End Function
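The function above takes a character count rather than a percentage, and it still copies every tag that follows the cut-off point. A possible variation, shown here only as a sketch, derives the limit from the visible text length and closes the elements that are still open instead of copying the rest of the markup. The names `HtmlPercentTruncation`, `VisibleLength` and `TruncateHtmlByPercent` are hypothetical, the tag pattern is a simplified one (it assumes no `>` inside attribute values), and entities such as `&nbsp;` are counted as several characters and may be cut in the middle.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Text.RegularExpressions

Module HtmlPercentTruncation

    ' Simplified tag pattern (an assumption, not the pattern used above).
    ' Group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for a self-closed tag.
    Private Const TagPattern As String = "<(/?)(\w+)[^>]*?(/?)>"

    ' Elements that never have a closing tag and so are never pushed on the stack.
    Private ReadOnly VoidTags As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase) From {
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}

    ' Hypothetical helper: number of characters outside of tags.
    Public Function VisibleLength(html As String) As Integer
        Return Regex.Replace(html, TagPattern, String.Empty).Length
    End Function

    ' Hypothetical helper: keeps about `percent` percent of the visible text,
    ' appends "...." and closes whatever elements are still open.
    Public Function TruncateHtmlByPercent(html As String, percent As Integer) As String
        If percent >= 100 Then Return html
        Dim maxLength As Integer = CInt(Math.Floor(VisibleLength(html) * percent / 100.0))
        If maxLength <= 0 Then Return String.Empty

        ' A token is either a whole tag or a single character of text.
        Dim token As New Regex(TagPattern & "|[^<]|<")
        Dim openTags As New Stack(Of String)()
        Dim content As New StringBuilder()
        Dim count As Integer = 0

        For Each m As Match In token.Matches(html)
            content.Append(m.Value)
            If m.Groups(2).Success Then
                Dim name As String = m.Groups(2).Value
                If m.Groups(1).Value = "/" Then
                    ' Closing tag: pop only if it matches the innermost open element.
                    If openTags.Count > 0 AndAlso
                       String.Equals(openTags.Peek(), name, StringComparison.OrdinalIgnoreCase) Then
                        openTags.Pop()
                    End If
                ElseIf m.Groups(3).Value <> "/" AndAlso Not VoidTags.Contains(name) Then
                    openTags.Push(name)
                End If
            Else
                ' Text character: count it and stop once the limit is reached.
                count += 1
                If count >= maxLength Then Exit For
            End If
        Next

        content.Append("....")
        ' Close the elements that were open at the cut-off point, innermost first.
        While openTags.Count > 0
            content.Append("</").Append(openTags.Pop()).Append(">")
        End While
        Return content.ToString()
    End Function

End Module
```

With this sketch, `TruncateHtmlByPercent("<p>Hello <b>world</b></p>", 80)` keeps 8 of the 11 visible characters and returns `<p>Hello <b>wo....</b></p>`.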