How to retrieve bold, italic and underlined words from plain text and surround them with HTML tags
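What I want to achieve:

Input (the input text comes from an Excel cell):

This is a string includes bold, italic and underlined words. (In the cell, "string" and "bold" are bold, "italic" is italic and "underlined" is underlined.)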
Expected output:

`This is a <b>string</b> includes <b>bold</b>, <i>italic</i> and <u>underlined</u> words.`
What I tried (this approach iterates over the plain text character by character, not word by word):
```csharp
using System;
using System.IO;
using System.Text;
using Microsoft.Office.Interop.Excel;

StringBuilder html = new StringBuilder();
StringBuilder fontText = new StringBuilder();
string path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Test.xls");
Application excel = new Application();
Workbook wb = excel.Workbooks.Open(path);
Worksheet excelSheet = wb.ActiveSheet;
// Read the first cell
Range cell = excelSheet.Cells[1, 1];
for (int index = 1; index <= cell.Text.ToString().Length; index++)
{
    // cell here is a Range object
    Characters ch = cell.get_Characters(index, 1);
    bool bold = (bool)ch.Font.Bold;
    if (bold)
    {
        if (html.Length == 0)
            html.Append("<b>");
        html.Append(ch.Text);
    }
}
if (html.Length != 0) html.Append("</b>");
```
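But this approach returns all the bold text wrapped in a single pair of tags, like `<b>stringbold</b>`.

The expected result is `<b>string</b>` and `<b>bold</b>`.

Any good ideas on this?

Thanks in advance.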
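In the attempt above, `<b>` is opened only once (while `html` is still empty), it is never closed when a bold run ends, and non-bold characters are never appended. A minimal sketch of a fix, detached from Excel so it runs on its own: it assumes the per-character bold flags have already been read (in the question they come from `(bool)ch.Font.Bold`) and passes them in as a `bool[]`. `WrapBoldRuns` is a hypothetical name for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper: wraps each run of consecutive bold characters in <b>...</b>.
// bold[i] stands in for (bool)cell.get_Characters(i + 1, 1).Font.Bold in the question.
static string WrapBoldRuns(string text, bool[] bold)
{
    if (text.Length != bold.Length)
        throw new ArgumentException("One bold flag is needed per character.");

    var html = new StringBuilder();
    bool inBold = false; // are we currently inside an open <b>?
    for (int i = 0; i < text.Length; i++)
    {
        if (bold[i] && !inBold) html.Append("<b>");       // a bold run starts
        else if (!bold[i] && inBold) html.Append("</b>"); // a bold run ends
        inBold = bold[i];
        html.Append(text[i]); // keep every character, not only the bold ones
    }
    if (inBold) html.Append("</b>"); // close a run that reaches the end of the text
    return html.ToString();
}

// Example: "a string is bold" with the words "string" and "bold" in bold.
string sample = "a string is bold";
bool[] flags = new bool[sample.Length];
for (int k = 2; k < 8; k++) flags[k] = true;   // "string"
for (int k = 12; k < 16; k++) flags[k] = true; // "bold"
Console.WriteLine(WrapBoldRuns(sample, flags)); // prints "a <b>string</b> is <b>bold</b>"
```

The only state needed is whether the previous character was bold; a tag is emitted exactly at each transition.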
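Here's what I would do:

1. Create a helper class that knows the font styles and their opening and closing tags, and can keep track of the "current" font style.
2. Start the class with the Regular style; then, in the loop, if the font style has changed, ask the helper class to insert the closing and opening tags before writing the current character.
3. At the end of the loop, ask the helper to insert the correct closing tag.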
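I don't have an Excel interop project to work with, so this is an example; you may have to adapt it to the specific Excel font types.

First, the helper class: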
```csharp
using System.Drawing; // FontStyle

static class TextHelper
{
    // You may have to use a different type than `FontStyle`
    // Hopefully ch.Font has some type of `Style` property you can use
    public static FontStyle CurrentStyle { get; set; }
    public static string OpenTag { get { return GetOpenTag(); } }
    public static string CloseTag { get { return GetCloseTag(); } }

    // This will return the closing tag for the current font style,
    // followed by the opening tag for the new font style
    public static string ChangeStyleIfNeeded(FontStyle newStyle)
    {
        if (newStyle == CurrentStyle) return string.Empty;
        var transitionStyleTags = GetCloseTag();
        CurrentStyle = newStyle;
        transitionStyleTags += GetOpenTag();
        return transitionStyleTags;
    }

    private static string GetOpenTag()
    {
        switch (CurrentStyle)
        {
            case FontStyle.Bold:
                return "<b>";
            case FontStyle.Italic:
                return "<i>";
            case FontStyle.Underline:
                return "<u>";
            default:
                return "";
        }
    }

    private static string GetCloseTag()
    {
        switch (CurrentStyle)
        {
            case FontStyle.Bold:
                return "</b>";
            case FontStyle.Italic:
                return "</i>";
            case FontStyle.Underline:
                return "</u>";
            default:
                return "";
        }
    }
}
```
Next, the implementation would look something like this:
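```csharp
// Start our helper class with 'Regular' font
TextHelper.CurrentStyle = FontStyle.Regular;
var html = new StringBuilder();
for (int index = 1; index <= cell.Text.ToString().Length; index++)
{
    // get_Characters returns a Characters object, not a char
    Characters ch = cell.get_Characters(index, 1);

    // Map the Excel font of this character to a single FontStyle
    // (bold takes precedence over italic, italic over underline).
    FontStyle style = FontStyle.Regular;
    if ((bool)ch.Font.Bold) style = FontStyle.Bold;
    else if ((bool)ch.Font.Italic) style = FontStyle.Italic;
    else if ((XlUnderlineStyle)ch.Font.Underline != XlUnderlineStyle.xlUnderlineStyleNone) style = FontStyle.Underline;

    // If the style of this character is different from the current style,
    // this will close the old style and open our new style.
    html.Append(TextHelper.ChangeStyleIfNeeded(style));
    // Append this character
    html.Append(ch.Text);
}
// Close the style at the very end
html.Append(TextHelper.CloseTag);
```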
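For reference, a self-contained sketch of the same "current style" idea without Excel or `System.Drawing`. It assumes each character has already been mapped to a single tag name (`"b"`, `"i"`, `"u"`, or `""` for regular text); `ToHtml` and `Segments` are hypothetical helpers, and `Segments` only exists to build example input.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Emits a closing tag for the old style and an opening tag for the new one
// whenever the style changes, then closes the last style at the end.
static string ToHtml(IEnumerable<(char Ch, string Tag)> chars)
{
    var html = new StringBuilder();
    string current = ""; // start in the Regular style
    foreach (var (ch, tag) in chars)
    {
        if (tag != current)
        {
            if (current != "") html.Append("</").Append(current).Append('>');
            if (tag != "") html.Append('<').Append(tag).Append('>');
            current = tag;
        }
        html.Append(ch);
    }
    // Close whatever style is still open at the very end.
    if (current != "") html.Append("</").Append(current).Append('>');
    return html.ToString();
}

// Expands (text, tag) segments into per-character input for the example.
static IEnumerable<(char Ch, string Tag)> Segments(params (string Text, string Tag)[] parts)
{
    foreach (var (text, tag) in parts)
        foreach (char c in text)
            yield return (c, tag);
}

// prints: This is a <b>string</b> includes <b>bold</b>, <i>italic</i> and <u>underlined</u> words.
Console.WriteLine(ToHtml(Segments(("This is a ", ""), ("string", "b"), (" includes ", ""),
    ("bold", "b"), (", ", ""), ("italic", "i"), (" and ", ""), ("underlined", "u"), (" words.", ""))));
```

Keeping the Excel-specific mapping (font to tag) separate from the tag emission makes the second part easy to test without an Excel installation.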
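It took me a while to figure out this solution.

1. The code works for bold, italic and underlined characters.

The algorithm is a bit complicated. If anyone has optimizations or a better solution, please post a new answer.

The `ExcelReader` method: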
```csharp
using System.Text;
using Microsoft.Office.Interop.Excel;

public string ExcelReader(string excelFilePath)
{
    StringBuilder resultText = new StringBuilder();
    //string excelFilePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Test.xls");
    Application excel = new Application();
    Workbook wb = excel.Workbooks.Open(excelFilePath);
    Worksheet excelSheet = wb.ActiveSheet;
    // Read the first cell
    Range cell = excelSheet.Cells[1, 1];
    // Tracks whether the current bold/italic/underlined WORD has been closed.
    bool IfStop = false;
    // Tracks whether the next styled character starts a new bold/italic/underlined word.
    bool ifFirstSpecialCharacter = true;
    // Initialize an empty tag
    string tag = "";
    // Check if it is the last index
    bool isLastIndex = false;
    for (int index = 1; index <= cell.Text.ToString().Length; index++)
    {
        // Check if the current character is bold, italic or underlined
        bool IfSpecialType = false;
        // cell here is a Range object
        Characters ch = cell.get_Characters(index, 1);
        XlUnderlineStyle temp = (XlUnderlineStyle)ch.Font.Underline;
        bool underline = false;
        if (temp == XlUnderlineStyle.xlUnderlineStyleSingle)
            underline = true;
        bool bold = (bool)ch.Font.Bold;
        bool italic = (bool)ch.Font.Italic;
        if (underline)
        {
            if (tag != "" && tag != "<u>")
            {
                resultText.Append(tag.Insert(1, "/"));
                ifFirstSpecialCharacter = true;
                IfStop = true;
            }
            tag = "<u>";
            IfSpecialType = true;
        }
        if (bold)
        {
            if (tag != "" && tag != "<b>")
            {
                resultText.Append(tag.Insert(1, "/"));
                ifFirstSpecialCharacter = true;
                IfStop = true;
            }
            tag = "<b>";
            IfSpecialType = true;
        }
        if (italic)
        {
            if (tag != "" && tag != "<i>")
            {
                resultText.Append(tag.Insert(1, "/"));
                ifFirstSpecialCharacter = true;
                IfStop = true;
            }
            tag = "<i>";
            IfSpecialType = true;
        }
        if (index == cell.Text.ToString().Length)
            isLastIndex = true;
        DetectSpecialCharacterByType(isLastIndex, resultText, ref tag, IfSpecialType, ref IfStop, ref ifFirstSpecialCharacter, ch);
    }
    wb.Close();
    // Also shut down the Excel process started above
    excel.Quit();
    return resultText.ToString();
}
```
The `DetectSpecialCharacterByType` method:
```csharp
private static void DetectSpecialCharacterByType(bool isLastIndex, StringBuilder fontText, ref string tag, bool ifSpecialType, ref bool IfStop, ref bool ifFirstSpecialCharacter, Characters ch)
{
    if (ifSpecialType)
    {
        // If this is the first character of the word, put <b>, <i> or <u> at the beginning.
        if (ifFirstSpecialCharacter)
        {
            fontText.Append(tag);
            ifFirstSpecialCharacter = false;
            IfStop = false;
        }
        // Edge case: if the last word of the text is styled, append the closing tag after it.
        if (isLastIndex)
        {
            fontText.Append(ch.Text);
            fontText.Append(tag.Insert(1, "/"));
        }
        else
            fontText.Append(ch.Text);
    }
    else
    {
        // The styled word has just ended: add </b>, </i> or </u> before this character.
        if (!IfStop && tag != "")
        {
            fontText.Append(tag.Insert(1, "/"));
            IfStop = true;
            ifFirstSpecialCharacter = true;
            tag = "";
        }
        fontText.Append(ch.Text);
    }
}
```
The code works as-is once it is copied and pasted and a reference to `Microsoft.Office.Interop.Excel` is added.
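One limitation of `ExcelReader` is that it keeps a single `tag` per character. As far as I can trace the code, a character that is both bold and italic makes the bold branch set `tag` to `<b>`, and the italic branch then appends `</b>` without a matching `<b>`. A minimal sketch of an alternative, assuming the three flags have already been read per character (from `ch.Font.Bold`, `ch.Font.Italic` and `ch.Font.Underline`): whenever the set of styles changes, it closes all open tags in reverse order and reopens the new set in a fixed order (b, i, u), so the nesting is always valid. `ToNestedHtml` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Per-character style flags -> properly nested HTML, supporting combined styles.
static string ToNestedHtml(IEnumerable<(char Ch, bool Bold, bool Italic, bool Underline)> chars)
{
    var html = new StringBuilder();
    var open = new List<string>(); // tags currently open, in opening order
    foreach (var c in chars)
    {
        var wanted = new List<string>();
        if (c.Bold) wanted.Add("b");
        if (c.Italic) wanted.Add("i");
        if (c.Underline) wanted.Add("u");

        if (!open.SequenceEqual(wanted))
        {
            // Close everything in reverse order, then open the new set.
            for (int k = open.Count - 1; k >= 0; k--) html.Append("</").Append(open[k]).Append('>');
            foreach (var t in wanted) html.Append('<').Append(t).Append('>');
            open = wanted;
        }
        html.Append(c.Ch);
    }
    // Close any tags still open at the end of the text.
    for (int k = open.Count - 1; k >= 0; k--) html.Append("</").Append(open[k]).Append('>');
    return html.ToString();
}

// 'x' is bold + italic, 'y' is bold only.
var sample = new[] { ('x', true, true, false), ('y', true, false, false) };
Console.WriteLine(ToNestedHtml(sample)); // prints "<b><i>x</i></b><b>y</b>"
```

The trade-off is some redundant tags (such as `</b><b>` above) in exchange for output that never mis-nests; a smarter version could close only the tags that actually change.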