How to retrieve bold, italic and underlined words from plain text and surround them by HTML tags

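What I want to achieve:

Input: (the input text comes from an Excel cell)

This is a string includes bold, italic and underlined words. (In the cell, "string" and "bold" are bold, "italic" is italic, and "underlined" is underlined.)

Expected output: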

This is a <b>string</b> includes <b>bold</b>, <i>italic</i> and <u>underlined</u> words.

What I have tried: (this approach iterates over the plain text character by character rather than word by word.)

        StringBuilder html = new StringBuilder();
        StringBuilder fontText = new StringBuilder();
        string path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Test.xls");
        Application excel = new Application();
        Workbook wb = excel.Workbooks.Open(path);
        Worksheet excelSheet = wb.ActiveSheet;
        // Read the first cell
        Range cell = excelSheet.Cells[1, 1];
        for (int index = 1; index <= cell.Text.ToString().Length; index++)
        {
            // cell here is a Range object
            Characters ch = cell.get_Characters(index, 1);
            bool bold = (bool)ch.Font.Bold;
            if (bold)
            {
                // The opening tag is written only once, before the first bold character
                if (html.Length == 0)
                    html.Append("<b>");
                html.Append(ch.Text);
            }
        }
        // The closing tag is written only once, after the loop
        if (html.Length != 0) html.Append("</b>");

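However, this approach returns all of the bold text surrounded by a single pair of HTML tags, like <b>stringbold</b>

The expected result is: <b>string</b><b>bold</b>

Any good ideas on how to do this?

Thanks in advance.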

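Here is what I would do:

  1. Create a helper class that knows about font styles and their opening and closing tags, and that can keep track of the "current" font style

  2. Start the class with the Regular style; then, in the loop, before writing the current character, ask the helper class to insert the closing and opening tags if the font style has changed

  3. At the end of the loop, ask the helper to insert the correct closing tag

I don't have an Excel interop project to work with, so this is a sample that you may have to adapt to the specific Excel font types.

First, the helper class: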

static class TextHelper
{
    // You may have to use a different type than `FontStyle` 
    // Hopefully ch.Font has some type of `Style` property you can use
    public static FontStyle CurrentStyle { get; set; }
    public static string OpenTag { get { return GetOpenTag(); } }
    public static string CloseTag { get { return GetCloseTag(); } }

    // This will return the closing tag for the current font style, 
    // followed by the opening tag for the new font style
    public static string ChangeStyleIfNeeded(FontStyle newStyle)
    {
        if (newStyle == CurrentStyle) return string.Empty;

        var transitionStyleTags = GetCloseTag();
        CurrentStyle = newStyle;
        transitionStyleTags += GetOpenTag();

        return transitionStyleTags;
    }

    private static string GetOpenTag()
    {
        switch (CurrentStyle)
        {
            case FontStyle.Bold:
                return "<b>";
            case FontStyle.Italic:
                return "<i>";
            case FontStyle.Underline:
                return "<u>";
            default:
                return "";
        }
    }

    private static string GetCloseTag()
    {
        switch (CurrentStyle)
        {
            case FontStyle.Bold:
                return "</b>";
            case FontStyle.Italic:
                return "</i>";
            case FontStyle.Underline:
                return "</u>";
            default:
                return "";
        }
    }
}

Next, the implementation would look something like this:

// Start our helper class with 'Regular' font
TextHelper.CurrentStyle = FontStyle.Regular;
var html = new StringBuilder();

for (int index = 1; index <= cell.Text.ToString().Length; index++)
{
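    // get_Characters returns a Characters object, not a char
    Characters ch = cell.get_Characters(index, 1);

    // Assumption: ch.Font is an Excel Font object rather than a FontStyle value,
    // so its flags are mapped by hand here (one style per character).
    FontStyle style = FontStyle.Regular;
    if ((bool)ch.Font.Bold) style = FontStyle.Bold;
    else if ((bool)ch.Font.Italic) style = FontStyle.Italic;
    else if ((XlUnderlineStyle)ch.Font.Underline == XlUnderlineStyle.xlUnderlineStyleSingle) style = FontStyle.Underline;

    // If the style of this character differs from the current style,
    // this will close the old style and open the new one.
    html.Append(TextHelper.ChangeStyleIfNeeded(style));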

    // Append this character
    html.Append(ch.Text);
}

// Close the style at the very end
html.Append(TextHelper.CloseTag);
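
The transition logic can also be tried without Excel. The following is only a minimal sketch of the same idea, not the exact code above: `CharStyle`, `StyleRunTagger` and the `(char, CharStyle)` pairs are hypothetical stand-ins for the per-character font information, and each character is assumed to carry at most one style.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the font information Excel exposes per character.
enum CharStyle { Regular, Bold, Italic, Underline }

static class StyleRunTagger
{
    // Maps a style to its HTML tag name; Regular has no tag.
    private static string TagName(CharStyle style)
    {
        switch (style)
        {
            case CharStyle.Bold: return "b";
            case CharStyle.Italic: return "i";
            case CharStyle.Underline: return "u";
            default: return null;
        }
    }

    // Wraps each run of identically styled characters in one tag pair.
    public static string ToHtml(IEnumerable<(char Ch, CharStyle Style)> chars)
    {
        var html = new StringBuilder();
        var current = CharStyle.Regular;

        foreach (var item in chars)
        {
            if (item.Style != current)
            {
                // Close the old run (if it had a tag), then open the new one.
                if (TagName(current) != null)
                    html.Append("</").Append(TagName(current)).Append('>');
                current = item.Style;
                if (TagName(current) != null)
                    html.Append('<').Append(TagName(current)).Append('>');
            }
            html.Append(item.Ch);
        }

        // Close whatever run is still open at the end of the text.
        if (TagName(current) != null)
            html.Append("</").Append(TagName(current)).Append('>');
        return html.ToString();
    }
}
```

Feeding `StyleRunTagger.ToHtml` the characters of `ab` in bold followed by `cd` in italic should give `<b>ab</b><i>cd</i>`. The text is not HTML-encoded in this sketch.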

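I spent half a day figuring out this solution.

1. The code works for bold, italic and underlined characters.

The algorithm is a bit complicated. If anyone has an optimization or a better solution, please post a new answer.

The ExcelReader method: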

public string ExcelReader(string excelFilePath)
    {
        StringBuilder resultText = new StringBuilder();
        //string excelFilePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Test.xls");
        Application excel = new Application();
        Workbook wb = excel.Workbooks.Open(excelFilePath);
        Worksheet excelSheet = wb.ActiveSheet;
        //Read the first cell
        Range cell = excelSheet.Cells[1, 1];

        //Check if one bold or italic WORD.
        bool IfStop = false;
        //Check if character is the start of bold or italic character.
        bool ifFirstSpecialCharacter = true;
        //Initialize a empty tag
        string tag = "";
        //Check if it is the last index
        bool isLastIndex = false;
        for (int index = 1; index <= cell.Text.ToString().Length; index++)
        {
            //Check if the current character is bold or italic
            bool IfSpecialType = false;
            //cell here is a Range object
            Characters ch = cell.get_Characters(index, 1);
            XlUnderlineStyle temp = (XlUnderlineStyle)ch.Font.Underline;
            bool underline = false;

            if (temp == XlUnderlineStyle.xlUnderlineStyleSingle)
                underline = true;

            bool bold = (bool)ch.Font.Bold;
            bool italic = (bool)ch.Font.Italic;

            if (underline)
            {
                if (tag != "" && tag != "<u>")
                {
                    resultText.Append(tag.Insert(1, "/"));
                    ifFirstSpecialCharacter = true;
                    IfStop = true;
                }
                tag = "<u>";
                IfSpecialType = true;
            }
            if (bold)
            {
                if (tag != "" && tag != "<b>")
                {
                    resultText.Append(tag.Insert(1, "/"));
                    ifFirstSpecialCharacter = true;
                    IfStop = true;
                }
                tag = "<b>";
                IfSpecialType = true;
            }
            if (italic)
            {
                if (tag != "" && tag != "<i>")
                {
                    resultText.Append(tag.Insert(1, "/"));
                    ifFirstSpecialCharacter = true;
                    IfStop = true;
                }
                tag = "<i>";
                IfSpecialType = true;
            }
            if (index == cell.Text.ToString().Length)
                isLastIndex = true;
            DetectSpecialCharacterByType(isLastIndex, resultText, ref tag, IfSpecialType, ref IfStop, ref ifFirstSpecialCharacter, ch);
        }
        wb.Close();
        // Quit Excel so the process does not linger in the background
        excel.Quit();
        return resultText.ToString();
    }

The DetectSpecialCharacterByType method:

private static void DetectSpecialCharacterByType(bool isLastIndex, StringBuilder fontText, ref string tag, bool ifSpecialType, ref bool IfStop, ref bool ifFirstSpecialCharacter, Characters ch)
    {
        if (ifSpecialType)
        {
            //If it is the first character of the word, put the <b> or <i> at the beginning.
            if (ifFirstSpecialCharacter)
            {
                fontText.Append(tag);
                ifFirstSpecialCharacter = false;
                IfStop = false;
            }
            //This is a edge case.If the last word of the text is bold or italic, put the </b> or </i>
            if (isLastIndex)
            {
                fontText.Append(ch.Text);
                fontText.Append(tag.Insert(1, "/"));
            }
            else
                fontText.Append(ch.Text);
        }
        else
        {
            //If it is the last character of one word, add </b> or </i> at the end.
            if (!IfStop && tag != "")
            {
                fontText.Append(tag.Insert(1, "/"));
                IfStop = true;
                ifFirstSpecialCharacter = true;
                tag = "";
            }
            fontText.Append(ch.Text);
        }
    }

The code works as-is once you copy and paste it and add a reference to Microsoft.Office.Interop.Excel.
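
The code above tracks one tag at a time, so a character that is, say, both bold and italic is not represented faithfully. One possible way to handle combined styles is sketched below, assuming the per-character flags have already been read into a `[Flags]` enum. `Styles` and `CombinedStyleTagger` are hypothetical names for illustration, and the sketch runs on in-memory data rather than on an Excel cell.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical flags enum: one bit per style, so styles can be combined.
[Flags]
enum Styles { None = 0, Bold = 1, Italic = 2, Underline = 4 }

static class CombinedStyleTagger
{
    // Fixed nesting order: <b> outermost, then <i>, then <u>.
    private static readonly (Styles Flag, string Tag)[] Order =
    {
        (Styles.Bold, "b"), (Styles.Italic, "i"), (Styles.Underline, "u")
    };

    private static void Open(StringBuilder sb, Styles s)
    {
        foreach (var entry in Order)
            if ((s & entry.Flag) != 0) sb.Append('<').Append(entry.Tag).Append('>');
    }

    private static void Close(StringBuilder sb, Styles s)
    {
        // Close in reverse order so the tags stay properly nested.
        for (int i = Order.Length - 1; i >= 0; i--)
            if ((s & Order[i].Flag) != 0) sb.Append("</").Append(Order[i].Tag).Append('>');
    }

    // On every style change, closes all open tags and reopens the new set.
    public static string ToHtml(IEnumerable<(char Ch, Styles Style)> chars)
    {
        var html = new StringBuilder();
        var current = Styles.None;
        foreach (var item in chars)
        {
            if (item.Style != current)
            {
                Close(html, current);
                Open(html, item.Style);
                current = item.Style;
            }
            html.Append(item.Ch);
        }
        Close(html, current);
        return html.ToString();
    }
}
```

For the characters `a` (bold), `b` (bold and italic) and `c` (no style), this sketch yields `<b>a</b><b><i>b</i></b>c`. Closing and reopening every tag on each change is simple but verbose; it trades shorter markup for tags that are always properly nested.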
