How to retrieve bold, italic and underlined words from plain text and surround them by HTML tags

What I want to achieve:

Input: (The input text comes from an Excel cell)

This is a string includes bold, italic and underlined words.

Expected output:

This is a <b>string</b> includes <b>bold</b>, <i>italic</i> and <u>underlined</u> words.

What I tried: (This method iterates over the plain text by character, not by word.)

        StringBuilder html = new StringBuilder();
        StringBuilder fontText = new StringBuilder();
        string path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Test.xls");
        Application excel = new Application();
        Workbook wb = excel.Workbooks.Open(path);
        Worksheet excelSheet = wb.ActiveSheet;
        //Read the first cell
        Range cell = excelSheet.Cells[1, 1];
        for (int index = 1; index <= cell.Text.ToString().Length; index++)
        {
            //cell here is a Range object
            Characters ch = cell.get_Characters(index, 1);
            bool bold = (bool) ch.Font.Bold;
            if (bold)
            {
                //The opening tag is only written once, before the first bold character
                if (html.Length == 0)
                    html.Append("<b>");
                html.Append(ch.Text);
            }
        }
        if (html.Length != 0) html.Append("</b>");

But this method returns all the bold text surrounded by a single pair of HTML tags, like <b>stringbold</b>

Expected result is: <b>string</b> and <b>bold</b>

Any great thoughts on this?

Thanks in advance.

Here's what I would do:

  1. Create a helper class that knows about font styles and their opening and closing tags, and that can keep track of the "current" font style

  2. Start the class out with the Regular style, and then in your loop, before writing the current character, ask the helper class to insert closing and opening tags if the font style has changed

  3. At the end of the loop, ask the helper to insert the proper closing tag

I don't have an Excel interop project to play with, so here's a sample, which you may have to adapt to the specific Excel font types.

First, the helper class:

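using System.Drawing; // FontStyle comes from here; swap in your own enum if you prefer

static class TextHelper
{
    // You may have to use a different type than `FontStyle`.
    // Excel's interop Font exposes Bold, Italic and Underline as separate
    // properties, so map them to a single style value before calling in.
    // Note: only one style at a time is handled; a combined value such as
    // Bold | Italic falls through to the default case and emits no tags.
    public static FontStyle CurrentStyle { get; set; }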
    public static string OpenTag { get { return GetOpenTag(); } }
    public static string CloseTag { get { return GetCloseTag(); } }

    // This will return the closing tag for the current font style, 
    // followed by the opening tag for the new font style
    public static string ChangeStyleIfNeeded(FontStyle newStyle)
    {
        if (newStyle == CurrentStyle) return string.Empty;

        var transitionStyleTags = GetCloseTag();
        CurrentStyle = newStyle;
        transitionStyleTags += GetOpenTag();

        return transitionStyleTags;
    }

    private static string GetOpenTag()
    {
        switch (CurrentStyle)
        {
            case FontStyle.Bold:
                return "<b>";
            case FontStyle.Italic:
                return "<i>";
            case FontStyle.Underline:
                return "<u>";
            default:
                return "";
        }
    }

    private static string GetCloseTag()
    {
        switch (CurrentStyle)
        {
            case FontStyle.Bold:
                return "</b>";
            case FontStyle.Italic:
                return "</i>";
            case FontStyle.Underline:
                return "</u>";
            default:
                return "";
        }
    }
}

Next, the implementation would look something like this:

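// Start our helper class with 'Regular' font
TextHelper.CurrentStyle = FontStyle.Regular;
var html = new StringBuilder();

for (int index = 1; index <= cell.Text.ToString().Length; index++)
{
    // get_Characters returns a Characters object, not a char
    Characters ch = cell.get_Characters(index, 1);

    // Map the Excel font properties to a single FontStyle value.
    // Only one style per character is detected, matching TextHelper.
    FontStyle style = FontStyle.Regular;
    if ((bool)ch.Font.Bold)
        style = FontStyle.Bold;
    else if ((bool)ch.Font.Italic)
        style = FontStyle.Italic;
    else if ((XlUnderlineStyle)ch.Font.Underline == XlUnderlineStyle.xlUnderlineStyleSingle)
        style = FontStyle.Underline;

    // If the style of this character is different than the current style,
    // this will close the old style and open our new style.
    html.Append(TextHelper.ChangeStyleIfNeeded(style));

    // Append this character
    html.Append(ch.Text);
}

// Close the style at the very end
html.Append(TextHelper.CloseTag);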
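
The transition idea above can be tried without Excel at all. Here is a minimal sketch that also copes with characters carrying more than one style. The names `TextStyle`, `HtmlStyleWriter` and `ToHtml` are made up for illustration and are not part of any Excel API; the sketch assumes you have already read one style value per character out of the cell, and it does not HTML-encode the text.

```csharp
using System;
using System.Text;

// Hypothetical flags enum: one bit per style, so styles can be combined.
[Flags]
public enum TextStyle { None = 0, Bold = 1, Italic = 2, Underline = 4 }

public static class HtmlStyleWriter
{
    // Tags in opening order; closing tags are written in reverse order
    // so that the generated HTML nests properly.
    private static readonly (TextStyle Style, string Name)[] Tags =
    {
        (TextStyle.Bold, "b"), (TextStyle.Italic, "i"), (TextStyle.Underline, "u")
    };

    // styles[i] is the style of text[i]. Tags are only written where the
    // style changes, so each styled run gets its own pair of tags.
    public static string ToHtml(string text, TextStyle[] styles)
    {
        if (text.Length != styles.Length)
            throw new ArgumentException("Need exactly one style entry per character.");

        var html = new StringBuilder();
        var current = TextStyle.None;
        for (int i = 0; i < text.Length; i++)
        {
            if (styles[i] != current)
            {
                // Close everything that was open, then open the new style
                html.Append(CloseTags(current)).Append(OpenTags(styles[i]));
                current = styles[i];
            }
            html.Append(text[i]);
        }
        // Close whatever is still open at the end of the text
        return html.Append(CloseTags(current)).ToString();
    }

    private static string OpenTags(TextStyle style)
    {
        var sb = new StringBuilder();
        foreach (var tag in Tags)
            if ((style & tag.Style) != 0) sb.Append('<').Append(tag.Name).Append('>');
        return sb.ToString();
    }

    private static string CloseTags(TextStyle style)
    {
        var sb = new StringBuilder();
        for (int i = Tags.Length - 1; i >= 0; i--)
            if ((style & Tags[i].Style) != 0) sb.Append("</").Append(Tags[i].Name).Append('>');
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        var b = TextStyle.Bold;
        var bi = TextStyle.Bold | TextStyle.Italic;
        var n = TextStyle.None;
        // "ab" is bold, the space is plain, "cd" is bold and italic
        Console.WriteLine(HtmlStyleWriter.ToHtml("ab cd", new[] { b, b, n, bi, bi }));
        // prints: <b>ab</b> <b><i>cd</i></b>
    }
}
```

Because an unstyled character (such as the space between two bold words) counts as a style change, two separate bold words end up in two separate pairs of tags, which is what the question asks for.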

It took half of my day to figure out this solution.

1. The code works with bold, italic and underlined characters.

2. The algorithm is a little bit complicated. If any optimization is available or anyone comes up with a better solution, please post a new answer.

ExcelReader method:

public string ExcelReader(string excelFilePath)
    {
        StringBuilder resultText = new StringBuilder();
        //string excelFilePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Test.xls");
        Application excel = new Application();
        Workbook wb = excel.Workbooks.Open(excelFilePath);
        Worksheet excelSheet = wb.ActiveSheet;
        //Read the first cell
        Range cell = excelSheet.Cells[1, 1];

        //True once the closing tag for the current styled run has been written
        bool IfStop = false;
        //True when the next styled character starts a new run and needs an opening tag
        bool ifFirstSpecialCharacter = true;
        //Opening tag of the current run; empty while outside any styled run
        string tag = "";
        //True when the loop reaches the last character of the cell
        bool isLastIndex = false;
        for (int index = 1; index <= cell.Text.ToString().Length; index++)
        {
            //Check if the current character is bold or italic
            bool IfSpecialType = false;
            //cell here is a Range object
            Characters ch = cell.get_Characters(index, 1);
            XlUnderlineStyle temp = (XlUnderlineStyle)ch.Font.Underline;
            bool underline = false;

            if (temp == XlUnderlineStyle.xlUnderlineStyleSingle)
                underline = true;

            bool bold = (bool)ch.Font.Bold;
            bool italic = (bool)ch.Font.Italic;

            if (underline)
            {
                if (tag != "" && tag != "<u>")
                {
                    resultText.Append(tag.Insert(1, "/"));
                    ifFirstSpecialCharacter = true;
                    IfStop = true;
                }
                tag = "<u>";
                IfSpecialType = true;
            }
            if (bold)
            {
                if (tag != "" && tag != "<b>")
                {
                    resultText.Append(tag.Insert(1, "/"));
                    ifFirstSpecialCharacter = true;
                    IfStop = true;
                }
                tag = "<b>";
                IfSpecialType = true;
            }
            if (italic)
            {
                if (tag != "" && tag != "<i>")
                {
                    resultText.Append(tag.Insert(1, "/"));
                    ifFirstSpecialCharacter = true;
                    IfStop = true;
                }
                tag = "<i>";
                IfSpecialType = true;
            }
            if (index == cell.Text.ToString().Length)
                isLastIndex = true;
            DetectSpecialCharacterByType(isLastIndex, resultText, ref tag, IfSpecialType, ref IfStop, ref ifFirstSpecialCharacter, ch);
        }
        wb.Close();
        //Quit Excel so the process does not keep running in the background
        excel.Quit();
        return resultText.ToString();
    }

DetectSpecialCharacterByType method:

private static void DetectSpecialCharacterByType(bool isLastIndex, StringBuilder fontText, ref string tag, bool ifSpecialType, ref bool IfStop, ref bool ifFirstSpecialCharacter, Characters ch)
    {
        if (ifSpecialType)
        {
            //If it is the first character of the word, put the <b> or <i> at the beginning.
            if (ifFirstSpecialCharacter)
            {
                fontText.Append(tag);
                ifFirstSpecialCharacter = false;
                IfStop = false;
            }
            //Edge case: if the text ends inside a styled run, write the character and then the closing tag
            if (isLastIndex)
            {
                fontText.Append(ch.Text);
                fontText.Append(tag.Insert(1, "/"));
            }
            else
                fontText.Append(ch.Text);
        }
        else
        {
            //If it is the last character of one word, add </b> or </i> at the end.
            if (!IfStop && tag != "")
            {
                fontText.Append(tag.Insert(1, "/"));
                IfStop = true;
                ifFirstSpecialCharacter = true;
                tag = "";
            }
            fontText.Append(ch.Text);
        }
    }

The code works as-is if you copy and paste it and add a reference to Microsoft.Office.Interop.Excel.
