简体   繁体   English

带有标记的 HTML 文本,用于 Excel 单元格中的格式化文本

[英]HTML Text with tags to formatted text in an Excel cell

Is there a way to take HTML and import it to excel so that it is formatted as rich text (preferably by using VBA)?有没有办法将 HTML 导入到 Excel 中,以便将其格式化为富文本(最好使用 VBA)? Basically, when I paste to an Excel cell, I'm looking to turn this:基本上,当我粘贴到 Excel 单元格时,我希望将其转换为:

<html><p>This is a test. Will this text be <b>bold</b> or <i>italic</i></p></html>

into this:进入这个:

This is a test.这是一个测试。 Will this text be bold or italic这段文字是粗体还是斜体

Yes it is possible :) In fact let Internet Explorer do the dirty work for you ;)是的,这是可能的 :) 事实上,让 Internet Explorer 为你做那些肮脏的工作 ;)

TRIED AND TESTED久经考验

MY ASSUMPTIONS我的假设

  1. I am assuming that the html text is in Cell A1 of Sheet1.我假设 html 文本位于 Sheet1 的单元格 A1 中。 You can also use a variable instead.您也可以改用变量。
  2. If you have a column full of html values, then simply put the below code in a loop如果您有一列充满 html 值,那么只需将以下代码放入循环中

For current browsers:对于当前浏览器:

Sub Sample()
    Dim Ie As Object
    
    Set Ie = CreateObject("InternetExplorer.Application")
    
    With Ie
        .Visible = False
        
        .Navigate "about:blank"
        
        .document.body.InnerHTML = Sheets("Sheet1").Range("A1").Value
        
        .ExecWB 17, 0: .ExecWB 12, 2
        ActiveSheet.Paste Destination:=Sheets("Sheet1").Range("A1")
        
        .Quit
    End With
End Sub

For older browsers:对于旧浏览器:

Sub Sample()
    Dim Ie As Object
    
    Set Ie = CreateObject("InternetExplorer.Application")
    
    With Ie
        .Visible = False
        
        .Navigate "about:blank"
        
        .document.body.InnerHTML = Sheets("Sheet1").Range("A1").Value
        
        .document.body.createtextrange.execCommand "Copy"
        ActiveSheet.Paste Destination:=Sheets("Sheet1").Range("A1")
        
        .Quit
    End With
End Sub

SNAPSHOT快照

在此处输入图片说明

You can copy the HTML code to the clipboard and paste special it back as Unicode text.您可以将 HTML 代码复制到剪贴板,然后将其作为 Unicode 文本特殊粘贴回去。 Excel will render the HTML in the cell. Excel 将在单元格中呈现 HTML。 Check out this post http://www.dailydoseofexcel.com/archives/2005/02/23/html-in-cells-ii/查看这篇文章http://www.dailydoseofexcel.com/archives/2005/02/23/html-in-cells-ii/

The relevant macro code from the post:帖子中的相关宏代码:

Private Sub Worksheet_Change(ByVal Target As Range)

   Dim objData As DataObject
   Dim sHTML As String
   Dim sSelAdd As String

   Application.EnableEvents = False

   If Target.Cells.Count = 1 Then
      If LCase(Left(Target.Text, 6)) = "<html>" Then
         Set objData = New DataObject

         sHTML = Target.Text

         objData.SetText sHTML
         objData.PutInClipboard

         sSelAdd = Selection.Address
         Target.Select
         Me.PasteSpecial "Unicode Text"
         Me.Range(sSelAdd).Select

      End If
   End If

   Application.EnableEvents = True

End Sub

I ran into the same error that BornToCode first identified in the comments of the original solution.我遇到了与 BornToCode 最初在原始解决方案的注释中发现的相同的错误。 Being unfamiliar with Excel and VBA it took me a second to figure out how to implement tiQU's solution.由于不熟悉 Excel 和 VBA,我花了一秒钟才弄清楚如何实施 tiQU 的解决方案。 So I'm posting it as a "For Dummies" solution below所以我将它作为“傻瓜”解决方案发布在下面

  1. First enable developer mode in Excel: Link首先在 Excel 中启用开发人员模式: 链接
  2. Select the Developer Tab > Visual Basic选择开发人员选项卡 > Visual Basic
  3. Click View > Code单击查看 > 代码
  4. Paste the code below updating the lines that require cell references to be correct.粘贴下面的代码更新需要单元格引用正确的行。
  5. Click the Green Run Arrow or press F5单击绿色运行箭头或按 F5
Sub Sample()
    Dim Ie As Object
    Set Ie = CreateObject("InternetExplorer.Application")
    With Ie
        .Visible = False
        .Navigate "about:blank"
        .document.body.InnerHTML = Sheets("Sheet1").Range("I2").Value
             'update to the cell that contains HTML you want converted
        .ExecWB 17, 0
             'Select all contents in browser
        .ExecWB 12, 2
             'Copy them
        ActiveSheet.Paste Destination:=Sheets("Sheet1").Range("J2")
             'update to cell you want converted HTML pasted in
        .Quit
    End With
End Sub

If the IE example doesn't work use this one.如果 IE 示例不起作用,请使用此示例。 Anyway this should be faster than starting up an instance of IE.无论如何,这应该比启动 IE 实例更快。

Here is a complete solution based on这是一个基于的完整解决方案
http://www.dailydoseofexcel.com/archives/2005/02/23/html-in-cells-ii/ http://www.dailydoseofexcel.com/archives/2005/02/23/html-in-cells-ii/

Note, if your innerHTML is all numbers eg '12345', HTML formatting dosen't fully work in excel as it treats number differently?请注意,如果您的 innerHTML 都是数字,例如“12345”,HTML 格式在 excel 中不能完全工作,因为它对数字的处理方式不同? but add a character eg a trailing space at the end eg.但在末尾添加一个字符,例如尾随空格,例如。 12345 + "& nbsp;" 12345+" formats ok.格式正常。

Sub test()
    Cells(1, 1).Value = "<HTML>1<font color=blue>a</font>" & _
                        "23<font color=red>4</font></HTML>"
    Dim rng As Range
    Set rng = ActiveSheet.Cells(1, 1)
    Worksheet_Change rng, ActiveSheet
End Sub


Private Sub Worksheet_Change(ByVal Target As Range, ByVal sht As Worksheet)

    Dim objData As DataObject ' Set a reference to MS Forms 2.0
    Dim sHTML As String
    Dim sSelAdd As String

    Application.EnableEvents = False

    If Target.Cells.Count = 1 Then

            Set objData = New DataObject
            sHTML = Target.Text
            objData.SetText sHTML
            objData.PutInClipboard
            Target.Select
            sht.PasteSpecial Format:="Unicode Text"
    End If

    Application.EnableEvents = True

End Sub

I know this thread is ancient, but after assigning the innerHTML, ExecWB worked for me:我知道这个线程很古老,但是在分配了 innerHTML 之后,ExecWB 为我工作:

 .ExecWB 17, 0 'Select all contents in browser .ExecWB 12, 2 'Copy them

And then just paste the contents into Excel.然后只需将内容粘贴到 Excel 中即可。 Since these methods are prone to runtime errors, but work fine after one or two tries in debug mode, you might have to tell Excel to try again if it runs into an error.由于这些方法容易出现运行时错误,但在调试模式下尝试一两次后工作正常,如果遇到错误,您可能必须告诉 Excel 再试一次。 I solved this by adding this error handler to the sub, and it works fine:我通过将这个错误处理程序添加到 sub 来解决这个问题,它工作正常:

 Sub ApplyHTML() On Error GoTo ErrorHandler ... Exit Sub ErrorHandler: Resume 'Ie re-run the line of code that caused the error Exit Sub End Sub

Nice!好的! Very slick.很滑。

I was disappointed that Excel doesn't let us paste to a merged cell and also pastes results containing a break into successive rows below the "target" cell though, as that meant it simply doesn't work for me.我很失望 Excel 不允许我们粘贴到合并的单元格,并且还将包含中断的结果粘贴到“目标”单元格下方的连续行中,因为这意味着它根本不适用于我。 I tried a few tweaks (unmerge/remerge, etc.) but then Excel dropped anything below a break, so that was a dead end.我尝试了一些调整(取消合并/重新合并等),但随后 Excel 将任何内容都放在了休息时间以下,所以这是一个死胡同。

Ultimately, I came up with a routine that'll handle simple tags and not use the "native" Unicode converter that is causing the issue with merged fields.最终,我想出了一个处理简单标签的例程,而不使用导致合并字段问题的“本机”Unicode 转换器。 Hope others find this useful:希望其他人觉得这很有用:

Public Sub AddHTMLFormattedText(rngA As Range, strHTML As String, Optional blnShowBadHTMLWarning As Boolean = False)
    ' Adds converts text formatted with basic HTML tags to formatted text in an Excel cell
    ' NOTE: Font Sizes not handled perfectly per HTML standard, but I find this method more useful!

    Dim strActualText As String, intSrcPos As Integer, intDestPos As Integer, intDestSrcEquiv() As Integer
    Dim varyTags As Variant, varTag As Variant, varEndTag As Variant, blnTagMatch As Boolean
    Dim intCtr As Integer
    Dim intStartPos As Integer, intEndPos As Integer, intActualStartPos As Integer, intActualEndPos As Integer
    Dim intFontSizeStartPos As Integer, intFontSizeEndPos As Integer, intFontSize As Integer

    varyTags = Array("<b>", "</b>", "<i>", "</i>", "<u>", "</u>", "<sub>", "</sub>", "<sup>", "</sup>")

    ' Remove unhandled/unneeded tags, convert <br> and <p> tags to line feeds
    strHTML = Trim(strHTML)
    strHTML = Replace(strHTML, "<html>", "")
    strHTML = Replace(strHTML, "</html>", "")
    strHTML = Replace(strHTML, "<p>", "")
    While LCase(Right$(strHTML, 4)) = "</p>" Or LCase(Right$(strHTML, 4)) = "<br>"
        strHTML = Left$(strHTML, Len(strHTML) - 4)
        strHTML = Trim(strHTML)
    Wend
    strHTML = Replace(strHTML, "<br>", vbLf)
    strHTML = Replace(strHTML, "</p>", vbLf)

    strHTML = Trim(strHTML)

    ReDim intDestSrcEquiv(1 To Len(strHTML))
    strActualText = ""
    intSrcPos = 1
    intDestPos = 1
    Do While intSrcPos <= Len(strHTML)
        blnTagMatch = False
        For Each varTag In varyTags
            If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
                blnTagMatch = True
                intSrcPos = intSrcPos + Len(varTag)
                If intSrcPos > Len(strHTML) Then Exit Do
                Exit For
            End If
        Next
        If blnTagMatch = False Then
            varTag = "<font size"
            If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
                blnTagMatch = True
                intEndPos = InStr(intSrcPos, strHTML, ">")
                intSrcPos = intEndPos + 1
                If intSrcPos > Len(strHTML) Then Exit Do
            Else
                varTag = "</font>"
                If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
                    blnTagMatch = True
                    intSrcPos = intSrcPos + Len(varTag)
                    If intSrcPos > Len(strHTML) Then Exit Do
                End If
            End If
        End If
        If blnTagMatch = False Then
            strActualText = strActualText & Mid$(strHTML, intSrcPos, 1)
            intDestSrcEquiv(intSrcPos) = intDestPos
            intDestPos = intDestPos + 1
            intSrcPos = intSrcPos + 1
        End If
    Loop

    ' Clear any bold/underline/italic/superscript/subscript formatting from cell
    rngA.Font.Bold = False
    rngA.Font.Underline = False
    rngA.Font.Italic = False
    rngA.Font.Subscript = False
    rngA.Font.Superscript = False

    rngA.Value = strActualText

    ' Now start applying Formats!"
    ' Start with Font Size first
    intSrcPos = 1
    intDestPos = 1
    Do While intSrcPos <= Len(strHTML)
        varTag = "<font size"
        If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
            intFontSizeStartPos = InStr(intSrcPos, strHTML, """") + 1
            intFontSizeEndPos = InStr(intFontSizeStartPos, strHTML, """") - 1
            If intFontSizeEndPos - intFontSizeStartPos <= 3 And intFontSizeEndPos - intFontSizeStartPos > 0 Then
                Debug.Print Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
                If Mid$(strHTML, intFontSizeStartPos, 1) = "+" Then
                    intFontSizeStartPos = intFontSizeStartPos + 1
                    intFontSize = 11 + 2 * Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
                ElseIf Mid$(strHTML, intFontSizeStartPos, 1) = "-" Then
                    intFontSizeStartPos = intFontSizeStartPos + 1
                    intFontSize = 11 - 2 * Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
                Else
                    intFontSize = Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
                End If
            Else
                ' Error!
                GoTo HTML_Err
            End If
            intEndPos = InStr(intSrcPos, strHTML, ">")
            intSrcPos = intEndPos + 1
            intStartPos = intSrcPos
            If intSrcPos > Len(strHTML) Then Exit Do
            While intDestSrcEquiv(intStartPos) = 0 And intStartPos < Len(strHTML)
                intStartPos = intStartPos + 1
            Wend
            If intStartPos >= Len(strHTML) Then GoTo HTML_Err ' HTML is bad!
            varEndTag = "</font>"
            intEndPos = InStr(intSrcPos, LCase(strHTML), varEndTag)
            If intEndPos = 0 Then GoTo HTML_Err ' HTML is bad!
            While intDestSrcEquiv(intEndPos) = 0 And intEndPos > intSrcPos
                intEndPos = intEndPos - 1
            Wend
            If intEndPos > intSrcPos Then
                intActualStartPos = intDestSrcEquiv(intStartPos)
                intActualEndPos = intDestSrcEquiv(intEndPos)
                rngA.Characters(intActualStartPos, intActualEndPos - intActualStartPos + 1) _
                    .Font.Size = intFontSize
            End If
        End If
        intSrcPos = intSrcPos + 1
    Loop

    'Now do remaining tags
    intSrcPos = 1
    intDestPos = 1
    Do While intSrcPos <= Len(strHTML)
        If intDestSrcEquiv(intSrcPos) = 0 Then
            ' This must be a Tag!
            For intCtr = 0 To UBound(varyTags) Step 2
                varTag = varyTags(intCtr)
                intStartPos = intSrcPos + Len(varTag)
                While intDestSrcEquiv(intStartPos) = 0 And intStartPos < Len(strHTML)
                    intStartPos = intStartPos + 1
                Wend
                If intStartPos >= Len(strHTML) Then GoTo HTML_Err ' HTML is bad!
                If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
                    varEndTag = varyTags(intCtr + 1)
                    intEndPos = InStr(intSrcPos, LCase(strHTML), varEndTag)
                    If intEndPos = 0 Then GoTo HTML_Err ' HTML is bad!
                    While intDestSrcEquiv(intEndPos) = 0 And intEndPos > intSrcPos
                        intEndPos = intEndPos - 1
                    Wend
                    If intEndPos > intSrcPos Then
                        intActualStartPos = intDestSrcEquiv(intStartPos)
                        intActualEndPos = intDestSrcEquiv(intEndPos)
                        With rngA.Characters(intActualStartPos, intActualEndPos - intActualStartPos + 1).Font
                            If varTag = "<b>" Then
                                .Bold = True
                            ElseIf varTag = "<i>" Then
                                .Italic = True
                            ElseIf varTag = "<u>" Then
                                .Underline = True
                            ElseIf varTag = "<sup>" Then
                                .Superscript = True
                            ElseIf varTag = "<sub>" Then
                                .Subscript = True
                            End If
                        End With
                    End If
                    intSrcPos = intSrcPos + Len(varTag) - 1
                    Exit For
                End If
            Next
        End If
        intSrcPos = intSrcPos + 1
        intDestPos = intDestPos + 1
    Loop
Exit_Sub:
    Exit Sub
HTML_Err:
    ' There was an error with the Tags. Show warning if requested.
    If blnShowBadHTMLWarning Then
        MsgBox "There was an error with the Tags in the HTML file. Could not apply formatting."
    End If
End Sub

Note this doesn't care about tag nesting, instead only requiring a close tag for every open tag, and assuming the close tag nearest the opening tag applies to the opening tag.请注意,这并不关心标签嵌套,而是只需要每个打开标签的结束标签,并假设最接近开始标签的结束标签适用于开始标签。 Properly nested tags will work fine, while improperly nested tags will not be rejected and may or may not work.正确嵌套的标签可以正常工作,而不正确嵌套的标签不会被拒绝,并且可能会也可能不会起作用。

To put HTML/Word in an Excel Shape and locate it on an Excel Cell:将 HTML/Word 放入 Excel 形状并将其定位在 Excel 单元格中:

  1. Write my HTML to a temp file.将我的 HTML 写入临时文件。
  2. Open temp file via Word Interop.通过 Word Interop 打开临时文件。
  3. Copy it from Word to clipboard.将其从 Word 复制到剪贴板。
  4. Open Excel via Interop.通过 Interop 打开 Excel。
  5. Set and Select a cell to a range.设置并选择一个单元格到一个范围。
  6. PasteSpecial as a "Microsoft Word Document Object" PasteSpecial 作为“Microsoft Word 文档对象”
  7. Adjust the excel row to the Shape height.将 excel 行调整为形状高度。

In this way, even HTML with tables and other stuff does not get split over multiple cells.这样,即使是带有表格和其他内容的 HTML 也不会拆分到多个单元格中。

    private void btnPutHTMLIntoExcelShape_Click(object sender, EventArgs e)
    {
        var fFile = new FileInfo(@"C:\Temp\temp.html");
        StreamWriter SW = fFile.CreateText();
        SW.Write(hecNote.DocumentHtml);
        SW.Close();

        Word.Application wrdApplication;
        Word.Document wrdDocument;
        wrdApplication = new Word.Application();
        wrdApplication.Visible = true;

        wrdDocument = wrdApplication.Documents.Add(@"C:\Temp\temp.html");
        wrdDocument.ActiveWindow.Selection.WholeStory();
        wrdDocument.ActiveWindow.Selection.Copy();

        Excel.Application excApplication;
        Excel.Workbook excWorkbook;
        Excel._Worksheet excWorksheet;
        Excel.Range excRange = null;

        excApplication = new Excel.Application();
        excApplication.Visible = true;
        excWorkbook = excApplication.Workbooks.Add(Type.Missing);
        excWorksheet = (Excel.Worksheet)excWorkbook.Worksheets.get_Item(1);
        excWorksheet.Name = "Work";
        excRange = excWorksheet.get_Range("A1");
        excRange.Select();

        excWorksheet.PasteSpecial("Microsoft Word Document Object");

        Excel.Shape O = excWorksheet.Shapes.Item(1);

        this.Text = $"{O.Height} x {O.Width}";
        ((Excel.Range)excWorksheet.Rows[1, Type.Missing]).RowHeight = O.Height;
    }

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM