How to convert an Excel formatted text to HTML?

I made a web platform where users can add a long description for a product. The Description field is a WYSIWYG editor and the content is saved as HTML in my database (MySQL).

The users rewrote all the descriptions in an Excel file because they have a lot of products (1700+) and they want to bulk import these texts into the database: 1 cell = 1 description = 1 product.

They formatted the texts (bold, italic, underline, paragraphs...) and I must keep that formatting when I import these descriptions into the database. So I have to convert these texts into HTML.

Any ideas?

The only way I have found is to:

  1. Copy-paste the cells into a Word file (to keep the layout).
  2. Copy-paste the Word content into an online HTML converter.
  3. Copy-paste the converted text into another Excel file for my bulk import.

It would be really time-consuming for the 1700+ products (besides, we have multiple languages...).


EDIT:

[Imagine this text in a cell, with the first line in bold and the first sentence of the last paragraph in italics]

Vestibulum eget viverra nisi.

Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.

Suspendisse varius nisi quis metus semper dapibus. Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis

[I want this in a cell]

<p><strong>Vestibulum eget viverra nisi.</strong></p>

<p>Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.</p>

<p><em>Suspendisse varius nisi quis metus semper dapibus.</em> Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis</p>

Or maybe this (replace the line breaks with <br /> tags):

<strong>Vestibulum eget viverra nisi.</strong><br />
<br />
Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.<br />
<br />
<em>Suspendisse varius nisi quis metus semper dapibus.</em> Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis

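https://excelribbon.tips.net/T013402_Finding_Positions_of_Formatted_Characters_in_a_Cell.html shows how to find the font style of each character in the string using VBA:

Sub Show_Character_Formats()

Dim i As Long

' For each character of A1, write the character, its code, its font style
' and its underline setting into columns C to F, one row per character
For i = 1 To Len(Range("A1").Value)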
    Range("C1").Offset(i, 0).Value = Range("A1").Characters(i, 1).Text
    Range("C1").Offset(i, 1).Value = Asc(Range("A1").Characters(i, 1).Text)
    Range("C1").Offset(i, 2).Value = Range("A1").Characters(i, 1).Font.FontStyle
    Range("C1").Offset(i, 3).Value = Range("A1").Characters(i, 1).Font.Underline
Next i

End Sub

With that capability, is the rest of a solution obvious to you?


This code transforms A1 (with character-specific bolding and italicising) into HTML in A3.
For example, the text "This cell then has bold and then italic text" (with "bold" in bold and "italic " in italics) produces:
This cell then has <strong>bold</strong> and then <em>italic </em>text

Sub Format_to_HTML()

Dim Source As String
Dim Target As String
Dim Length As Long
Dim i As Long

Source = Range("A1").Value
Length = Len(Source)

For i = 1 To (Length - 1)
    Target = Target & Mid(Source, i, 1)
    If Range("A1").Characters(i, 1).Font.FontStyle = "Regular" And Range("A1").Characters(i + 1, 1).Font.FontStyle = "Bold" Then
        Target = Target & "<strong>"
    End If
    If Range("A1").Characters(i, 1).Font.FontStyle = "Bold" And Range("A1").Characters(i + 1, 1).Font.FontStyle = "Regular" Then
        Target = Target & "</strong>"
    End If
    If Range("A1").Characters(i, 1).Font.FontStyle = "Regular" And Range("A1").Characters(i + 1, 1).Font.FontStyle = "Italic" Then
        Target = Target & "<em>"
    End If
    If Range("A1").Characters(i, 1).Font.FontStyle = "Italic" And Range("A1").Characters(i + 1, 1).Font.FontStyle = "Regular" Then
        Target = Target & "</em>"
    End If

Next i
Target = Target & Right(Source, 1)

Range("A3").Value = Target

End Sub

Depending on your data, you may also need to handle transitions directly between bold and italic, and to and from text that is both bold AND italic. If non-regular formatting ever applies at the very beginning or end of the string, you will also need to add some code before or after the loop to insert the relevant tags there.
Then add whatever looping is appropriate to process multiple cells.

With a C# routine, I have been able to determine that when a formatted cell is copied in Excel, Excel places an HTML version of it on the Windows clipboard, along with some other formats such as plain text and a bitmap.

This is the C# routine to retrieve the HTML from the Windows clipboard (with some help from the question "'Current thread must be set to single thread apartment (STA)' error in copy string to clipboard"):

using System.Threading;

namespace SO72854821HTMLfromClipboard
{
    public class HTMLfromClipboard
    {
        public static string getHTMLFromClipboard()
        {
            string htmlString = null;
            Thread STAThread = new Thread(
                delegate ()
                {
                    if (System.Windows.Clipboard.ContainsText(System.Windows.TextDataFormat.Html))
                    {
                        htmlString = System.Windows.Clipboard.GetText(System.Windows.TextDataFormat.Html);
                    }
                });
            STAThread.SetApartmentState(ApartmentState.STA);
            STAThread.Start();
            STAThread.Join();

            return htmlString;
        }

    }
}

When I enter your formatted text in an Excel cell, copy it with Ctrl-C, and then retrieve the value with this method, the result is an HTML representation of your text:

Version:1.0
StartHTML:0000000148
EndHTML:0000002905
StartFragment:0000002319
EndFragment:0000002845
SourceURL:file:///C:/Temp/SO72854821.xlsm

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/hamkchr/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/hamkchr/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
<style>
<!--table
    {mso-displayed-decimal-separator:"\,";
    mso-displayed-thousand-separator:"\.";}
@page
    {margin:.79in .7in .79in .7in;
    mso-header-margin:.3in;
    mso-footer-margin:.3in;}
.font0
    {color:black;
    font-size:10.0pt;
    font-weight:400;
    font-style:normal;
    text-decoration:none;
    font-family:Arial, sans-serif;
    mso-font-charset:0;}
.font5
    {color:black;
    font-size:10.0pt;
    font-weight:700;
    font-style:normal;
    text-decoration:none;
    font-family:Arial, sans-serif;
    mso-font-charset:0;}
.font6
    {color:black;
    font-size:10.0pt;
    font-weight:400;
    font-style:italic;
    text-decoration:none;
    font-family:Arial, sans-serif;
    mso-font-charset:0;}
tr
    {mso-height-source:auto;}
col
    {mso-width-source:auto;}
br
    {mso-data-placement:same-cell;}
td
    {padding-top:1px;
    padding-right:1px;
    padding-left:1px;
    mso-ignore:padding;
    color:black;
    font-size:10.0pt;
    font-weight:400;
    font-style:normal;
    text-decoration:none;
    font-family:Arial, sans-serif;
    mso-font-charset:0;
    mso-number-format:General;
    text-align:general;
    vertical-align:bottom;
    border:none;
    mso-background-source:auto;
    mso-pattern:auto;
    mso-protection:locked visible;
    white-space:nowrap;
    mso-rotate:0;}
.xl65
    {white-space:normal;}
-->
</style>
</head>

<body link="#0563C1" vlink="#954F72">

<table border=0 cellpadding=0 cellspacing=0 width=470 style='border-collapse:
 collapse;width:353pt'>
 <col width=470 style='mso-width-source:userset;mso-width-alt:17188;width:353pt'>
 <tr height=119 style='height:89.25pt'>
<!--StartFragment-->
  <td height=119 class=xl65 width=470 style='height:89.25pt;width:353pt'><font
  class="font5">Vestibulum eget viverra nisi.</font><font class="font0"><br>
    <br>
    Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue
  lectus accumsan risus, non dapibus orci leo a erat.<br>
    <br>
    </font><font class="font6">Suspendisse varius nisi quis metus semper
  dapibus</font><font class="font0">. Cras ullamcorper iaculis tortor eget
  rhoncus. Integer hendrerit vulputate felis</font></td>
<!--EndFragment-->
 </tr>
</table>

</body>

</html>

I have been unable to find a direct way to get this result from the clipboard in VBA for Excel, but I can see the following (admittedly fairly complicated) path to complete the job:

Then, in VBA, in a loop you could define a Range object for each of the cells you want to convert, select the cell and copy it, then retrieve the HTML using the C# function and do whatever you need with it.
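
Whichever way the clipboard text reaches VBA, only the part between the <!--StartFragment--> and <!--EndFragment--> comments in the dump above belongs to the copied cell. Here is a minimal sketch of cutting that part out with plain string functions. The name ExtractHtmlFragment is made up for this example, and it assumes the two markers appear exactly as they do in the dump.

```vba
' Hypothetical helper: returns the text between the StartFragment and
' EndFragment comment markers of a clipboard HTML string.
' Returns "" if either marker is missing.
Function ExtractHtmlFragment(ByVal clipHtml As String) As String
    Const START_TAG As String = "<!--StartFragment-->"
    Const END_TAG As String = "<!--EndFragment-->"
    Dim p1 As Long, p2 As Long

    p1 = InStr(1, clipHtml, START_TAG, vbTextCompare)
    If p1 = 0 Then Exit Function
    p1 = p1 + Len(START_TAG)            ' first character after the start marker

    p2 = InStr(p1, clipHtml, END_TAG, vbTextCompare)
    If p2 = 0 Then Exit Function

    ' Trim$ strips leading/trailing spaces only; line breaks are kept
    ExtractHtmlFragment = Trim$(Mid$(clipHtml, p1, p2 - p1))
End Function
```

The result is still Excel's own markup (a <td> holding <font class="font5"> and similar runs), so the class names would still have to be mapped to <strong> and <em> using the style block in the header.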

You might profit from a (late-bound) XML spreadsheet analysis of Excel.Range.Value(11), returning a "flat" 1-dim array from a column range input.

Benefit: the rng.Value(xlRangeValueXMLSpreadsheet) approach, aka Value(11), should be faster than a character loop.

Further hints

  • This array with all the product information can easily be joined using any further delimiters.

  • Note that the XML analysis uses namespace declarations which are part of the returned Range.Value(11) XML string.

  • Having declared a receiving results array (Dim MyResultsArray), a possible example call could be MyResultsArray = GetProducts(Sheet1.Range("A1:A50000")).

Function GetProducts(rng As Range)
'[0]Get Value(11)
    Dim s As String
    s = rng.Value(xlRangeValueXMLSpreadsheet)    ' or: rng.Value(11)
    s = Replace(s, "&#10;", "$")                 ' temporary replacement
    '[1]Set xml document to memory
    Dim xDoc As Object: Set xDoc = CreateObject("MSXML2.DOMDocument.6.0")
    '[2]Add namespaces
    xDoc.SetProperty "SelectionNamespaces", _
    "xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet' " & _
    "xmlns:html='http://www.w3.org/TR/REC-html40'"
    If xDoc.LoadXML(s) Then                      ' load wellformed string content
        Dim dat As Object, data As Object
        Set data = xDoc.SelectNodes("//ss:Data") ' XPath using namespace prefixes
        Dim tmp(): ReDim tmp(1 To data.Length)
        For Each dat In data
            Dim r As Long:       r = r + 1
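            Dim elems As Object: Set elems = dat.SelectNodes("*")
            ' Unformatted cells have no child elements; keep one slot for their plain text
            Dim html():          ReDim html(0 To WorksheetFunction.Max(elems.Length - 1, 0))
            If elems.Length = 0 Then html(0) = dat.Text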
            Dim i As Long
            For i = 0 To elems.Length - 1
                Select Case elems(i).nodename
                Case "Font"
                    html(i) = elems(i).Text
                Case "B"
                    html(i) = "<strong>" & elems(i).Text & "</strong>"
                Case "I"
                    html(i) = "<em>" & elems(i).Text & "</em>"
                Case Else
                    html(i) = "<" & elems(i).nodename & ">" & _
                    elems(i).Text & _
                    "</" & elems(i).nodename & ">"
                End Select
            Next i
            ' join next row elements to one string
            tmp(r) = Replace(Join(html, ""), "$", "<br /><br />")
        Next dat
        '[4]return "flat" array
        GetProducts = tmp
        '        Stop
    End If

End Function
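
One possible way to use the returned array is sketched below. The procedure name WriteProductsToColumnB, the range A1:A50 and the target column B are illustrative choices, not part of the answer above. The sketch also assumes the input range starts in row 1 and contains no empty cells, so that element r of the result belongs to row r.

```vba
' Illustrative only: writes each HTML string returned by GetProducts
' next to its source cell.
' ASSUMPTION: A1:A50 has no empty cells, so result index r = row r.
Sub WriteProductsToColumnB()
    Dim MyResultsArray As Variant
    Dim r As Long

    MyResultsArray = GetProducts(Sheet1.Range("A1:A50"))
    If IsEmpty(MyResultsArray) Then Exit Sub    ' the XML string failed to load

    For r = LBound(MyResultsArray) To UBound(MyResultsArray)
        Sheet1.Cells(r, "B").Value = MyResultsArray(r)
    Next r
End Sub
```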

Thank you everyone for taking the time to answer my question, your solutions are all interesting.

I don't know VBA (I'm a web dev), so I asked a coworker who knows it well, and she found a good solution for me (adapted from this topic). Here is her macro:

Sub ExcelToHTML()
    Dim LastRow As Long
    Dim i As Long
    Application.ScreenUpdating = False

'Insert Function fnConvert2HTML into B1 cell of WS1
    Sheets("WS1").Select
    Range("B1").FormulaLocal = "=fnConvert2HTML(A1)"

'Find the last row of column A of WS1
    LastRow = Range("A" & Rows.Count).End(xlUp).Row
                                    
'Extend function to the last row
    For i = 1 To LastRow
        Range("B" & i).Select
        ActiveCell.FormulaR1C1 = "=@fnConvert2HTML(RC[-1])"
    Next

'Copy/Paste Column B as values
    Range("B1:B" & LastRow).Copy
    Range("B1").PasteSpecial Paste:=xlPasteValues

'Replace all <br><br> with </p><p>
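    ' LookAt:=xlPart is explicit because Replace otherwise reuses the last Find/Replace setting
    Range("B1:B" & LastRow).Replace What:="<br><br>", Replacement:="</p><p>", LookAt:=xlPart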

End Sub

Function fnConvert2HTML(myCell As Range) As String
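    ' Each variable needs its own type; "Dim a, b As Boolean" leaves a as Variant
    Dim bldTagOn As Boolean, itlTagOn As Boolean, ulnTagOn As Boolean, colTagOn As Boolean
    Dim i As Long, chrCount As Long
    Dim chrCol As String, chrLastCol As String, htmlTxt As String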
    
    bldTagOn = False
    itlTagOn = False
    ulnTagOn = False
    colTagOn = False
    htmlTxt = "<p>"
    chrCol = "NONE"
    chrCount = myCell.Characters.Count
    
    For i = 1 To chrCount
        With myCell.Characters(i, 1)
            If (.Font.Color) Then
                chrCol = fnGetCol(.Font.Color)
                If Not colTagOn Then
                    htmlTxt = htmlTxt & "<font color=#" & chrCol & ">"
                    colTagOn = True
                Else
                    If chrCol <> chrLastCol Then htmlTxt = htmlTxt & "</font><font color=#" & chrCol & ">"
                End If
            Else
                chrCol = "NONE"
                If colTagOn Then
                    htmlTxt = htmlTxt & "</font>"
                    colTagOn = False
                End If
            End If
            chrLastCol = chrCol
            
            If .Font.Bold = True Then
                If Not bldTagOn Then
                    htmlTxt = htmlTxt & "<strong>"
                    bldTagOn = True
                End If
            Else
                If bldTagOn Then
                    htmlTxt = htmlTxt & "</strong>"
                    bldTagOn = False
                End If
            End If
    
            If .Font.Italic = True Then
                If Not itlTagOn Then
                    htmlTxt = htmlTxt & "<em>"
                    itlTagOn = True
                End If
            Else
                If itlTagOn Then
                    htmlTxt = htmlTxt & "</em>"
                    itlTagOn = False
                End If
            End If
    
            If .Font.Underline <> xlUnderlineStyleNone Then   ' also catches double underline, whose constant is negative
                If Not ulnTagOn Then
                    htmlTxt = htmlTxt & "<u>"
                    ulnTagOn = True
                End If
            Else
                If ulnTagOn Then
                    htmlTxt = htmlTxt & "</u>"
                    ulnTagOn = False
                End If
            End If
            
            If (Asc(.Text) = 10) Then
                htmlTxt = htmlTxt & "<br>"
            Else
                htmlTxt = htmlTxt & .Text
            End If
        End With
    Next
    
    If colTagOn Then
        htmlTxt = htmlTxt & "</font>"
        colTagOn = False
    End If
    If bldTagOn Then
        htmlTxt = htmlTxt & "</strong>"
        bldTagOn = False
    End If
    If itlTagOn Then
        htmlTxt = htmlTxt & "</em>"
        itlTagOn = False
    End If
    If ulnTagOn Then
        htmlTxt = htmlTxt & "</u>"
        ulnTagOn = False
    End If
    fnConvert2HTML = htmlTxt & "</p>"
    
End Function

Function fnGetCol(strCol As String) As String
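    ' Excel stores colours as BGR; the bytes are swapped below to build an RRGGBB hex string
    Dim rVal As String, gVal As String, bVal As String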
    strCol = Right("000000" & Hex(strCol), 6)
    bVal = Left(strCol, 2)
    gVal = Mid(strCol, 3, 2)
    rVal = Right(strCol, 2)
    fnGetCol = rVal & gVal & bVal
End Function

[Source text, in a cell]:

Vestibulum eget viverra nisi.

Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.
[here is a simple line break, but the Stack Overflow editor doesn't display it]
bold italic text.

Suspendisse varius nisi quis metus semper dapibus. Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis

[Result HTML, in another cell]:

<p><strong>Vestibulum eget viverra nisi.</strong></p>
<p>Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.
<br><strong><em>bold italic text.</strong></em></p>
<p><em>Suspendisse varius nisi quis metus semper dapibus.</em> Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis</p>

The only small problem is the order of the closing tags for bold italic text:

<strong><em>bold italic text.</strong></em>

It should be:

<strong><em>bold italic text.</em></strong>

Besides, we would also like to wrap small text in <small> HTML tags, and my coworker hasn't found a way to do that.
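
One way to address both points is to rebuild the set of open tags whenever the formatting changes: close everything that is open, in reverse order, then open the tags needed for the next character. The sketch below is a variant of fnConvert2HTML built on that idea, not a drop-in replacement. The names fnConvert2HTMLNested, AddTag and smallBelow are made up for this example. Treating "small" as "font size below smallBelow points" is an assumption, and the default of 10 is arbitrary. Colour handling is left out to keep it short.

```vba
' Hypothetical helper: appends <tagName> to the opening string and
' prepends </tagName> to the closing string, so closers stay in reverse order.
Private Sub AddTag(ByVal tagName As String, ByRef openTags As String, ByRef closeTags As String)
    openTags = openTags & "<" & tagName & ">"
    closeTags = "</" & tagName & ">" & closeTags
End Sub

' Sketch: converts one cell to HTML with properly nested tags.
' ASSUMPTION: characters smaller than smallBelow points count as "small".
Function fnConvert2HTMLNested(myCell As Range, Optional ByVal smallBelow As Double = 10) As String
    Dim i As Long
    Dim ch As String, htmlTxt As String
    Dim curOpen As String, curClose As String   ' tags currently open and their closers
    Dim newOpen As String, newClose As String   ' tags required by the current character

    htmlTxt = "<p>"
    For i = 1 To myCell.Characters.Count
        With myCell.Characters(i, 1)
            ch = .Text
            newOpen = "": newClose = ""
            ' Line feeds get no tags, so every open tag is closed before a <br>
            If ch <> vbLf Then
                If .Font.Bold = True Then AddTag "strong", newOpen, newClose
                If .Font.Italic = True Then AddTag "em", newOpen, newClose
                If .Font.Underline <> xlUnderlineStyleNone Then AddTag "u", newOpen, newClose
                If .Font.Size < smallBelow Then AddTag "small", newOpen, newClose
            End If
        End With

        ' Formatting changed: close the old tags in reverse order, open the new ones
        If newOpen <> curOpen Then
            htmlTxt = htmlTxt & curClose & newOpen
            curOpen = newOpen: curClose = newClose
        End If

        If ch = vbLf Then
            htmlTxt = htmlTxt & "<br>"
        Else
            htmlTxt = htmlTxt & ch
        End If
    Next i

    fnConvert2HTMLNested = htmlTxt & curClose & "</p>"
End Function
```

In ExcelToHTML, the formula would then call fnConvert2HTMLNested instead of fnConvert2HTML, for example =fnConvert2HTMLNested(A1, 10). Because a line feed always closes the open tags first, the later replacement of <br><br> with </p><p> cannot split a tag pair. A change from bold to bold italic produces </strong><strong><em>, which is verbose but valid. Characters such as < and & in the cell text are not escaped here, just as in the original function.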
