I made a web platform where users can add a long description for a product. The Description field is a wysiwyg editor and the content is saved in HTML in my database (MySQL).
The users rewrote all the descriptions in an Excel file because they have a lot of products (1700+) and they want to do a bulk import of these texts in the database : 1 cell = 1 description = 1 product.
They formatted the texts (bold, italic, underline, paragraphs...) and I must keep that layout when I will import these descriptions in the database. So, I have to convert these texts into HTML language .
Any idea ?
The only way I found is to :
It would be really time-consuming for the 1700+ products (besides, we have multiple languages...).
EDIT :
[Imagine this text in a cell]
Vestibulum eget viverra nisi.
Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.
Suspendisse varius nisi quis metus semper dapibus. Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis
[I want this in a cell]
<p><strong>Vestibulum eget viverra nisi.</strong></p>
<p>Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.</p>
<p><em>Suspendisse varius nisi quis metus semper dapibus.</em> Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis</p>
Or maybe this (replace the line breaks with <br />
tags) :
<strong>Vestibulum eget viverra nisi.</strong><br />
<br />
Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.<br />
<br />
<em>Suspendisse varius nisi quis metus semper dapibus.</em> Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis
https://excelribbon.tips.net/T013402_Finding_Positions_of_Formatted_Characters_in_a_Cell.html shows how one can find the fontstyle of each character in the string using VBA:
Sub Show_Character_Formats()
For i = 1 To Len(Range("A1").Value)
Range("C1").Offset(i, 0).Value = Range("A1").Characters(i, 1).Text
Range("C1").Offset(i, 1).Value = Asc(Range("A1").Characters(i, 1).Text)
Range("C1").Offset(i, 2).Value = Range("A1").Characters(i, 1).Font.FontStyle
Range("C1").Offset(i, 3).Value = Range("A1").Characters(i, 1).Font.Underline
Next i
End Sub
With that capability, is the rest of a solution obvious to you?
This code transforms A1 (with character-specific bolding and italicising) into HTML in A3:
Eg This cell then has bold and then italic text creates:
This cell then has <strong>bold</strong> and then <em>italic </em>text
Sub Format_to_HTML()
Dim Source As String
Dim Target As String
Dim Length As Integer
Source = Range("A1").Value
Length = Len(Source)
For i = 1 To (Length - 1)
Target = Target & Mid(Source, i, 1)
If Range("A1").Characters(i, 1).Font.FontStyle = "Regular" And Range("A1").Characters(i + 1, 1).Font.FontStyle = "Bold" Then
Target = Target & "<strong>"
End If
If Range("A1").Characters(i, 1).Font.FontStyle = "Bold" And Range("A1").Characters(i + 1, 1).Font.FontStyle = "Regular" Then
Target = Target & "</strong>"
End If
If Range("A1").Characters(i, 1).Font.FontStyle = "Regular" And Range("A1").Characters(i + 1, 1).Font.FontStyle = "Italic" Then
Target = Target & "<em>"
End If
If Range("A1").Characters(i, 1).Font.FontStyle = "Italic" And Range("A1").Characters(i + 1, 1).Font.FontStyle = "Regular" Then
Target = Target & "</em>"
End If
Next i
Target = Target & Right(Source, 1)
Range("A3").Value = Target
End Sub
Depending on your particular data you may need to add transitions directly between bold and italic, and to bold AND italic. If non-regular format ever applies at the very beginning or end of the string then you will also need to add some code before or after the loop to insert the relevant tags there.
Then whatever cycling is appropriate to address multiple cells.
With ac# routine, I have been able to determine that when a formatted cell is "copied" in Excel, Excel places an HTML version of it in the Windows clipboard along with some other types like plain text and bitmap graphic.
This is the c# routine to retrieve the HTML from the Windows clipboard (with some help from "Current thread must be set to single thread apartment (STA)" error in copy string to clipboard )
using System.Threading;
namespace SO72854821HTMLfromClipboard
{
public class HTMLfromClipboard
{
public static string getHTMLFromClipboard()
{
string htmlString = null;
Thread STAThread = new Thread(
delegate ()
{
if (System.Windows.Clipboard.ContainsText(System.Windows.TextDataFormat.Html))
{
htmlString = System.Windows.Clipboard.GetText(System.Windows.TextDataFormat.Html);
}
});
STAThread.SetApartmentState(ApartmentState.STA);
STAThread.Start();
STAThread.Join();
return htmlString;
}
}
}
When I enter your formatted text in an Excel cell, copy it with Ctrl-C, and then retrieve the value with this method, the result is an HTML representation of your text:
Version:1.0
StartHTML:0000000148
EndHTML:0000002905
StartFragment:0000002319
EndFragment:0000002845
SourceURL:file:///C:/Temp/SO72854821.xlsm
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/hamkchr/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/hamkchr/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
<style>
<!--table
{mso-displayed-decimal-separator:"\,";
mso-displayed-thousand-separator:"\.";}
@page
{margin:.79in .7in .79in .7in;
mso-header-margin:.3in;
mso-footer-margin:.3in;}
.font0
{color:black;
font-size:10.0pt;
font-weight:400;
font-style:normal;
text-decoration:none;
font-family:Arial, sans-serif;
mso-font-charset:0;}
.font5
{color:black;
font-size:10.0pt;
font-weight:700;
font-style:normal;
text-decoration:none;
font-family:Arial, sans-serif;
mso-font-charset:0;}
.font6
{color:black;
font-size:10.0pt;
font-weight:400;
font-style:italic;
text-decoration:none;
font-family:Arial, sans-serif;
mso-font-charset:0;}
tr
{mso-height-source:auto;}
col
{mso-width-source:auto;}
br
{mso-data-placement:same-cell;}
td
{padding-top:1px;
padding-right:1px;
padding-left:1px;
mso-ignore:padding;
color:black;
font-size:10.0pt;
font-weight:400;
font-style:normal;
text-decoration:none;
font-family:Arial, sans-serif;
mso-font-charset:0;
mso-number-format:General;
text-align:general;
vertical-align:bottom;
border:none;
mso-background-source:auto;
mso-pattern:auto;
mso-protection:locked visible;
white-space:nowrap;
mso-rotate:0;}
.xl65
{white-space:normal;}
-->
</style>
</head>
<body link="#0563C1" vlink="#954F72">
<table border=0 cellpadding=0 cellspacing=0 width=470 style='border-collapse:
collapse;width:353pt'>
<col width=470 style='mso-width-source:userset;mso-width-alt:17188;width:353pt'>
<tr height=119 style='height:89.25pt'>
<!--StartFragment-->
<td height=119 class=xl65 width=470 style='height:89.25pt;width:353pt'><font
class="font5">Vestibulum eget viverra nisi.</font><font class="font0"><br>
<br>
Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue
lectus accumsan risus, non dapibus orci leo a erat.<br>
<br>
</font><font class="font6">Suspendisse varius nisi quis metus semper
dapibus</font><font class="font0">. Cras ullamcorper iaculis tortor eget
rhoncus. Integer hendrerit vulputate felis</font></td>
<!--EndFragment-->
</tr>
</table>
</body>
</html>
I have been unable to find a direct way to get this result from the clipboard in VBA for Excel, but I can see the following (admittedly fairly complicated) path to complete the job:
The c# routine would have to be made COM-accessible as described in A Simple C# DLL - how do I call it from Excel, Access, VBA, VB6? and https://social.msdn.microsoft.com/Forums/de-DE/5c64caf4-b1d4-4ced-a579-f8c8e2c5c189/dll-als-verweis-in-excelaccess-vba-nicht-nutzbar?forum=vstode
The DLL of the c# routine would have to be registered on the machine
The DLL has to be added as a reference in Excel (menu item Extras in VBA)
The function has to be declared in VBA with a Declare statement
Then, in VBA, in a loop you could define a Range object for each the cells you want to convert, select the cell and copy it, then retrieve the HTML using the c# function and do whatever you need with it.
You might profit from a (late bound) XML spreadsheet analysis of Excel.Range.Value(11) ` returning a "flat" 1-dim array out of a column range input.
Benefit: rng.Value(xlRangeValueXMLSpreadsheet)
, aka Value(11)
approach should be faster than a character loop.
Further hints
This array with all product information can easily joined together using any further delimiters.
Note that the xml analysis uses namespace declarations which are part of the returned Range.Value(11)
xml string.
Having declared a receiving results array ( Dim MyResultsArray
) a possible example call could be MyResultsArray = GetProducts(Sheet1.Range("A1:A50000")
.
Function GetProducts(rng As Range)
'[0]Get Value(11)
Dim s As String
s = rng.Value(xlRangeValueXMLSpreadsheet) ' or: rng.Value(11)
s = Replace(s, " ", "$") ' temporary replacement
'[1]Set xml document to memory
Dim xDoc As Object: Set xDoc = CreateObject("MSXML2.DOMDocument.6.0")
'[2]Add namespaces
xDoc.SetProperty "SelectionNamespaces", _
"xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet' " & _
"xmlns:html='http://www.w3.org/TR/REC-html40'"
If xDoc.LoadXML(s) Then ' load wellformed string content
Dim dat As Object, data As Object
Set data = xDoc.SelectNodes("//ss:Data") ' XPath using namespace prefixes
Dim tmp(): ReDim tmp(1 To data.Length)
For Each dat In data
Dim r As Long: r = r + 1
Dim elems As Object: Set elems = dat.SelectNodes("*")
Dim html(): ReDim html(0 To elems.Length - 1)
Dim i As Long
For i = 0 To elems.Length - 1
Select Case elems(i).nodename
Case "Font"
html(i) = elems(i).Text
Case "B"
html(i) = "<strong>" & elems(i).Text & "</strong>"
Case "I"
html(i) = "<em>" & elems(i).Text & "</em>"
Case Else
html(i) = "<" & elems(i).nodename & ">" & _
elems(i).Text & _
"</" & elems(i).nodename & ">"
End Select
Next i
' join next row elements to one string
tmp(r) = Replace(Join(html, ""), "$", "<br /><br />")
Next dat
'[4]return "flat" array
GetProducts = tmp
' Stop
End If
End Function
Thank you everyone for taking the time to answer my question, your solutions are all interesting.
I don't know VBA (I'm a web dev) so I asked to a coworker that knows it well and she found a good solution for me ( adapted from this topic ), here is her macro :
Sub ExcelToHTML()
Dim LastRow As Long
Dim i As Integer
Application.ScreenUpdating = False
'Insert Function fnConvert2HTML into B1 cell of WS1
Sheets("WS1").Select
Range("B1").FormulaLocal = "=fnConvert2HTML(A1)"
'Find the last row of column A of WS1
LastRow = Range("A" & Rows.Count).End(xlUp).Row
'Extend function to the last row
For i = 1 To LastRow
Range("B" & i).Select
ActiveCell.FormulaR1C1 = "=@fnConvert2HTML(RC[-1])"
Next
'Copy/Paste Column B as values
Range("B1:B" & LastRow).Copy
Range("B1").PasteSpecial Paste:=xlPasteValues
'Replace all <br><br> with </p><p>
Range("B1:B" & LastRow).Replace What:="<br><br>", Replacement:="</p><p>"
End Sub
Function fnConvert2HTML(myCell As Range) As String
Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn As Boolean
Dim i, chrCount As Integer
Dim chrCol, chrLastCol, htmlTxt As String
bldTagOn = False
itlTagOn = False
ulnTagOn = False
colTagOn = False
htmlTxt = "<p>"
chrCol = "NONE"
chrCount = myCell.Characters.Count
For i = 1 To chrCount
With myCell.Characters(i, 1)
If (.Font.Color) Then
chrCol = fnGetCol(.Font.Color)
If Not colTagOn Then
htmlTxt = htmlTxt & "<font color=#" & chrCol & ">"
colTagOn = True
Else
If chrCol <> chrLastCol Then htmlTxt = htmlTxt & "</font><font color=#" & chrCol & ">"
End If
Else
chrCol = "NONE"
If colTagOn Then
htmlTxt = htmlTxt & "</font>"
colTagOn = False
End If
End If
chrLastCol = chrCol
If .Font.Bold = True Then
If Not bldTagOn Then
htmlTxt = htmlTxt & "<strong>"
bldTagOn = True
End If
Else
If bldTagOn Then
htmlTxt = htmlTxt & "</strong>"
bldTagOn = False
End If
End If
If .Font.Italic = True Then
If Not itlTagOn Then
htmlTxt = htmlTxt & "<em>"
itlTagOn = True
End If
Else
If itlTagOn Then
htmlTxt = htmlTxt & "</em>"
itlTagOn = False
End If
End If
If .Font.Underline > 0 Then
If Not ulnTagOn Then
htmlTxt = htmlTxt & "<u>"
ulnTagOn = True
End If
Else
If ulnTagOn Then
htmlTxt = htmlTxt & "</u>"
ulnTagOn = False
End If
End If
If (Asc(.Text) = 10) Then
htmlTxt = htmlTxt & "<br>"
Else
htmlTxt = htmlTxt & .Text
End If
End With
Next
If colTagOn Then
htmlTxt = htmlTxt & "</font>"
colTagOn = False
End If
If bldTagOn Then
htmlTxt = htmlTxt & "</strong>"
bldTagOn = False
End If
If itlTagOn Then
htmlTxt = htmlTxt & "</em>"
itlTagOn = False
End If
If ulnTagOn Then
htmlTxt = htmlTxt & "</u>"
ulnTagOn = False
End If
htmlTxt = htmlTxt
fnConvert2HTML = htmlTxt & "</p>"
End Function
Function fnGetCol(strCol As String) As String
Dim rVal, gVal, bVal As String
strCol = Right("000000" & Hex(strCol), 6)
bVal = Left(strCol, 2)
gVal = Mid(strCol, 3, 2)
rVal = Right(strCol, 2)
fnGetCol = rVal & gVal & bVal
End Function
[Source text, in a cell] :
Vestibulum eget viverra nisi.
Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat. here is a simple line break but the stackoverflow editor doesn't display it
bold italic text.
Suspendisse varius nisi quis metus semper dapibus. Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis
[Result HTML, in another cell] :
<p><strong>Vestibulum eget viverra nisi.</strong></p>
<p>Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.
<br><strong><em>bold italic text.</strong></em></p>
<p><em>Suspendisse varius nisi quis metus semper dapibus.</em> Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis</p>
The only little problem is the order of these tags for bold italic text :
<strong><em>bold italic text.</strong></em>
It should be :
<strong><em>bold italic text.</em></strong>
Besides, we would like to convert small text too with <small>
html tags and my coworker didn't find how to do.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.