
Remove HTML tags from cell strings: Excel formula

I have data with HTML tags in an Excel sheet, like below:

<b>This is test data<br>Nice
<div> Go on this is next Cell
Very goood <b>.....</b>

So, basically, I want to delete all HTML tags in the Excel sheet, or replace them with a space.

Apply Replace All with the wildcard pattern <*>:


To open this dialog, go to the ribbon: Home > Find & Select > Replace..., or simply press CTRL + H.

Any extra spaces left behind can then be removed with the TRIM function. Good luck!
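
The same Replace All step can also be scripted. The following is only a minimal sketch, assuming the cells to clean are selected first; the macro name StripTagsInSelection is made up for illustration. It relies on Range.Replace accepting the same * wildcard as the dialog, replaces each tag with a space as the question asks, and then collapses the leftover spaces the way TRIM would.

```vba
Sub StripTagsInSelection()
    ' Hypothetical helper, not part of the original answer.
    Dim c As Range

    ' Same as CTRL + H with <*> as the search text and a space as the replacement.
    Selection.Replace What:="<*>", Replacement:=" ", LookAt:=xlPart

    ' Collapse the leftover spaces, as the TRIM worksheet function would.
    For Each c In Selection.Cells
        If VarType(c.Value) = vbString Then
            c.Value = Application.WorksheetFunction.Trim(c.Value)
        End If
    Next c
End Sub
```

Select the range, then run the macro from Alt + F8. Unlike a formula, it overwrites the cells in place, so it may be worth trying on a copy of the data first.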

Open the VBA editor in Excel (Alt + F11) and click the project name (the spreadsheet name) in the Project Explorer. Choose Insert -> Module, paste the user-defined function below into the module window, and save the workbook as .XLSM, which allows macros.

Type the function =StripHTML(A2) into a cell, assuming your data is in cell A2. You can also download a working example here:

http://jfrancisconsulting.com/how-to-strip-html-tags-in-excel/

Function StripHTML(cell As Range) As String
    Dim RegEx As Object
    Set RegEx = CreateObject("vbscript.regexp")
    Dim sInput As String
    Dim sOut As String
    sInput = cell.Text

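    'VBA's Replace does not interpret escape sequences, so these two lines
    'match the literal text \x0D\x0A and \x00, not control characters
    sInput = Replace(sInput, "\x0D\x0A", Chr(10))
    sInput = Replace(sInput, "\x00", Chr(10))
    'also normalise real carriage return + line feed pairs
    sInput = Replace(sInput, vbCrLf, Chr(10))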

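    'replace HTML breaks and end of paragraphs with line breaks
    '(vbTextCompare so that <br>, <BR>, </p> and </P> all match)
    sInput = Replace(sInput, "</P>", Chr(10) & Chr(10), , , vbTextCompare)
    sInput = Replace(sInput, "<BR>", Chr(10), , , vbTextCompare)

    'replace bullets with dashes
    sInput = Replace(sInput, "<li>", "-", , , vbTextCompare)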

    'add back all of the special characters
    sInput = Replace(sInput, "&ndash;", "–")
    sInput = Replace(sInput, "&mdash;", "—")
    sInput = Replace(sInput, "&iexcl;", "¡")
    sInput = Replace(sInput, "&iquest;", "¿")
    sInput = Replace(sInput, "&quot;", """")
    sInput = Replace(sInput, "&ldquo;", "“")
    sInput = Replace(sInput, "&rdquo;", "”")
    sInput = Replace(sInput, "&#39;", "'")  'find string lost in the source; &#39; assumed
    sInput = Replace(sInput, "&lsquo;", "'")
    sInput = Replace(sInput, "&rsquo;", "’")
    sInput = Replace(sInput, "&laquo;", "«")
    sInput = Replace(sInput, "&raquo;", "»")
    sInput = Replace(sInput, "&nbsp;", " ")
    sInput = Replace(sInput, "&amp;", "&")
    sInput = Replace(sInput, "&cent;", "¢")
    sInput = Replace(sInput, "&copy;", "©")
    sInput = Replace(sInput, "&divide;", "÷")
    sInput = Replace(sInput, "&gt;", ">")
    sInput = Replace(sInput, "&lt;", "<")
    sInput = Replace(sInput, "&micro;", "µ")
    sInput = Replace(sInput, "&middot;", "·")
    sInput = Replace(sInput, "&para;", "¶")
    sInput = Replace(sInput, "&plusmn;", "±")
    sInput = Replace(sInput, "&euro;", "€")
    sInput = Replace(sInput, "&pound;", "£")
    sInput = Replace(sInput, "&reg;", "®")
    sInput = Replace(sInput, "&sect;", "§")
    sInput = Replace(sInput, "&trade;", "™")
    sInput = Replace(sInput, "&yen;", "¥")
    sInput = Replace(sInput, "&aacute;", "á")
    sInput = Replace(sInput, "&Aacute;", "Á")
    sInput = Replace(sInput, "&agrave;", "à")
    sInput = Replace(sInput, "&Agrave;", "À")
    sInput = Replace(sInput, "&acirc;", "â")
    sInput = Replace(sInput, "&Acirc;", "Â")
    sInput = Replace(sInput, "&aring;", "å")
    sInput = Replace(sInput, "&Aring;", "Å")
    sInput = Replace(sInput, "&atilde;", "ã")
    sInput = Replace(sInput, "&Atilde;", "Ã")
    sInput = Replace(sInput, "&auml;", "ä")
    sInput = Replace(sInput, "&Auml;", "Ä")
    sInput = Replace(sInput, "&aelig;", "æ")
    sInput = Replace(sInput, "&AElig;", "Æ")
    sInput = Replace(sInput, "&ccedil;", "ç")
    sInput = Replace(sInput, "&Ccedil;", "Ç")
    sInput = Replace(sInput, "&eacute;", "é")
    sInput = Replace(sInput, "&Eacute;", "É")
    sInput = Replace(sInput, "&egrave;", "è")
    sInput = Replace(sInput, "&Egrave;", "È")
    sInput = Replace(sInput, "&ecirc;", "ê")
    sInput = Replace(sInput, "&Ecirc;", "Ê")
    sInput = Replace(sInput, "&euml;", "ë")
    sInput = Replace(sInput, "&Euml;", "Ë")
    sInput = Replace(sInput, "&iacute;", "í")
    sInput = Replace(sInput, "&Iacute;", "Í")
    sInput = Replace(sInput, "&igrave;", "ì")
    sInput = Replace(sInput, "&Igrave;", "Ì")
    sInput = Replace(sInput, "&icirc;", "î")
    sInput = Replace(sInput, "&Icirc;", "Î")
    sInput = Replace(sInput, "&iuml;", "ï")
    sInput = Replace(sInput, "&Iuml;", "Ï")
    sInput = Replace(sInput, "&ntilde;", "ñ")
    sInput = Replace(sInput, "&Ntilde;", "Ñ")
    sInput = Replace(sInput, "&oacute;", "ó")
    sInput = Replace(sInput, "&Oacute;", "Ó")
    sInput = Replace(sInput, "&ograve;", "ò")
    sInput = Replace(sInput, "&Ograve;", "Ò")
    sInput = Replace(sInput, "&ocirc;", "ô")
    sInput = Replace(sInput, "&Ocirc;", "Ô")
    sInput = Replace(sInput, "&oslash;", "ø")
    sInput = Replace(sInput, "&Oslash;", "Ø")
    sInput = Replace(sInput, "&otilde;", "õ")
    sInput = Replace(sInput, "&Otilde;", "Õ")
    sInput = Replace(sInput, "&ouml;", "ö")
    sInput = Replace(sInput, "&Ouml;", "Ö")
    sInput = Replace(sInput, "&szlig;", "ß")
    sInput = Replace(sInput, "&uacute;", "ú")
    sInput = Replace(sInput, "&Uacute;", "Ú")
    sInput = Replace(sInput, "&ugrave;", "ù")
    sInput = Replace(sInput, "&Ugrave;", "Ù")
    sInput = Replace(sInput, "&ucirc;", "û")
    sInput = Replace(sInput, "&Ucirc;", "Û")
    sInput = Replace(sInput, "&uuml;", "ü")
    sInput = Replace(sInput, "&Uuml;", "Ü")
    sInput = Replace(sInput, "&yuml;", "ÿ")
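    sInput = Replace(sInput, "&acute;", "´")  'find string lost in the source; &acute; assumed
    sInput = Replace(sInput, "&#96;", "`")    'find string lost in the source; &#96; assumed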

    'replace all the remaining HTML tags
    With RegEx
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
        .Pattern = "<[^>]+>" 'regular expression for HTML tags
    End With
    sOut = RegEx.Replace(sInput, "")
    StripHTML = sOut
    Set RegEx = Nothing
End Function
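
StripHTML removes tags outright, whereas the question asks for them to be replaced with a space. A minimal sketch of that variant follows; the function name TagsToSpaces is hypothetical, and it only handles tags, not the entity decoding done above. It uses the same tag pattern, substitutes a space, and then lets the TRIM worksheet function collapse runs of spaces.

```vba
Function TagsToSpaces(ByVal s As String) As String
    ' Hypothetical variant, for illustration only.
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    re.Global = True
    re.Pattern = "<[^>]+>"   ' same tag pattern as StripHTML

    ' Each tag becomes one space; TRIM then removes leading, trailing
    ' and repeated spaces.
    TagsToSpaces = Application.WorksheetFunction.Trim(re.Replace(s, " "))
End Function
```

Pasted into the same module, it can be called from a cell as =TagsToSpaces(A2). For the sample row "Very goood <b>.....</b>" it returns "Very goood .....".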

Since the macro above didn't work for me, I fixed it myself. It's my first script, so if you can improve it, make it faster, or add more to it, you're more than welcome!

OK, I have no real programming experience (apart from some very basic Java six years ago), but with some help and hours of guessing I managed to write this script. It works like a charm for removing most tags and numeric character references such as &#8217;, but it does not replace <BR> with a line break. You can do that by hand: press CTRL + H, enter <br> as the text to find, click into the replacement box, hold ALT and type 0010 on the numeric keypad (a small dot should start blinking in the box), then hit "Replace All".

Paste the code below into a user module (Alt + F11, right-click Sheet1 -> Insert -> Module -> paste code).

Then add a button: go to File -> Options -> Customize Ribbon and check the Developer checkbox. Next, go to the Developer tab -> Insert -> Button, place the button, right-click it -> Assign Macro -> choose RemoveTags.

Sub RemoveTags()
    Dim r As Range
    Dim s As String
    Dim entity As Variant

    Selection.NumberFormat = "@"  'set cells to text numberformat

    With CreateObject("vbscript.regexp")
        .Pattern = "\<.*?\>"  'any tag, matched non-greedily
        .Global = True

        For Each r In Selection
            'strip the tags once, then blank out each character reference
            s = .Replace(r.Value, "")
            For Each entity In Array("&#8217;", "&#8211;", "&#8216;", _
                                     "&#8232;", "&#8233;", "&#146;s")
                s = Replace(s, entity, " ")
            Next entity
            r.Value = s
        Next r
    End With
End Sub

Private Sub CommandButton1_Click()

End Sub
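
The answer above notes that RemoveTags does not turn <BR> into a line break and suggests doing it by hand. As a rough sketch, the same step could be done in code before the tags are stripped. The helper name BreaksToLineFeeds is made up for illustration, and treating <br>, <BR>, <br/> and <br /> alike is an assumption about the data.

```vba
Function BreaksToLineFeeds(ByVal s As String) As String
    ' Hypothetical helper: replaces every <br> variant with a line feed.
    With CreateObject("VBScript.RegExp")
        .Global = True
        .IgnoreCase = True        ' match <br> and <BR>
        .Pattern = "<br\s*/?>"    ' also allows <br/> and <br />
        BreaksToLineFeeds = .Replace(s, vbLf)
    End With
End Function
```

It would be called on the cell text before the tag-stripping replace, for example s = BreaksToLineFeeds(r.Value). The line feeds only show up as separate lines when Wrap Text is turned on for the cells.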
