Excel: Export to XML - cells contain XML

Question

I'm trying to export a spreadsheet that has XML in some of the table's cells.

ID (column A): 23455

FACT (column B) (copied and pasted from a sample cell; not all cells are this simple or share this structure):

"<div class=""fact"">
<p><strong>FACT.</strong> The closest star to our solar system is Alpha Centauri.</p>
</div>
"

I'd like to have XML like the following:

<record>
    <ID>23455</ID>
    <FACT><div class="fact"><p><strong>FACT.</strong> The closest star to our solar system is Alpha Centauri.</p></div></FACT>
</record>
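For reference, the target shape above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is only an illustration, not something from the original post: `build_record` is a hypothetical helper, and it assumes the cell already holds a single well-formed XML element (cells with loose HTML would raise a parse error).

```python
import xml.etree.ElementTree as ET


def build_record(record_id, fact_markup):
    """Wrap one row in <record>, embedding the cell markup as real elements."""
    record = ET.Element("record")
    ET.SubElement(record, "ID").text = record_id
    fact = ET.SubElement(record, "FACT")
    # Assumption: the cell is one well-formed element.
    # ET.fromstring raises ET.ParseError otherwise.
    fact.append(ET.fromstring(fact_markup.strip()))
    return record


# Sample cell from the question, after CSV-style quote unescaping.
cell = (
    '<div class="fact">\n'
    "<p><strong>FACT.</strong> The closest star to our solar system "
    "is Alpha Centauri.</p>\n"
    "</div>\n"
)
record = build_record("23455", cell)
xml_text = ET.tostring(record, encoding="unicode")
```

Because the fragment is parsed rather than inserted as text, the serializer writes real tags instead of `&lt;` and `&gt;` entities.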

This is complex enough that I doubt Excel's native XML schema export will work (it is persnickety enough that I can't get it to work with even the simplest data values).

My current thought is to write a Perl script that reads this as a CSV file and exports XML. However, I've noticed that CSV does a poor job of handling XML that's been "embedded" like this.
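As a point of comparison for the CSV route, here is a minimal sketch using Python's standard `csv` module rather than Perl. It assumes the file is quoted the way the sample cell is shown (doubled quotes, line breaks inside the quoted field); `SAMPLE_CSV` and `read_rows` are illustrative names, not part of the original post.

```python
import csv
import io

# Hypothetical CSV text mirroring the sample row: the FACT cell spans
# several lines and escapes quotes by doubling them.
SAMPLE_CSV = (
    "ID,FACT\r\n"
    '23455,"<div class=""fact"">\n'
    "<p><strong>FACT.</strong> The closest star to our solar system "
    "is Alpha Centauri.</p>\n"
    "</div>\n"
    '"\r\n'
)


def read_rows(text):
    """Parse CSV text into a list of dicts, one per spreadsheet row."""
    # newline="" passes line breaks through untranslated so the csv module
    # can keep the ones that sit inside quoted fields.
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = read_rows(SAMPLE_CSV)
```

With a quote-aware parser the multi-line cell comes back as one field with the doubled quotes collapsed; splitting on commas and newlines by hand is where embedded markup tends to break.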

I'm hoping someone else has a better suggestion for how to pull this information out.

Edit: I finally figured out the mistake I made with the export. I can now export and get the following:

<record>
    <ID>23455</ID>
    <FACT>&lt;div class="fact"&gt;&lt;p&gt;&lt;strong&gt;FACT.&lt;/strong&gt; The closest star to our solar system is Alpha Centauri.&lt;/p&gt;&lt;/div&gt;
    </FACT>
</record>

I think I can work with this... some regex and it might be good enough (replacing every &lt; might put me at risk of clobbering a true less-than sign).

So I'm still open to suggestions.

Answer 1

Just posting this as the answer...

If you export the column as text, you get the following:

<record>
    <ID>23455</ID>
    <FACT>&lt;div class="fact"&gt;&lt;p&gt;&lt;strong&gt;FACT.&lt;/strong&gt; The closest star to our solar system is Alpha Centauri.&lt;/p&gt;&lt;/div&gt;
    </FACT>
</record>

In an XML editor I did a find and replace to restore all the tags, using the following regex: s/&lt;(\/?[\w\s="-_]+?)&gt;/<$1>/
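The same substitution can be sketched in Python for anyone scripting it instead of using an editor. This is an adaptation, not the exact pattern above: the hyphen is moved to the end of the character class so it is a literal character (written as `"-_` it forms a range from `"` to `_` in most regex flavors). `ESCAPED_TAG` and `unescape_tags` are illustrative names.

```python
import re

# Matches an escaped tag such as &lt;div class="fact"&gt; or &lt;/p&gt;.
# The class allows word characters, whitespace, =, ", _ and a literal hyphen.
ESCAPED_TAG = re.compile(r'&lt;(/?[\w\s="_-]+?)&gt;')


def unescape_tags(text):
    """Turn escaped tag-like sequences back into real angle-bracket tags."""
    return ESCAPED_TAG.sub(r"<\1>", text)


escaped = (
    '&lt;div class="fact"&gt;&lt;p&gt;&lt;strong&gt;FACT.&lt;/strong&gt; '
    "The closest star to our solar system is Alpha Centauri."
    "&lt;/p&gt;&lt;/div&gt;"
)
restored = unescape_tags(escaped)
untouched = unescape_tags("X &lt; Y &lt; Z")
```

A chained comparison such as `X &lt; Y &lt; Z` has no closing `&gt;`, so it is left alone; text that contains both an escaped less-than and a later escaped greater-than with only tag-like characters between them would still be converted.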

It's a bit dangerous if there are actual less-than or greater-than signs in the document, but you'd need a case like &lt; /maybe and text with common tag symbols ="-_ &gt; which is possible, but most equations are of the form X &lt; Y &lt; Z. Our content doesn't use &lt; and &gt; all that much, so I can be fairly confident it won't hit the edge case.

I also "fixed" all the HTML ( s/<b>/<b/>/ and s/<img (.*?)>/<img $1/>/ ) and checked that the result parses (in theory, an edge case would cause a parsing error).
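A rough Python sketch of that clean-up and parse check follows. It assumes the goal is to self-close void HTML elements so the fragment becomes well-formed XML; treating `br` and `img` as the elements to close is my assumption, and `close_void_tags` and `is_well_formed` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: br and img are the only void elements present in the cells.
# Group 1 is the tag name, group 2 any attributes; an existing "/" is dropped
# and re-added, so already-closed tags are left equivalent.
VOID_TAG = re.compile(r"<(br|img)\b([^<>]*?)\s*/?>")


def close_void_tags(fragment):
    """Rewrite <br> and <img ...> as self-closing tags."""
    return VOID_TAG.sub(r"<\1\2/>", fragment)


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single XML element."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


fixed = close_void_tags('<p>a<br>b<img src="x.png"></p>')
```

Running `is_well_formed` on each `<FACT>` element after the substitutions is one way to surface the edge cases mentioned above, since a wrongly converted less-than sign usually leaves an unclosed or mismatched tag.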

And yes, I now have a doc in mixed DTD that will make all true XML peeps quake with horror, but I can work with it.
