Parse data from an .xml file and save it to a .tsv file in Python
I have a dataset that looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Reviews>
<Review rid="1004293">
<sentences>
<sentence id="1004293:0">
<text>Judging from previous posts this used to be a good place, but not any longer.</text>
<Opinions>
<Opinion target="place" category="RESTAURANT#GENERAL" polarity="negative" from="51" to="56"/>
</Opinions>
</sentence>
<sentence id="1004293:1">
<text>The food here is rather good, but only if you like to wait for it.</text>
<Opinions>
<Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
<Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
</Opinions>
</sentence>
...
How can I parse the data from this .xml file into a .tsv file in the following format:
["negative", "Judging from previous posts this used to be a good place, but not any longer.", "RESTAURANT#GENERAL"]
["positive", "The food here is rather good, but only if you like to wait for it.", "FOOD#QUALITY"]
["negative", "The food here is rather good, but only if you like to wait for it.", "SERVICE#GENERAL"]
Thanks!
You can use Python's xml.etree.ElementTree module to get the desired output. The code below prints each row as a list. To create a TSV file, replace the print call with code that writes each row to a .tsv file.
The file sample.xml must be in the current working directory (normally the directory you run the script from), because the code opens it by a relative path.
from xml.etree import ElementTree

file = 'sample.xml'
tree = ElementTree.parse(file)
root = tree.getroot()

for sentence in root.iter('sentence'):
    # Loop over every <sentence> element in the XML
    for opinion in sentence.iter('Opinion'):
        # Loop over every <Opinion> of this particular sentence
        print([opinion.attrib['polarity'], sentence.find('text').text, opinion.attrib['category']])
Output:
['negative', 'Judging from previous posts this used to be a good place, but not any longer.', 'RESTAURANT#GENERAL']
['positive', 'The food here is rather good, but only if you like to wait for it.', 'FOOD#QUALITY']
['negative', 'The food here is rather good, but only if you like to wait for it.', 'SERVICE#GENERAL']
sample.xml contains:
<Reviews>
<Review rid="1004293">
<sentences>
<sentence id="1004293:0">
<text>Judging from previous posts this used to be a good place, but not any longer.</text>
<Opinions>
<Opinion target="place" category="RESTAURANT#GENERAL" polarity="negative" from="51" to="56"/>
</Opinions>
</sentence>
<sentence id="1004293:1">
<text>The food here is rather good, but only if you like to wait for it.</text>
<Opinions>
<Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
<Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
</Opinions>
</sentence>
</sentences>
</Review>
</Reviews>
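The answer suggests replacing the print call with a write to a TSV file. Below is a minimal sketch of that step using the standard-library csv module with delimiter='\t'. The helper names opinion_rows and write_tsv and the output file name output.tsv are hypothetical, and the sample XML is embedded as a string only so the example runs on its own; with a real file you would use ElementTree.parse('sample.xml').getroot() instead.

```python
import csv
from xml.etree import ElementTree

# The sample data from above, embedded so this sketch is self-contained.
# With a real file, use ElementTree.parse('sample.xml').getroot() instead.
SAMPLE_XML = """<Reviews>
<Review rid="1004293">
<sentences>
<sentence id="1004293:0">
<text>Judging from previous posts this used to be a good place, but not any longer.</text>
<Opinions>
<Opinion target="place" category="RESTAURANT#GENERAL" polarity="negative" from="51" to="56"/>
</Opinions>
</sentence>
<sentence id="1004293:1">
<text>The food here is rather good, but only if you like to wait for it.</text>
<Opinions>
<Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
<Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
</Opinions>
</sentence>
</sentences>
</Review>
</Reviews>"""


def opinion_rows(root):
    """Yield one [polarity, text, category] row per <Opinion> element (hypothetical helper)."""
    for sentence in root.iter('sentence'):
        text = sentence.find('text').text
        for opinion in sentence.iter('Opinion'):
            # .get() returns None for a missing attribute; csv writes None as ''.
            yield [opinion.get('polarity'), text, opinion.get('category')]


def write_tsv(rows, path):
    """Write rows to a tab-separated file (hypothetical helper)."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        writer.writerows(rows)


root = ElementTree.fromstring(SAMPLE_XML)
write_tsv(opinion_rows(root), 'output.tsv')
```

Each line of output.tsv then holds the polarity, the sentence text and the category separated by tabs. Using csv.writer instead of joining fields with '\t' by hand means that a field containing a tab or a double quote gets quoted rather than silently shifting the columns.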
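Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.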