
Parse data from .xml file and save it to .tsv file in Python

I have a dataset that looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Reviews>
<Review rid="1004293">
    <sentences>
        <sentence id="1004293:0">
            <text>Judging from previous posts this used to be a good place, but not any longer.</text>
            <Opinions>
                <Opinion target="place" category="RESTAURANT#GENERAL" polarity="negative" from="51" to="56"/>
            </Opinions>
        </sentence>
        <sentence id="1004293:1">
            <text>The food here is rather good, but only if you like to wait for it.</text>
            <Opinions>
                <Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
                <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
            </Opinions>
        </sentence>
...

How can I parse the data from this .xml file into a .tsv file in the following format:

["negative", "Judging from previous posts this used to be a good place, but not any longer.", "RESTAURANT#GENERAL"]

["positive", "The food here is rather good, but only if you like to wait for it.", "FOOD#QUALITY"]

["negative", "The food here is rather good, but only if you like to wait for it.", "SERVICE#GENERAL"]

Thanks!

You can use the xml.etree.ElementTree module from Python's standard library to get your desired output. The code below prints each row as a list; to create the .tsv file, replace the print call with code that writes each row to the file.

Note that a relative path such as 'sample.xml' is resolved against the current working directory, so run the script from the folder that contains sample.xml, or pass a full path instead.
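If sample.xml should be found next to the script no matter which directory the script is launched from, the path can be built from the script's own location. A minimal sketch with pathlib; path_beside is a hypothetical helper name, not part of any library:

```python
from pathlib import Path


def path_beside(script_path, filename):
    """Return an absolute path to `filename` in the same folder as `script_path`.

    Hypothetical helper: pass __file__ from the running script so the
    lookup no longer depends on the current working directory.
    """
    return Path(script_path).resolve().parent / filename


# Usage inside the script (ElementTree.parse accepts path-like objects):
#   tree = ElementTree.parse(path_beside(__file__, 'sample.xml'))
```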

from xml.etree import ElementTree

file = 'sample.xml'

tree = ElementTree.parse(file)
root = tree.getroot()

# Loop over every <sentence> element in the XML
for sentence in root.iter('sentence'):
    # Loop over every <Opinion> of this particular sentence
    for opinion in sentence.iter('Opinion'):
        print([opinion.attrib['polarity'], sentence.find('text').text, opinion.attrib['category']])

Output:

['negative', 'Judging from previous posts this used to be a good place, but not any longer.', 'RESTAURANT#GENERAL']
['positive', 'The food here is rather good, but only if you like to wait for it.', 'FOOD#QUALITY']
['negative', 'The food here is rather good, but only if you like to wait for it.', 'SERVICE#GENERAL']
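The question asks for a .tsv file rather than printed lists. A minimal sketch of that last step, using the standard csv module with a tab delimiter; the function names opinion_rows, rows_to_tsv and xml_to_tsv are hypothetical, and no header row is written, matching the format in the question. lineterminator='\n' is an assumption so rows end in a plain newline instead of the csv default '\r\n'.

```python
import csv
import io
from xml.etree import ElementTree


def opinion_rows(root):
    """Yield one (polarity, sentence text, category) tuple per <Opinion>."""
    for sentence in root.iter('sentence'):
        # findtext returns '' instead of failing when <text> is missing
        text = sentence.findtext('text', default='')
        for opinion in sentence.iter('Opinion'):
            yield (opinion.get('polarity'), text, opinion.get('category'))


def rows_to_tsv(rows):
    """Serialize rows as tab-separated lines, each ending in a bare newline."""
    buf = io.StringIO()
    csv.writer(buf, delimiter='\t', lineterminator='\n').writerows(rows)
    return buf.getvalue()


def xml_to_tsv(xml_path, tsv_path):
    """Parse xml_path and write its opinions to tsv_path as UTF-8 TSV."""
    root = ElementTree.parse(xml_path).getroot()
    # newline='' stops Python from translating '\n' into '\r\n' on Windows
    with open(tsv_path, 'w', encoding='utf-8', newline='') as f:
        f.write(rows_to_tsv(opinion_rows(root)))
```

Calling xml_to_tsv('sample.xml', 'sample.tsv') would write one line per <Opinion>, with polarity, sentence text and category separated by tabs. The csv writer only quotes fields that contain a tab, a double quote or a line break, so the commas in the review sentences are written unchanged. This sketch builds the whole output in memory first, which is fine for a file of this size.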

sample.xml contains:

<Reviews>
<Review rid="1004293">
    <sentences>
        <sentence id="1004293:0">
            <text>Judging from previous posts this used to be a good place, but not any longer.</text>
            <Opinions>
                <Opinion target="place" category="RESTAURANT#GENERAL" polarity="negative" from="51" to="56"/>
            </Opinions>
        </sentence>
        <sentence id="1004293:1">
            <text>The food here is rather good, but only if you like to wait for it.</text>
            <Opinions>
                <Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
                <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
            </Opinions>
        </sentence>
    </sentences>
</Review>
</Reviews>
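For a large dataset, ElementTree.parse builds the whole tree in memory before the loop starts. One possible alternative, sketched here as an assumption rather than as part of the answer above, is ElementTree.iterparse, which reports each element as soon as its end tag has been read. The helper names iter_opinion_rows and stream_to_tsv are hypothetical.

```python
import csv
from xml.etree import ElementTree


def iter_opinion_rows(source):
    """Stream (polarity, text, category) rows from an XML path or binary file object.

    Each <sentence> is handled once its end tag is read, then cleared, so the
    parsed sentence subtrees do not pile up in memory (the emptied <sentence>
    elements themselves stay attached to their parent).
    """
    for _event, elem in ElementTree.iterparse(source, events=('end',)):
        if elem.tag != 'sentence':
            continue
        text = elem.findtext('text', default='')
        for opinion in elem.iter('Opinion'):
            yield (opinion.get('polarity'), text, opinion.get('category'))
        elem.clear()  # free this sentence's <text> and <Opinions> children


def stream_to_tsv(source, out):
    """Write rows to an open text stream as they are parsed."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for row in iter_opinion_rows(source):
        writer.writerow(row)
```

Usage would look like opening the output with open('sample.tsv', 'w', encoding='utf-8', newline='') as f and then calling stream_to_tsv('sample.xml', f). Rows are written while the file is still being read, at the cost of slightly more code than the tree-based version.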

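Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.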

 
© 2020-2024 STACKOOM.COM