
How to parse an XML file in Python?

I have an XML file that looks like this:

<?xml version='1.0' encoding='UTF8'?>
<Reviews>
  <Review rid="0" book_title="O-Apanhador-no-Campo-de-Centeio" score="4.0">
    <sentences>
      <sentence id="0:0:0" place="title" polarity="neutral">
        <text>Está provado:</text>
        <tokens>
          <word id="1" form="Está" base="estar" postag="v-fin" morf="PR 3S IND VFIN" extra="fmc * vK mv" head="0" deprel="STA" srl="PRED" obj="O" opinion="O" from="0" to="4"/>
          <word id="2" form="provado" base="provar" postag="v-fin" morf="PCP M S" extra="vH jh" head="1" deprel="Cs" srl="ATR" obj="O" opinion="O" from="5" to="12"/>
          <word id="3" form=":" base="--" postag="pu" morf="--" extra="--" head="0" deprel="PU" srl="" obj="O" opinion="O" from="12" to="13"/>
        </tokens>
      </sentence>
      <sentence id="0:0:1" place="title" polarity="neutral">
        <text>Pode existir um livro bom sem uma história boa.</text>
        <tokens>
          <word id="1" form="Pode" base="poder" postag="v-fin" morf="PR 3S IND VFIN" extra="fmc * aux" head="0" deprel="STA" srl="" obj="O" opinion="O" from="0" to="4"/>
          <word id="2" form="existir" base="existir" postag="v-inf" morf="--" extra="mv" head="1" deprel="Oaux" srl="PRED" obj="O" opinion="O" from="5" to="12"/>
          <word id="3" form="um" base="um" postag="pron-indef" morf="M S" extra="--" head="4" deprel="DN" srl="" obj="O" opinion="O" from="13" to="15"/>
          <word id="4" form="livro" base="livro" postag="n" morf="M S" sem="sem-r" extra="--" head="1" deprel="S" srl="TH" obj="O" opinion="O" from="16" to="21"/>
          <word id="5" form="bom" base="bom" postag="adj" morf="M S" extra="np-close" head="4" deprel="DN" srl="" obj="O" opinion="O" from="22" to="25"/>
          <word id="6" form="sem" base="sem" postag="prp" morf="--" extra="--" head="2" deprel="fA" srl="" obj="O" opinion="O" from="26" to="29"/>
          <word id="7" form="uma" base="um" postag="pron-indef" morf="F S" extra="--" head="8" deprel="DN" srl="" obj="O" opinion="O" from="30" to="33"/>
          <word id="8" form="história" base="história" postag="n" morf="F S" sem="per domain sem-r" extra="--" head="6" deprel="DP" srl="COM-ADV" obj="O" opinion="O" from="34" to="42"/>
          <word id="9" form="boa" base="bom" postag="adj" morf="F S" extra="jh np-close" head="8" deprel="DN" srl="" obj="O" opinion="O" from="43" to="46"/>
          <word id="10" form="." base="--" postag="pu" morf="--" extra="--" head="0" deprel="PU" srl="" from="46" to="47"/>
        </tokens>

I want to extract the text field and the polarity into a separate CSV file.

The code below extracts the polarity successfully, but I cannot get the text:

with open('output1.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(('text', 'polarity'))
    root = lxml.etree.fromstring(xmlstr)
    for sent in root.iter('sentence'):
        row = sent.get('text'), sent.get('polarity')
        writer.writerow(row)

Here xmlstr is a string holding the contents of the XML file.

How can I extract the text field from the file?

Note: this is a link to the file I am using for sentiment analysis in Portuguese.

Can anyone help?

Thanks

Try it this way:

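import csv
import xml.etree.ElementTree as ET

# Parse the file and take the root <Reviews> element.
root = ET.parse('ReLiPalavras.xml').getroot()
# newline='' avoids blank rows on Windows; utf-8 keeps the Portuguese accents intact.
with open('output1.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(('text', 'polarity'))
    for sent in root.iter('sentence'):
        # <text> is a child element of <sentence>, while polarity is an attribute.
        row = sent.findtext('text'), sent.get('polarity')
        writer.writerow(row)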

output1.csv will then contain the content of each text element together with the value of its polarity attribute.
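The attempt in the question returned None for the text because get() only looks up attributes, whereas <text> is a child element of <sentence>. A minimal sketch of the difference, using a shortened inline copy of the sample (the sample string and the variable names are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical excerpt of the file shown in the question.
sample = """<Reviews>
  <Review rid="0">
    <sentences>
      <sentence id="0:0:0" place="title" polarity="neutral">
        <text>Está provado:</text>
      </sentence>
    </sentences>
  </Review>
</Reviews>"""

root = ET.fromstring(sample)
sent = next(root.iter('sentence'))

attr_lookup = sent.get('text')        # attribute lookup: <sentence> has no text="..." attribute
child_lookup = sent.findtext('text')  # child lookup: reads the content of <text>...</text>
polarity = sent.get('polarity')       # polarity really is an attribute

print(attr_lookup, child_lookup, polarity)
```

The same distinction should hold with lxml.etree, which offers the same get() and findtext() methods on elements.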

I went with the following solution:

import csv
# Assumed import: the original snippet did not show which etree was used.
# xml.etree.ElementTree works the same way for this code.
from lxml import etree

trainset = []
xmldoc = etree.parse('ReLiPalavras.xml')

for sentence_node in xmldoc.iter('sentence'):
    # The first child of each <sentence> is its <text> element.
    trainset.append({
        'sent': sentence_node[0].text,
        'polarity': sentence_node.get('polarity')})

This creates a list of dictionaries.

with open('names.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['sent', 'polarity']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=',')

    writer.writeheader()
    for d in trainset:
        writer.writerow(d)

The list is then written out to the CSV file, which is exactly what I wanted.
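For anyone who wants to check the approach without the full corpus, here is a rough sketch that does the same extraction in one function and writes to any file-like object. The function name sentences_to_csv, the inline sample, and the choice to emit an empty string when a sentence has no <text> child are all assumptions made for this example, not part of the original solution:

```python
import csv
import io
import xml.etree.ElementTree as ET

def sentences_to_csv(xml_text, out):
    """Write one (sent, polarity) row per <sentence> to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=['sent', 'polarity'])
    writer.writeheader()
    count = 0
    for node in ET.fromstring(xml_text).iter('sentence'):
        writer.writerow({
            # findtext falls back to the default when <text> is missing.
            'sent': node.findtext('text', default=''),
            'polarity': node.get('polarity', ''),
        })
        count += 1
    return count

# Hypothetical two-sentence sample; the second sentence has no <text> child.
sample = ('<Reviews><Review><sentences>'
          '<sentence polarity="neutral"><text>Está provado:</text></sentence>'
          '<sentence polarity="neutral"><tokens/></sentence>'
          '</sentences></Review></Reviews>')

buffer = io.StringIO()
n_rows = sentences_to_csv(sample, buffer)
lines = buffer.getvalue().splitlines()
print(n_rows, lines)
```

To write a real file instead of an in-memory buffer, pass a handle opened with open('names.csv', 'w', newline='', encoding='utf-8').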
