
ElementTree parse xml file - problem with parsing

I have a problem parsing data from an XML file. I'm using xml.etree.ElementTree to extract data from files and then save it to .csv. I have all the necessary modules installed on the server. I am aware of the bs4 module with BeautifulSoup, but I would like to know whether it is possible to parse this XML file using ElementTree. Sorry if the answer is easy or obvious; I'm still very much a beginner, and I could not name the problem well enough to search for an answer.

When I run the Python script below I get no errors and no output. I don't know what I should change, and I cannot find a solution. I tried using different child.tag values or attributes, but with no result.

The XML file that I have a problem with:

<?xml version="1.0" encoding="utf-8"?>
<offer file_format="IOF" version="2.6" extensions="yes" xmlns="http://www.iai-shop.com/developers/iof.phtml">
    <product id="9" vat="23.0" code_on_card="BHA">
      <producer id="1308137276" name="BEAL"/>
      ...
      <price gross="175" net="142.28"/>
      <sizes>
        <size code_producer="3700288265272" code="9-uniw" weight="0">
          <stock id="0" quantity="-1"/>
          <stock id="1" quantity="4"/>
        </size>
      </sizes>
    </product>
    <product>
              ...
    </product>
              ...

and the script that I tried to use (here to extract code_on_card, net price, and quantity).

(I am aware that there are two stock children, each with a quantity attribute, and I'm completely fine with the second one overwriting the first one.)

import requests
import os,sys
import csv
import xml.etree.ElementTree as ET

reload(sys)
sys.setdefaultencoding('utf-8')

xml_path = '/file.xml'

xml = ET.parse(xml_path)

with open('/home/file.csv', 'wb') as f:
    c = csv.writer(f, delimiter=';')
    for product in xml.iter('product'):
        product_id = product.attrib["code_on_card"]
        for child in product:
            if child.tag == 'price':
                if child.attrib["net"] != None:
                    hurt_net = child.attrib["net"]
        for size in product.iter('size'):
            for stock in size.iter('stock'):
                if 'quantity' in stock.attrib.keys():
                    quantity = stock.attrib["quantity"]

        line = product_id, hurt_net, quantity
        c.writerow(line)

Files that seem to me to be built on a similar scheme (offer -> product -> child/attrib) work just fine, like this one:

<?xml version="1.0" encoding="UTF-8"?>
<offer file_format="IOF" version="2.5">
    <product id="2">
        <price gross="0.00" net="0.00" vat="23.0"/>
        <srp gross="0.00" net="0" vat="23.0"/>
        <sizes>
            <size id="0"  code="2-0"  weight="0" >
            </size>
        </sizes>
    </product>
        ...
    </product>
        ...

EDIT: The outcome should be a .csv file containing multiple rows (one for each product in the XML file) of code_on_card, net price, and quantity. It should look like:

BC097B.50GD.O;70.81;37
BC097B.50.A;76.75;24
BC086C.50.B;76.75;29
BGRT.L;3;96.75;28
....

EDIT2: the code as it is now, after drec4s's answer:

import requests
import os,sys
import csv
import xml.etree.cElementTree as ET

reload(sys)
sys.setdefaultencoding('utf-8')

xml_path = '/home/platne/serwer16373/dane/z_hurtowni/pobrane/beal2.xml'

root = ET.parse(xml_path)

ns = {'offer': 'http://www.iai-shop.com/developers/iof.phtml'}

products = root.getchildren()

with open('/home/platne/serwer16373/dane/z_hurtowni/stany_magazynowe/karol/bealKa.csv', 'wb') as f:
    c = csv.writer(f, delimiter=';')
    hurtownia = 'beal'
    for product in root.iter('product'):
        qtt = [1]
        code = product.get('code_on_card')
        hurt_net = product.find('price').get('net')
        for stock in product.find('sizes').find('size').getchildren():
            qtt.append(stock.get('quantity'))
        quantity = max(qtt)


        line = 'beal-'+str(code), hurt_net, quantity
        c.writerow(line)

Somehow I'm getting AttributeError: 'ElementTree' object has no attribute 'getchildren'. I've got Ele...

This is how I would parse an XML file with namespaces. As per the official documentation, the easiest way is to define a dictionary specifying the namespace.

from xml.etree import cElementTree as ET

root = ET.fromstring("""
<offer file_format="IOF" version="2.6" extensions="yes" xmlns="http://www.iai-shop.com/developers/iof.phtml">
    <product id="9" vat="23.0" code_on_card="BHA">
      <producer id="1308137276" name="BEAL"/>
      <price gross="175" net="142.28"/>
      <sizes>
        <size code_producer="3700288265272" code="9-uniw" weight="0">
          <stock id="0" quantity="-1"/>
          <stock id="1" quantity="4"/>
        </size>
      </sizes>
    </product>
</offer>
""")

ns = {'offer': 'http://www.iai-shop.com/developers/iof.phtml'}

products = root.getchildren()

for p in products:
    qtt = [] #to store all stock quantities
    product_id = p.get('code_on_card')
    hurt_net = p.find('offer:price', ns).get('net')
    for stock in p.find('offer:sizes', ns).find('offer:size', ns).getchildren():
        qtt.append(int(stock.get('quantity')))

    quantity = max(qtt) #or sum

    # Build and print one row per product (inside the loop).
    line = (product_id, hurt_net, quantity)
    print(line)

Outputs:

('BHA', '142.28', 4)
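
The reason the script in the question ran without errors and produced nothing can be seen by inspecting the tags. Because of the default xmlns on <offer>, ElementTree stores every tag as {namespace-uri}localname, so a bare name such as 'product' matches nothing. A minimal sketch, assuming Python 3 and a trimmed-down copy of the sample document (the names SAMPLE and NS_URI are made up for this illustration):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question; only the default namespace matters here.
SAMPLE = """<offer xmlns="http://www.iai-shop.com/developers/iof.phtml">
    <product id="9" code_on_card="BHA">
        <price gross="175" net="142.28"/>
    </product>
</offer>"""

NS_URI = "http://www.iai-shop.com/developers/iof.phtml"

root = ET.fromstring(SAMPLE)

# The namespace is folded into every tag name.
print(root.tag)  # {http://www.iai-shop.com/developers/iof.phtml}offer

# A bare name matches nothing, which is why the loop body never ran.
plain_matches = list(root.iter("product"))
print(len(plain_matches))  # 0

# The fully qualified name matches.
qualified_matches = list(root.iter("{%s}product" % NS_URI))
print(len(qualified_matches))  # 1
```

The second sample file in the question has no xmlns attribute, which is probably why the same script worked on it.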

Also, I did not understand which stock quantity you needed to extract, since you were only getting the value of the last child (stock); change the max function to sum, or to whatever you need.
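
For completeness, here is a hedged sketch of the whole file-to-CSV step under Python 3. It is not the code from the question or the answer: extract_rows and write_csv are hypothetical helper names, and the sample bytes are a shortened copy of the document above. It also covers the AttributeError from EDIT2: ET.parse returns an ElementTree wrapper, so the sketch calls getroot() to obtain the <offer> element and then iterates over elements directly, since getchildren() is no longer available in recent Python versions.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"offer": "http://www.iai-shop.com/developers/iof.phtml"}


def extract_rows(root):
    """Yield (code_on_card, net price, max stock quantity) for each product."""
    for product in root.findall("offer:product", NS):
        code = product.get("code_on_card")
        price = product.find("offer:price", NS)
        net = price.get("net") if price is not None else ""
        # Collect every quantity attribute found under this product.
        quantities = [
            int(stock.get("quantity"))
            for stock in product.iterfind(".//offer:stock", NS)
            if stock.get("quantity") is not None
        ]
        # max() mirrors the answer above; a product without stock gets an empty cell.
        quantity = max(quantities) if quantities else ""
        yield code, net, quantity


def write_csv(xml_source, csv_file):
    """Parse xml_source (a path or binary file object) and write rows to csv_file."""
    tree = ET.parse(xml_source)  # an ElementTree, not an Element
    root = tree.getroot()        # the <offer> Element
    writer = csv.writer(csv_file, delimiter=";", lineterminator="\n")
    writer.writerows(extract_rows(root))


# Shortened copy of the sample document, used only to exercise the sketch.
sample = b"""<offer xmlns="http://www.iai-shop.com/developers/iof.phtml">
  <product id="9" code_on_card="BHA">
    <price gross="175" net="142.28"/>
    <sizes>
      <size code="9-uniw">
        <stock id="0" quantity="-1"/>
        <stock id="1" quantity="4"/>
      </size>
    </sizes>
  </product>
</offer>"""

out = io.StringIO()
write_csv(io.BytesIO(sample), out)
print(out.getvalue())  # BHA;142.28;4
```

When writing to a real file under Python 3, open it in text mode with newline='' (for example open(path, 'w', newline='')) rather than 'wb', and pass the XML path as the first argument to write_csv.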
