
Parse multiple XML files to one list of dictionaries in Python

When parsing multiple XML files, I want the result to be a single list of dictionaries instead of one list per file.

import glob
from bs4 import BeautifulSoup


def open_xml(filenames):
    for filename in filenames: 
        with open(filename) as fp:
            soup = BeautifulSoup(fp, 'html.parser')
        parse_xml_files(soup)


def parse_xml_files(soup):
    stringToListOfDict = []
    .
    .
    .

    for info in infos:
        dict = {} 
        
        types = info.find_all('type')
        values = info.find_all('value')
        
        for type in types:
            dict[type.attrs['p']] = type.text
      
        stringToListOfDict.append({'Date': Date, 'Time': Time, 'NodeName': node})
        for value in values:
            for result in value.find_all('x'):
                label = dict[result.attrs['y']]
                value = result.text 
                if label:
                    stringToListOfDict[-1][label] = value    

    print(stringToListOfDict)
 
def main():
    open_xml(filenames = glob.glob("*.xml"))

if __name__ == '__main__':
    main() 

With the code above, I always get one list of dictionaries per file (e.g. two lists for two XML files):

[{'Date': '2020-11-19', 'Time': '18:15', 'NodeName': 'LinuxSuSe','Speed': '16'}]
[{'Date': '2020-11-19', 'Time': '18:30', 'NodeName': 'LinuxRedhat','Speed': '16'}]

The desired output is a single list containing both dictionaries:


[{'Date': '2020-11-19', 'Time': '18:15', 'NodeName': 'LinuxSuSe','Speed': '16'},{'Date': '2020-11-19', 'Time': '18:30', 'NodeName':'LinuxRedhat','Speed': '16'}]

I would really appreciate your feedback.

print() only displays information on the screen; it does not join the results into one list.
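To make the difference concrete, here is a minimal sketch. The function names show and give and the sample row are made up for illustration and are not part of the question's code.

```python
def show(rows):
    print(rows)   # writes the list to the screen; the function implicitly returns None


def give(rows):
    return rows   # hands the list back to the caller so it can be collected


rows = [{'NodeName': 'LinuxSuSe'}]
shown = show(rows)   # shown is None, so there is nothing to combine
given = give(rows)   # given is the list itself
```

Only the value produced by return can be stored and merged with results from other files.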

The name parse_xml_files is misleading because the function parses a single file, not all files. The function should use return to send back the result for a single file, and open_xml should add each result to one list. That list then holds the rows from all files.
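How each result is added matters. The small sketch below uses made-up sample data to compare list.append with list.extend (which has the same effect as += on a list):

```python
first = [{'NodeName': 'LinuxSuSe'}]
second = [{'NodeName': 'LinuxRedhat'}]

nested = []
flat = []
for result in (first, second):
    nested.append(result)  # adds the whole per-file list as ONE element
    flat.extend(result)    # adds each dictionary; same effect as flat += result

print(nested)  # a list of lists
print(flat)    # one flat list of dictionaries
```

With append you would get a list of lists, which is probably not what you want here; extend or += gives one flat list of dictionaries.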

Not tested:

def open_xml(filenames):

    all_files = []  # one combined list for every file

    for filename in filenames:
        with open(filename) as fp:
            soup = BeautifulSoup(fp, 'html.parser')
        result = parse_xml_file(soup)  # <-- get result from parse_xml_file
        all_files += result  # <-- extend the combined list with this file's items

    print(all_files)  # <-- display all results


def parse_xml_file(soup):
    stringToListOfDict = []

    # ... code ...

    for info in infos:
        labels = {}  # renamed from `dict` to avoid shadowing the built-in

        types = info.find_all('type')
        values = info.find_all('value')

        for type_tag in types:  # renamed from `type` for the same reason
            labels[type_tag.attrs['p']] = type_tag.text

        stringToListOfDict.append({'Date': Date, 'Time': Time, 'NodeName': node})
        for value in values:
            for result in value.find_all('x'):
                label = labels[result.attrs['y']]
                text = result.text  # separate name, so the loop variable `value` is not overwritten
                if label:
                    stringToListOfDict[-1][label] = text

    #print(stringToListOfDict)

    return stringToListOfDict  # <-- send to open_xml
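
Since the real XML layout is not shown in the question, here is a self-contained sketch of the same pattern that you can actually run. Everything about the documents is an assumption: the report root element, its date, time and node attributes, and the sample data are invented for illustration. I also swapped in the standard-library xml.etree.ElementTree for BeautifulSoup and parse strings instead of files, so it runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the real XML layout is not shown in the question.
SAMPLE_DOCS = [
    """<report date="2020-11-19" time="18:15" node="LinuxSuSe">
         <info>
           <type p="1">Speed</type>
           <value><x y="1">16</x></value>
         </info>
       </report>""",
    """<report date="2020-11-19" time="18:30" node="LinuxRedhat">
         <info>
           <type p="1">Speed</type>
           <value><x y="1">16</x></value>
         </info>
       </report>""",
]


def parse_xml_text(xml_text):
    """Parse ONE document and return its rows as a list of dictionaries."""
    root = ET.fromstring(xml_text)
    rows = []
    for info in root.iter('info'):
        # Map each type's `p` attribute to its label text, e.g. {'1': 'Speed'}.
        labels = {t.get('p'): t.text for t in info.iter('type')}
        row = {
            'Date': root.get('date'),
            'Time': root.get('time'),
            'NodeName': root.get('node'),
        }
        for x in info.iter('x'):
            label = labels.get(x.get('y'))  # None if there is no matching type
            if label:
                row[label] = x.text
        rows.append(row)
    return rows


def parse_all(xml_texts):
    """Collect the rows of every document into one flat list."""
    all_rows = []
    for xml_text in xml_texts:
        all_rows.extend(parse_xml_text(xml_text))
    return all_rows


combined = parse_all(SAMPLE_DOCS)
print(combined)
```

For the two sample documents this prints one list with two dictionaries, in the shape of the desired output in the question. To use it with files, read each file's contents and pass them in, or replace ET.fromstring(xml_text) with ET.parse(filename).getroot().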

