
Parsing XML to CSV with Python

I'd like to parse an XML file into CSV format and display it as a table. (The desired layout was attached as an image, which is not reproduced here.)

I have successfully extracted each element's text into the CSV file. I'd like to match up the namelink and description values into rows, with the text from each element in its own column, as shown in the table.

The original XML file was attached as an image (not reproduced here).

My current attempt:

# Importing the required libraries
import xml.etree.ElementTree as Xet
import pandas as pd

# Parsing the XML file
xmlparse = Xet.parse('NiktoReportTest.xml')
root = xmlparse.getroot()

cols = ["namelink", "description"]
rows = []


x = []
for elm in root.findall("./niktoscan/scandetails/item/namelink"):
    x.append(elm.text)

y = []
for value in root.findall("./niktoscan/scandetails/item/description"):
    y.append(value.text)


rows.append({"namelink": x,
             "description": y})

df = pd.DataFrame(rows, columns=cols)

# Writing dataframe to csv
df.to_csv('output.csv')

The current output of the CSV file:

,namelink,description
0,"['http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/./', 'http://127.0.0.1:80/./', 'http://127.0.0.1:80//', 'http://127.0.0.1:80//', 'http://127.0.0.1:80/%2e/', 'http://127.0.0.1:80/%2e/', 'http://127.0.0.1:80///etc/hosts', 'http://127.0.0.1:80///', 'http://127.0.0.1:80/server-status', 'http://127.0.0.1:80/?PageServices', 'http://127.0.0.1:80/?wp-cs-dump', 'http://127.0.0.1:80///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////', 'http://127.0.0.1:80///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////', 'http://127.0.0.1:80/wp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wordpresswp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wordpresswp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wordpresswp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/assets/mobirise/css/meta.php?filesrc=', 'http://127.0.0.1:80/login.cgi?cli=aa%20aa%27cat%20/etc/hosts', 'http://127.0.0.1:80/shell?cat+/etc/hosts']","['The anti-clickjacking X-Frame-Options header is not present.', 'The X-XSS-Protection header is not defined. This header can hint to the user agent to protect against some forms of XSS', 'The X-Content-Type-Options header is not set. 
This could allow the user agent to render the content of the site in a different fashion to the MIME type', '/: Directory indexing found.', 'Allowed HTTP Methods: POST, OPTIONS, HEAD, GET ', '/./: Directory indexing found.', ""/./: Appending '/./' to a directory allows indexing"", '//: Directory indexing found.', '//: Apache on Red Hat Linux release 9 reveals the root directory listing by default if there is no index page.', '/%2e/: Directory indexing found.', '/%2e/: Weblogic allows source code or directory listing, upgrade to v6.0 SP1 or higher. BID-2513.', ""///etc/hosts: The server install allows reading of any system file by adding an extra '/' to the URL."", '///: Directory indexing found.', '/server-status: This reveals Apache information. Comment out appropriate line in the Apache conf file or restrict access to allowed sources.', ""/?PageServices: The remote server may allow directory listings through Web Publisher by forcing the server to show all files via 'open directory browsing'. Web Publisher should be disabled. CVE-1999-0269."", ""/?wp-cs-dump: The remote server may allow directory listings through Web Publisher by forcing the server to show all files via 'open directory browsing'. Web Publisher should be disabled. 
CVE-1999-0269."", '///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////: Directory indexing found.', ""///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////: Abyss 1.03 reveals directory listing when \t /'s are requested."", '/wp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wordpresswp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wordpresswp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wordpresswp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/assets/mobirise/css/meta.php?filesrc=: A PHP backdoor file manager was found.', '/login.cgi?cli=aa%20aa%27cat%20/etc/hosts: Some D-Link router remote command execution.', '/shell?cat+/etc/hosts: A backdoor was identified.']"

I modified your code to write to a CSV file. Your version appends a single dict whose values are the two complete lists, so the DataFrame ends up with one row holding both lists. There is no need for pandas if you use it only to write a CSV file.

import csv
import xml.etree.ElementTree as Xet

# Parsing the XML file
xmlparse = Xet.parse('test.xml')
root = xmlparse.getroot()

column_names = ["namelink", "description"]
column_values = {}

# Extract column data for all columns defined above
for column_name in column_names:
    column_values[column_name] = []
    for element in root.findall(f'./niktoscan/scandetails/item/{column_name}'):
        column_values[column_name].append(element.text)

# Create a row item for every column value, that was extracted above
rows = zip(*column_values.values())


# newline='' stops the csv module from emitting blank lines between rows on Windows
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(column_names)
    writer.writerows(rows)
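
One caveat with collecting each column separately and zipping them: if an item is missing its namelink or description, the lists end up with different lengths and the rows shift out of alignment. A minimal sketch of an alternative is to iterate over each item and read both children from it, so every row comes from a single item. The sample XML below, including the `niktoscans` wrapper element, is an assumption inferred from the XPath used in the question, not the real report, and `items_to_rows` and `rows_to_csv` are hypothetical helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real Nikto report structure may differ.
SAMPLE_XML = """<niktoscans>
  <niktoscan>
    <scandetails>
      <item>
        <description>Directory indexing found.</description>
        <namelink>http://127.0.0.1:80/</namelink>
      </item>
      <item>
        <namelink>http://127.0.0.1:80/server-status</namelink>
      </item>
    </scandetails>
  </niktoscan>
</niktoscans>"""

COLUMN_NAMES = ["namelink", "description"]


def items_to_rows(root, column_names):
    """Build one row per <item>, using an empty string for missing children."""
    rows = []
    for item in root.findall("./niktoscan/scandetails/item"):
        row = [(item.findtext(name, default="") or "").strip() for name in column_names]
        rows.append(row)
    return rows


def rows_to_csv(rows, column_names):
    """Serialize the header and rows to CSV text held in memory."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(column_names)
    writer.writerows(rows)
    return buffer.getvalue()


root = ET.fromstring(SAMPLE_XML)
rows = items_to_rows(root, COLUMN_NAMES)
csv_text = rows_to_csv(rows, COLUMN_NAMES)
print(csv_text)
```

To use this on a real file, replace `ET.fromstring(SAMPLE_XML)` with `ET.parse(path).getroot()` and write `csv_text` to disk, or pass a file opened with `newline=''` to `csv.writer` directly.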
