
Python Boto3 - data isn't being written to DynamoDB properly

I have an XML file that I'm parsing strings from with Python and writing to a DynamoDB table in AWS. The tags are <IMAGE_ID> and <CVSS_FINAL>. When I loop through these values and print() them, all of them are returned. However, when I write to DynamoDB, only one row of data is written. I don't understand why print() returns everything while only one row is written to the datastore.

Code:

import boto3
import lxml
from lxml import etree

def WriteItemToTable():
    s3 = boto3.resource('s3')
    bucket = 'xxxxxxxxxxxx'
    key = 'vuln_data.xml'
    dynamo = boto3.client('dynamodb')

    obj = s3.Object('xxxxxxxxxx', 'vuln_data.xml')
    body = obj.get()['Body'].read()

    image_id = etree.fromstring(body).findall('HOST_LIST/HOST/EC2_INFO/IMAGE_ID')
    risk_score = etree.fromstring(body).findall('HOST_LIST/HOST/VULN_INFO_LIST/VULN_INFO/CVSS_FINAL')

    for el in image_id:
        i = el.text
        print(i)

    for el in risk_score:
        j = el.text
        print(j)

    response = dynamo.put_item(
        TableName='ExistingAMI',
        Item={
            'AMI_ID': {
                'S': i
            },
            'CVSS_SCORE': {
                'S': j
            },
        }
    )

WriteItemToTable()

XML:

<HOST_LIST>
  <HOST>
    <EC2_INFO>
      <PUBLIC_DNS_NAME><![CDATA[ec2-xxxxxxxxxxx.compute-1.amazonaws.com]]></PUBLIC_DNS_NAME>
      <IMAGE_ID><![CDATA[ami-xxxxxx]]></IMAGE_ID>
    </EC2_INFO>
    <OPERATING_SYSTEM><![CDATA[Linux x.y]]></OPERATING_SYSTEM>
    <VULN_INFO_LIST>
      <VULN_INFO>
        <QID id="qid_x">x</QID>
        <TYPE>Vuln</TYPE>
        <CVSS_FINAL>3.5</CVSS_FINAL>
        <RESULT><![CDATA[TLSv1.0 is supported]]></RESULT>
      </VULN_INFO>
      <VULN_INFO>
        <QID id="qid_xxxx">xxxxx</QID>
        <CVSS_FINAL>2.1</CVSS_FINAL>
      </VULN_INFO>
      <VULN_INFO>
        <QID id="qid_xxxx">xxxx</QID>
        <CVSS_FINAL>4.3</CVSS_FINAL>
        <RESULT><![CDATA[TLSv1.0 is supported]]></RESULT>
      </VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
  <HOST>
    <EC2_INFO>
      <PUBLIC_DNS_NAME><![CDATA[ec2-xxxxxxxxx.compute-1.amazonaws.com]]></PUBLIC_DNS_NAME>
      <IMAGE_ID><![CDATA[ami-yyyyyy]]></IMAGE_ID>
    </EC2_INFO>
    <OPERATING_SYSTEM><![CDATA[Amazon Linux]]></OPERATING_SYSTEM>
    <VULN_INFO_LIST>
      <VULN_INFO>
        <QID id="x">x</QID>
        <CVSS_FINAL>3.6</CVSS_FINAL>
      </VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
</HOST_LIST>

print() output:

ami-xxxxxx
ami-yyyyyy
3.5
3.6

Dynamo Table:

[Image: Dynamo Table]

While I know nothing of DynamoDB, your Python code only passes one i and one j value: your dynamo.put_item call is not nested in either of the for loops, so it runs once, after both loops have finished, with the last values assigned to i and j (here ami-yyyyyy and 3.6).
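The effect can be reproduced without AWS. In this minimal sketch the two lists stand in for the parsed IMAGE_ID and CVSS_FINAL text (values taken from the print() output above), and the lists `written` and `written_in_loop` are hypothetical stand-ins for the table, where each entry plays the role of one put_item call.

```python
# Stand-ins for the values parsed from the XML (taken from the print() output).
image_ids = ['ami-xxxxxx', 'ami-yyyyyy']
scores = ['3.5', '3.6']

# Original structure: two separate loops, then a single "write" after both.
for i in image_ids:
    pass
for j in scores:
    pass
written = [(i, j)]  # i and j still hold the last values assigned in each loop
print(written)

# Fixed structure: pair the values and "write" inside the loop.
written_in_loop = []
for i, j in zip(image_ids, scores):
    written_in_loop.append((i, j))  # one "write" per pair
print(written_in_loop)
```

Note that zip() only pairs values correctly when every host contributes exactly one score; with several CVSS_FINAL elements per host the two lists no longer line up, which is why looping per <HOST> is the safer structure.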

Instead, run your image_id and risk_score searches together in one loop over each <HOST> element, and call put_item inside that loop. Also consider the xpath() method available in lxml. And there is no need for the separate import lxml line, since you import its etree module directly.

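import boto3
from lxml import etree

def WriteItemToTable():
    s3 = boto3.resource('s3')
    dynamo = boto3.client('dynamodb')
    body = s3.Object('xxxxxxxxxx', 'vuln_data.xml').get()['Body'].read()

    doc = etree.fromstring(body)  # PARSE ONLY ONCE

    hosts = doc.xpath('//HOST')

    for h in hosts:
        # IMAGE_ID of this HOST
        i = h.xpath('EC2_INFO/IMAGE_ID')[0].text
        print(i)

        # First CVSS_FINAL of this HOST
        j = h.xpath('VULN_INFO_LIST/VULN_INFO/CVSS_FINAL')[0].text
        print(j)

        # put_item is now inside the loop, so it runs once per HOST
        response = dynamo.put_item(
            TableName='ExistingAMI',
            Item={
                'AMI_ID': {
                    'S': i
                },
                'CVSS_SCORE': {
                    'S': j
                },
            }
        )

WriteItemToTable()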

# ami-xxxxxx
# 3.5
# ami-yyyyyy
# 3.6
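To check the per-HOST pairing without S3 or DynamoDB, here is a minimal sketch that swaps in the standard library's xml.etree.ElementTree for lxml (its findtext() handles these simple relative paths, though not full XPath) and only builds the Item dicts that would be passed to put_item. The function name items_per_host and the trimmed SAMPLE document are illustrative assumptions, not part of the original code.

```python
import xml.etree.ElementTree as ET

def items_per_host(xml_text):
    """Build one DynamoDB-style Item dict per <HOST> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    items = []
    # iter() walks every <HOST> element in the document
    for host in root.iter('HOST'):
        ami = host.findtext('EC2_INFO/IMAGE_ID')
        # findtext() returns the text of the FIRST matching CVSS_FINAL, like [0] above
        score = host.findtext('VULN_INFO_LIST/VULN_INFO/CVSS_FINAL')
        items.append({'AMI_ID': {'S': ami}, 'CVSS_SCORE': {'S': score}})
    return items

# Trimmed copy of the question's XML (illustrative only).
SAMPLE = """<HOST_LIST>
  <HOST>
    <EC2_INFO><IMAGE_ID><![CDATA[ami-xxxxxx]]></IMAGE_ID></EC2_INFO>
    <VULN_INFO_LIST>
      <VULN_INFO><CVSS_FINAL>3.5</CVSS_FINAL></VULN_INFO>
      <VULN_INFO><CVSS_FINAL>2.1</CVSS_FINAL></VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
  <HOST>
    <EC2_INFO><IMAGE_ID><![CDATA[ami-yyyyyy]]></IMAGE_ID></EC2_INFO>
    <VULN_INFO_LIST>
      <VULN_INFO><CVSS_FINAL>3.6</CVSS_FINAL></VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
</HOST_LIST>"""

for item in items_per_host(SAMPLE):
    print(item)
```

Each dict could then be written with dynamo.put_item(TableName='ExistingAMI', Item=item), one call per host.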

For multiple CVSS_FINAL values per host, use XPath to parse down to each <CVSS_FINAL> node, then retrieve the corresponding IMAGE_ID through the ancestor:: axis (here ancestor::HOST).

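cvss = doc.xpath('//CVSS_FINAL')   # ALL CVSS_FINAL NODES (doc PARSED AS ABOVE)

for c in cvss:
    # IMAGE_ID of the enclosing HOST
    i = c.xpath('ancestor::HOST/EC2_INFO/IMAGE_ID')[0].text
    print(i)

    j = c.text
    print(j)

    # RESULT text if present, otherwise an empty string
    k = c.xpath('concat(following-sibling::RESULT, "")')
    print(k)

    response = dynamo.put_item( ... )   # put_item call as above, built from i, j (and k if stored)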

# ami-xxxxxx
# 3.5
# TLSv1.0 is supported
# ami-xxxxxx
# 2.1
# 
# ami-xxxxxx
# 4.3
# TLSv1.0 is supported
# ami-yyyyyy
# 3.6
#
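One caveat for the per-CVSS_FINAL version: put_item replaces any existing item that has the same primary key. If AMI_ID is the table's only key attribute (an assumption, since the table definition isn't shown), the three items for ami-xxxxxx would overwrite one another and only the last would remain, so storing one row per vulnerability would need a composite key, for example a sort key on a vulnerability identifier such as the QID. The sketch below only builds the items, again swapping in the standard library's ElementTree for lxml; because ElementTree has no ancestor:: axis, it walks down from each HOST instead. The helper name items_per_vuln, the optional RESULT attribute, and the SAMPLE document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def items_per_vuln(xml_text):
    """Build one Item dict per <VULN_INFO>, tagged with its host's IMAGE_ID (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    items = []
    # No ancestor:: axis in ElementTree, so descend from each HOST instead
    for host in root.iter('HOST'):
        ami = host.findtext('EC2_INFO/IMAGE_ID')
        for vuln in host.findall('VULN_INFO_LIST/VULN_INFO'):
            item = {
                'AMI_ID': {'S': ami},
                'CVSS_SCORE': {'S': vuln.findtext('CVSS_FINAL')},
            }
            result = vuln.findtext('RESULT')  # None when this VULN_INFO has no RESULT
            if result:
                item['RESULT'] = {'S': result}  # attribute omitted rather than stored empty
            items.append(item)
    return items

# Trimmed copy of the question's XML (illustrative only).
SAMPLE = """<HOST_LIST>
  <HOST>
    <EC2_INFO><IMAGE_ID><![CDATA[ami-xxxxxx]]></IMAGE_ID></EC2_INFO>
    <VULN_INFO_LIST>
      <VULN_INFO><CVSS_FINAL>3.5</CVSS_FINAL><RESULT><![CDATA[TLSv1.0 is supported]]></RESULT></VULN_INFO>
      <VULN_INFO><CVSS_FINAL>2.1</CVSS_FINAL></VULN_INFO>
      <VULN_INFO><CVSS_FINAL>4.3</CVSS_FINAL><RESULT><![CDATA[TLSv1.0 is supported]]></RESULT></VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
  <HOST>
    <EC2_INFO><IMAGE_ID><![CDATA[ami-yyyyyy]]></IMAGE_ID></EC2_INFO>
    <VULN_INFO_LIST>
      <VULN_INFO><CVSS_FINAL>3.6</CVSS_FINAL></VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
</HOST_LIST>"""

for item in items_per_vuln(SAMPLE):
    print(item)
```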

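Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.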

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM