Python Boto3 - data isn't being written to DynamoDB properly
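I have an XML file that I am parsing with Python and writing to a DynamoDB table in AWS. The tags are <IMAGE_ID> and <CVSS_FINAL>. When I iterate over the values and print() them, all of them are returned. However, when I write to Dynamo, only one row of data is written. I don't understand why print() returns everything but only one row is written to the datastore.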
Code:
import boto3
import lxml
from lxml import etree

def WriteItemToTable():
    s3 = boto3.resource('s3')
    bucket = 'xxxxxxxxxxxx'
    key = 'vuln_data.xml'
    dynamo = boto3.client('dynamodb')
    obj = s3.Object('xxxxxxxxxx', 'vuln_data.xml')
    body = obj.get()['Body'].read()

    image_id = etree.fromstring(body).findall('HOST_LIST/HOST/EC2_INFO/IMAGE_ID')
    risk_score = etree.fromstring(body).findall('HOST_LIST/HOST/VULN_INFO_LIST/VULN_INFO/CVSS_FINAL')

    for el in image_id:
        i = el.text
        print(i)

    for el in risk_score:
        j = el.text
        print(j)

    response = dynamo.put_item(
        TableName='ExistingAMI',
        Item={
            'AMI_ID': {
                'S': i
            },
            'CVSS_SCORE': {
                'S': j
            },
        }
    )

WriteItemToTable()
XML:
<HOST_LIST>
  <HOST>
    <EC2_INFO>
      <PUBLIC_DNS_NAME><![CDATA[ec2-xxxxxxxxxxx.compute-1.amazonaws.com]]></PUBLIC_DNS_NAME>
      <IMAGE_ID><![CDATA[ami-xxxxxx]]></IMAGE_ID>
    </EC2_INFO>
    <OPERATING_SYSTEM><![CDATA[Linux x.y]]></OPERATING_SYSTEM>
    <VULN_INFO_LIST>
      <VULN_INFO>
        <QID id="qid_x">x</QID>
        <TYPE>Vuln</TYPE>
        <CVSS_FINAL>3.5</CVSS_FINAL>
        <RESULT><![CDATA[TLSv1.0 is supported]]></RESULT>
      </VULN_INFO>
      <VULN_INFO>
        <QID id="qid_xxxx">xxxxx</QID>
        <CVSS_FINAL>2.1</CVSS_FINAL>
      </VULN_INFO>
      <VULN_INFO>
        <QID id="qid_xxxx">xxxx</QID>
        <CVSS_FINAL>4.3</CVSS_FINAL>
        <RESULT><![CDATA[TLSv1.0 is supported]]></RESULT>
      </VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
  <HOST>
    <EC2_INFO>
      <PUBLIC_DNS_NAME><![CDATA[ec2-xxxxxxxxx.compute-1.amazonaws.com]]></PUBLIC_DNS_NAME>
      <IMAGE_ID><![CDATA[ami-yyyyyy]]></IMAGE_ID>
    </EC2_INFO>
    <OPERATING_SYSTEM><![CDATA[Amazon Linux]]></OPERATING_SYSTEM>
    <VULN_INFO_LIST>
      <VULN_INFO>
        <QID id="x">x</QID>
        <CVSS_FINAL>3.6</CVSS_FINAL>
      </VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
</HOST_LIST>
print() output:
ami-xxxxxx
ami-yyyyyy
3.5
3.6
Dynamo table:
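While I know nothing about DynamoDB, your Python code only ever passes one i value and one j value, because the dynamo.put_item block is not nested inside either of the two for loops. It therefore runs once, using the last values assigned to i and j.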
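The effect can be reproduced without AWS. The following is a minimal sketch: the list `written` stands in for the table, and `fake_put_item` is a hypothetical replacement for `dynamo.put_item`. Neither name comes from the question.

```python
# Stand-in for the DynamoDB table (hypothetical, for illustration only).
written = []

def fake_put_item(ami_id, score):
    """Hypothetical replacement for dynamo.put_item: records what would be written."""
    written.append((ami_id, score))

image_ids = ["ami-xxxxxx", "ami-yyyyyy"]
scores = ["3.5", "3.6"]

for i in image_ids:
    print(i)  # every image ID is printed
for j in scores:
    print(j)  # every score is printed

# The call sits after both loops, so it runs once with the last i and j.
fake_put_item(i, j)
print(written)
```

Only one tuple ends up in `written`, even though every value was printed along the way.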
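Simply run your image_id and risk_score searches together in a single loop at the <HOST> level, with the write nested inside it. Also consider xpath(), which lxml provides. And since you import etree from lxml, the separate import lxml line is not needed.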
doc = etree.fromstring(body)             # PARSE ONLY ONCE
hosts = doc.xpath('//HOST')

for h in hosts:
    i = h.xpath('EC2_INFO/IMAGE_ID')[0].text
    print(i)
    j = h.xpath('VULN_INFO_LIST/VULN_INFO/CVSS_FINAL')[0].text
    print(j)

    response = dynamo.put_item(
        TableName='ExistingAMI',
        Item={
            'AMI_ID': {
                'S': i
            },
            'CVSS_SCORE': {
                'S': j
            },
        }
    )

WriteItemToTable()
# ami-xxxxxx
# 3.5
# ami-yyyyyy
# 3.6
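To try the per-host loop without S3 or DynamoDB, here is a minimal sketch under a few assumptions: it uses the standard library's `xml.etree.ElementTree` instead of lxml so it runs without extra dependencies, `SAMPLE` is a trimmed copy of the XML in the question, and `build_items` is a hypothetical helper that only builds the `Item` dictionaries rather than writing them.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the XML in the question.
SAMPLE = """
<HOST_LIST>
  <HOST>
    <EC2_INFO><IMAGE_ID><![CDATA[ami-xxxxxx]]></IMAGE_ID></EC2_INFO>
    <VULN_INFO_LIST>
      <VULN_INFO><CVSS_FINAL>3.5</CVSS_FINAL></VULN_INFO>
      <VULN_INFO><CVSS_FINAL>2.1</CVSS_FINAL></VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
  <HOST>
    <EC2_INFO><IMAGE_ID><![CDATA[ami-yyyyyy]]></IMAGE_ID></EC2_INFO>
    <VULN_INFO_LIST>
      <VULN_INFO><CVSS_FINAL>3.6</CVSS_FINAL></VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
</HOST_LIST>
"""

def build_items(xml_text):
    """Return one put_item-style Item dict per <HOST> (first CVSS_FINAL only)."""
    root = ET.fromstring(xml_text.strip())  # parse only once
    items = []
    for host in root.findall("HOST"):
        # findtext returns the text of the first matching element
        image_id = host.findtext("EC2_INFO/IMAGE_ID")
        score = host.findtext("VULN_INFO_LIST/VULN_INFO/CVSS_FINAL")
        items.append({"AMI_ID": {"S": image_id}, "CVSS_SCORE": {"S": score}})
    return items

items = build_items(SAMPLE)
for item in items:
    print(item)
```

Each dictionary has the shape that the `Item` argument of `dynamo.put_item` takes in the code above, so the write can be issued once per host inside the loop.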
For multiple CVSS_FINAL values, use XPath to parse down to each <CVSS_FINAL> node and then retrieve the corresponding IMAGE_ID with the ancestor:: axis:
cvss = doc.xpath('//CVSS_FINAL')         # ALL CVSS_FINAL NODES

for c in cvss:
    i = c.xpath('ancestor::HOST/EC2_INFO/IMAGE_ID')[0].text
    print(i)
    j = c.text
    print(j)
    k = c.xpath('concat(following-sibling::RESULT, "")')   # EMPTY STRING IF NO RESULT
    print(k)

    response = dynamo.put_item( ... )
# ami-xxxxxx
# 3.5
# TLSv1.0 is supported
# ami-xxxxxx
# 2.1
#
# ami-xxxxxx
# 4.3
# TLSv1.0 is supported
# ami-yyyyyy
# 3.6
#
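The same pairing can be sketched without the ancestor:: axis, which the standard library's `xml.etree.ElementTree` does not support: loop over each `<HOST>` and then over its `<VULN_INFO>` children. This is an illustrative alternative rather than the approach shown above; `SAMPLE` and `vuln_rows` are hypothetical names, and a missing `<RESULT>` is assumed to map to an empty string, as in the output above.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the XML in the question.
SAMPLE = """
<HOST_LIST>
  <HOST>
    <EC2_INFO><IMAGE_ID><![CDATA[ami-xxxxxx]]></IMAGE_ID></EC2_INFO>
    <VULN_INFO_LIST>
      <VULN_INFO>
        <CVSS_FINAL>3.5</CVSS_FINAL>
        <RESULT><![CDATA[TLSv1.0 is supported]]></RESULT>
      </VULN_INFO>
      <VULN_INFO><CVSS_FINAL>2.1</CVSS_FINAL></VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
  <HOST>
    <EC2_INFO><IMAGE_ID><![CDATA[ami-yyyyyy]]></IMAGE_ID></EC2_INFO>
    <VULN_INFO_LIST>
      <VULN_INFO><CVSS_FINAL>3.6</CVSS_FINAL></VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
</HOST_LIST>
"""

def vuln_rows(xml_text):
    """Return (image_id, cvss_final, result) for every <VULN_INFO> node."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for host in root.findall("HOST"):
        image_id = host.findtext("EC2_INFO/IMAGE_ID")
        for vuln in host.findall("VULN_INFO_LIST/VULN_INFO"):
            score = vuln.findtext("CVSS_FINAL")
            result = vuln.findtext("RESULT", default="")  # "" when no RESULT
            rows.append((image_id, score, result))
    return rows

rows = vuln_rows(SAMPLE)
for row in rows:
    print(row)
```

One caveat, stated as an assumption because the question does not give the table's key schema: if AMI_ID alone is the primary key of ExistingAMI, several put_item calls with the same AMI_ID would replace one another, since put_item replaces an existing item that has the same primary key.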