AWS Lambda Python function remove trailing unicode

I'm trying to use a Python 3.6 AWS Lambda function to parse Windows logs sent from Cloudwatch.

These arrive in Lambda as JSON, so I extract the field I want using:

for i in data['logEvents']:
    message = json.dumps(i['message'])

which gives me this in my message string:

"<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Lfsvc'/><EventID Qualifiers='0'>2</EventID><Level>4</Level><Task>0</Task><Keywords>0x80000000000000</Keywords><TimeCreated SystemTime='2018-04-03T15:33:57.186213000Z'/><EventRecordID>25371</EventRecordID><Channel>System</Channel><Computer>EC2AMAZ-1KJC0H1</Computer><Security/></System><EventData></EventData><RenderingInfo Culture='en-US'><Message>Geolocation positioning has been disabled by the user.</Message><Level>Information</Level><Task></Task><Opcode>Info</Opcode><Channel></Channel><Provider></Provider><Keywords><Keyword>Classic</Keyword></Keywords></RenderingInfo></Event>\u0000"

I am then trying to turn this string into XML to use with either xmltodict or xml.etree.ElementTree, but both of those return a malformed XML error because of the \u0000 at the end.

So I then run this to remove the unicode:

xml = re.sub(u'(\u0000)', "", message)

which works fine on my computer in my local Python console, and both xmltodict and xml.etree.ElementTree can then work with the newly created xml string.

But when I run the re.sub command in the Lambda function, the \u0000 remains at the end of the string.

Am I missing something obvious??

Adding the full output of print(i['message'])

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Service Control Manager' Guid='{555908d1-a6d7-4695-8e1e-26931d2012f4}' EventSourceName='Service Control Manager'/><EventID Qualifiers='16384'>7036</EventID><Version>0</Version><Level>4</Level><Task>0</Task><Opcode>0</Opcode><Keywords>0x8080000000000000</Keywords><TimeCreated SystemTime='2018-04-03T15:31:31.854941100Z'/><EventRecordID>25365</EventRecordID><Correlation/><Execution ProcessID='712' ThreadID='4768'/><Channel>System</Channel><Computer>EC2AMAZ-1KJC0H1</Computer><Security/></System><EventData><Data Name='param1'>Volume Shadow Copy</Data><Data Name='param2'>running</Data><Binary>5600530053002F0034000000</Binary></EventData><RenderingInfo Culture='en-US'><Message>The Volume Shadow Copy service entered the running state.</Message><Level>Information</Level><Task></Task><Opcode></Opcode><Channel></Channel><Provider>Microsoft-Windows-Service Control Manager</Provider><Keywords><Keyword>Classic</Keyword></Keywords></RenderingInfo></Event>\u0000

Many thanks, Dave

Your issue might be caused by calling json.dumps on the message string before doing re.sub. See the example below:

import re, json

msg = "</Keywords></RenderingInfo></Event>\u0000"
xml_1 = re.sub(u'(\u0000)', "", msg)
print(xml_1)

message = json.dumps(msg)
xml_2 = re.sub(u'(\u0000)', "", message)
print(xml_2)

Output

</Keywords></RenderingInfo></Event>
"</Keywords></RenderingInfo></Event>\u0000"

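As the outputs show, json.dumps does more than wrap the message in double quotes: it also escapes the NUL character as the six literal characters \u0000. The pattern passed to re.sub looks for the actual NUL character, which is no longer in the string, so nothing gets replaced. Running re.sub on i['message'] directly, without json.dumps, should avoid the problem.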
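To make that concrete, here is a minimal sketch of one way to handle it in the Lambda function: strip the NUL from the raw message and parse the result. The helper name strip_trailing_nul and the shortened sample event are made up for illustration, and the sample omits the xmlns attribute that the real events carry.

```python
import json
import xml.etree.ElementTree as ET


def strip_trailing_nul(message):
    """Remove any NUL characters from the end of the message.

    Hypothetical helper, not part of any library.
    """
    return message.rstrip("\x00")


# Hypothetical, shortened stand-in for one entry of data['logEvents'].
event = {"message": "<Event><System><EventID>2</EventID></System></Event>\u0000"}

raw = event["message"]              # already a str, so json.dumps is not needed
cleaned = strip_trailing_nul(raw)   # the trailing NUL is gone

root = ET.fromstring(cleaned)       # parses without a malformed XML error
event_id = root.find("./System/EventID").text
print(event_id)

# For comparison: json.dumps replaces the single NUL character with the
# six literal characters \u0000 and wraps the string in double quotes.
dumped = json.dumps(raw)
print(dumped.endswith('\\u0000"'))
```

With the real events, the xmlns attribute on the Event element means ElementTree lookups need namespace-qualified tag names, so the find call above would have to be adjusted accordingly.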
