
Parsing Word formatted text from XML document in Azure Logic Apps

I'm trying to parse XML files from a SharePoint form library where users have copied and pasted formatted Word text into a text field. The result is XML nested inside XML. I initially had trouble getting the contents at all, but with help from another question this expression worked: xpath(xml(outputs('Get_file_content')?['body']),'//*[local-name()="myFields"]//following-sibling::*[local-name()="Request_Description"]')[0]. The result looks something like this:

<my:Request_Description xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2017-05-05T14:19:13">
  <xhtml:html xml:space="preserve" xmlns="http://www.w3.org/1999/xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <xhtml:div>
      <xhtml:font size="1" face="CIDFont+F6">
        <xhtml:font size="1" face="CIDFont+F6">
          <xhtml:p>This is where the request description goes and the result we want</xhtml:p>
        </xhtml:font>
      </xhtml:font>
    </xhtml:div>
  </xhtml:html>
</my:Request_Description>

How do I extract just the text of the description? I'm wondering whether my first xpath expression needs to be adjusted so that it doesn't return the entire element.

UPDATE - I failed to mention that the above is just one example of user input to that field; each form will be different. Here is another example of what can be found in that field.

<my:Request_Description xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2017-05-05T14:19:13">
  <xhtml:html xml:space="preserve" xmlns="http://www.w3.org/1999/xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <xhtml:div>test random double quote inside title "here" test and carriage<xhtml:br />return</xhtml:div>
  </xhtml:html>
</my:Request_Description>

This is caused by an RTF control on the form: the user types or pastes into a textbox, and the control converts that input into the XML you see. Since the markup is not consistent from form to form, I'm wondering whether xpath is a viable option at all, but I'm not sure what else could be done.

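You can use this expression (it assumes Request_Description is the root element of the XML being parsed; if it is nested, for example under myFields, start the path with // instead of /):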

xpath(xml(outputs('Get_file_content')?['body']), 'string(/*[local-name()="Request_Description"]/*[local-name()="html"]/*[local-name()="div"]/*[local-name()="font"]/*[local-name()="font"]/*[local-name()="p"])')

See the official documentation for details on using xpath.

========================update===========================

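Since the markup differs between forms, don't spell out the full path. Take the string value of the whole Request_Description element, which concatenates the text of all its descendants regardless of nesting, and wrap it in trim to strip the surrounding whitespace: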

trim(xpath(xml(outputs('Get_file_content')?['body']), 'string(/*[local-name()="Request_Description"])'))
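To check offline why this handles both inputs, here is a rough Python sketch of the same idea. It is not Logic Apps code: it uses the standard library's ElementTree, where joining itertext() is assumed to stand in for XPath's string() on an element and strip() stands in for trim. The myFields wrapper and the helper names wrap and description_text are hypothetical, since the rest of the form is not shown in the question.

```python
import xml.etree.ElementTree as ET

MY_NS = "http://schemas.microsoft.com/office/infopath/2003/myXSD/2017-05-05T14:19:13"


def wrap(body: str) -> str:
    """Put a field inside a hypothetical myFields root (real form not shown)."""
    return '<my:myFields xmlns:my="' + MY_NS + '">' + body + "</my:myFields>"


SAMPLE_NESTED = wrap("""
<my:Request_Description>
  <xhtml:html xml:space="preserve" xmlns="http://www.w3.org/1999/xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <xhtml:div>
      <xhtml:font size="1" face="CIDFont+F6">
        <xhtml:font size="1" face="CIDFont+F6">
          <xhtml:p>This is where the request description goes and the result we want</xhtml:p>
        </xhtml:font>
      </xhtml:font>
    </xhtml:div>
  </xhtml:html>
</my:Request_Description>
""")

SAMPLE_FLAT = wrap("""
<my:Request_Description>
  <xhtml:html xml:space="preserve" xmlns="http://www.w3.org/1999/xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <xhtml:div>test random double quote inside title "here" test and carriage<xhtml:br />return</xhtml:div>
  </xhtml:html>
</my:Request_Description>
""")


def description_text(xml_source: str) -> str:
    """Return the trimmed text of the first Request_Description element."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        # ElementTree stores tags as "{namespace}local"; compare the local name only,
        # like local-name() in the XPath expression.
        if elem.tag.rsplit("}", 1)[-1] == "Request_Description":
            # Concatenate every descendant text node, then trim the outer whitespace.
            return "".join(elem.itertext()).strip()
    return ""


if __name__ == "__main__":
    print(description_text(SAMPLE_NESTED))
    print(description_text(SAMPLE_FLAT))
```

Both samples yield plain text without any path that depends on the font or p nesting. Note that the br element carries no text, so "carriage" and "return" come out joined; the string value in XPath behaves the same way, so a replacement step may be needed if line breaks matter.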
