
How to Parse HTML in Robot Framework

Below is my text, which is stored in ${Tooltipdata}:

    <hr><b><strong>Task Details</strong></b><hr><b>Date Created: </b> 02/21/2014 07:52pm<br> 
<b>Date Modified: </b> 02/24/2014 05:47pm<br><b>Assigned to: </b> Administrator<br>
<b>Created By: </b> Administrator<br><b>Status: </b> Pending Input<br><b>Description:
 </b> test<br>

I want the result to look like this:

Task Details  Date Created:  02/21/2014 07:52pm    Date Modified:  02/24/2014 05:47pm    Assigned to:  Administrator   
 Created By:  Administrator   
 Status:  Pending Input   
 Description:  test.

Simply put, I want to remove the HTML tags.

You can use the Evaluate keyword to call Python's re.sub function. Something like this should work:

*** Keywords ***
| Remove HTML tags
| | [Documentation] | Strip HTML tags from the given string
| | [Arguments]     | ${string}
| | ${result}=      | Evaluate | re.sub(r'<.*?>', '', '''${string}''') | re
| | [Return]        | ${result}

*** Test cases ***
| Example
| | ${Tooltipdata}= | Some keyword which returns the tooltip data
| | ${string}= | Remove HTML tags | ${Tooltipdata}

If you're not familiar with regular expressions, the above expression means "match the shortest string that is between < and >", and re.sub replaces each occurrence with the empty string.
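To see what "shortest string" means in practice, here is a small plain-Python sketch that compares the non-greedy pattern with its greedy counterpart. The sample string is a shortened copy of the tooltip data from the question; the variable names are only illustrative.

```python
import re

# Shortened sample of the tooltip markup from the question.
tooltip = "<hr><b><strong>Task Details</strong></b><hr><b>Date Created: </b> 02/21/2014 07:52pm<br>"

# '<.*?>' is non-greedy: each match stops at the first '>' after a '<'.
non_greedy = re.sub(r"<.*?>", "", tooltip)

# '<.*>' is greedy: a single match runs from the first '<' to the last '>'.
greedy = re.sub(r"<.*>", "", tooltip)

print(repr(non_greedy))
print(repr(greedy))
```

The non-greedy version keeps the text between the tags, while the greedy version swallows the whole string, because the sample starts with < and ends with >.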

This will fail if your HTML tags include attributes that have > in them, and it will also remove non-HTML text if your data includes both < and >, but that's the risk you take when you try to parse HTML with regular expressions. In your specific example, you should be safe.
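Both failure modes are easy to reproduce in plain Python. The sketch below uses two made-up inputs (they are not from the question) and a hypothetical helper named strip_tags that applies the same pattern as the keyword above.

```python
import re


def strip_tags(text):
    """Remove anything that looks like <...>, using the pattern from the keyword."""
    return re.sub(r"<.*?>", "", text)


# Case 1: an attribute value contains '>', so the first match ends too early.
with_attr = '<a title="a > b">link</a>'

# Case 2: plain text that happens to contain '<' and '>' is removed as well.
with_math = "1 < 2 and 3 > 2"

print(repr(strip_tags(with_attr)))
print(repr(strip_tags(with_math)))
```

In the first case part of the attribute leaks into the output, and in the second case ordinary text between the two comparison signs is lost.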

The better alternative is to write a keyword in Python and use a real HTML parsing library such as Beautiful Soup to parse the data. For a code example, see this question.
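As a rough illustration of the parser-based approach, here is a minimal sketch of such a keyword. Note that it uses the standard library's html.parser rather than Beautiful Soup, purely to avoid a third-party dependency; the names _TextExtractor and remove_html_tags are hypothetical, and collapsing whitespace is an extra choice that the question did not ask for.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes and ignore every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for text that sits between tags.
        self.chunks.append(data)


def remove_html_tags(html):
    """Return the text content of `html` with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


print(remove_html_tags("<b>Status: </b> Pending Input<br>"))
```

If this were saved as a Python file (say, a hypothetical HtmlUtils.py) and imported with the Library setting, Robot Framework should expose the function as the keyword "Remove Html Tags". Unlike the regular expression, the parser copes with a > inside a quoted attribute value.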

You could try using a regex:

import re

data = "<hr><b><strong>Task Details</strong></b><hr><b>Date Created: </b> 02/21/2014 7:52pm<br><b>Date Modified: </b> 02/24/2014 05:47pm<br><b>Assigned to: </b> Administrator<br><b>Created By: </b> Administrator<br><b>Status: </b> Pending Input<br><b>Description: </b> test<br>"
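# Split on tags; what remains is the text between them.
# [A-Za-z/] is used instead of [A-z\/], which also matches [ \ ] ^ _ and `.
result = re.split(r'<[A-Za-z/]*>', data)

# Join the pieces back together and print the text without tags
print(''.join(result))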

Using the String library, we can replace parts of the string with the Replace String keyword. This is the code I use:

${str} =    Replace String    ${Tooltipdata}    <hr>    a
