
How would I parse this?

I have an email that looks like this:

We've received a request to change your email address to example@thisexample.com.

To complete the process, please verify your email address by entering the following verification code.

86761G

This code is temporary and will expire in 30 minutes.

If this wasn't requested by you, your account information will remain unchanged. No further action is required.

Warm regards, Example.com

I need to parse out the verification code: 86761G. The catch is that the code is dynamic, meaning it changes with every email. What IS static, though, is the layout of the email, so my thought was to grab it by the newline index [2] (even though it looks like there are blank lines in between, it's the third <p> tag in the div, hence index [2] when splitting on newlines). My other idea was to do it via the HTML somehow (I don't really want to use HTMLAgilityPack). The HTML for the div is as follows:

<td colspan="2" style="padding:1.2em 45px 2em 45px;color:#000;font-family:Corbel, 'Trebuchet MS', 'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:.875em;line-height:1.1em;">
<p>We’ve received a request to change your email address to example@thisexample.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>


<p>Warm regards,<br>
example.com</p>
</td>

Any idea how to parse this data out? I was thinking regex if possible, even though I know regex isn't meant for HTML because HTML isn't a regular language. If I need HTMLAgilityPack I'll use it, but I'd prefer not to. Thank you guys!

Oh, side note: I'm using Firefox via Selenium, so there's always the option of using its built-in functions to grab it somehow?

Edit: I'm so stupid. Selenium - FindElementByXPath (facepalm)

Contrary to popular (and misinformed, imo) opinion, you can use regular expressions to extract this, because the overarching structure of this particular document does, in fact, meet the requirements of a regular grammar ( http://en.wikipedia.org/wiki/Chomsky_hierarchy ).

Here's a regex I would use:

following verification code\.</p>\s*<p>(\S+)</p>

Note the lack of any anchors ( ^ $ ): it uses the known text "following verification code" to match just before the code. The verification code is then captured by the single regex group.
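As a rough illustration of that anchor-text approach, here is a minimal sketch in Python (not the asker's C#/Selenium setup). The sample string is trimmed from the HTML in the question, the function name `extract_code` is made up for this example, and the period after "code" is escaped so it only matches a literal dot.

```python
import re

# Sample body trimmed from the question's HTML (only the relevant paragraphs).
EMAIL_HTML = """<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>"""

# No ^ or $ anchors: the fixed sentence ending locates the code paragraph.
CODE_AFTER_ANCHOR = re.compile(r"following verification code\.</p>\s*<p>(\S+)</p>")


def extract_code(html):
    """Return the verification code, or None if the known text is not found."""
    match = CODE_AFTER_ANCHOR.search(html)
    return match.group(1) if match else None


print(extract_code(EMAIL_HTML))
```

Returning None when the known sentence is missing makes it easier to notice if the sender ever changes the email wording.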

If you're using Selenium, the simplest approach is probably to match it with the following CSS selector: p:nth-child(3)

Since you've mentioned that only the verification code part is dynamic, I'm assuming the overall markup structure won't change.

If this is true, you could use

<p>(.*?)<\/p>

This matches each <p> element on its own; the captured group of the 3rd match is your verification code.
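A minimal sketch of that idea in Python, assuming the paragraph order never changes. The sample string is copied from the question (with a plain apostrophe) and `third_paragraph` is a hypothetical helper name; `re.findall` returns the contents of the capture group for every match, so the code is the item at index 2.

```python
import re

# Sample body copied from the question (first four paragraphs only).
EMAIL_HTML = """<p>We've received a request to change your email address to example@thisexample.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>"""


def third_paragraph(html):
    """Return the text of the third <p> element, or None if there are fewer than three."""
    # The lazy .*? stops at the first closing </p>, so each paragraph is a separate match.
    paragraphs = re.findall(r"<p>(.*?)</p>", html)
    return paragraphs[2] if len(paragraphs) > 2 else None


print(third_paragraph(EMAIL_HTML))
```

This relies purely on position, so an extra paragraph inserted above the code would silently shift the result.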

You can use the following regular expression if the email is exactly the same every time except for the changing code:

(?<d><p>[^\s.]+</p>)

If it is more complex, you can do this:

(?<d><p>.*?</p>)

This finds all paragraph lines; you can then iterate over them and find the code by eliminating constant strings such as:

To complete the process, please verify your email address by entering the following verification code.
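The elimination step could look roughly like this in Python. It is only a sketch: the prefixes are taken from the email in the question, matching on prefixes rather than whole strings is my assumption (the first paragraph contains an address that presumably varies), and `find_code_by_elimination` is a made-up name.

```python
import re

# Fixed paragraph openings taken from the email in the question.
# \u2019 is the curly apostrophe that appears in the HTML source.
KNOWN_PREFIXES = (
    "We\u2019ve received a request",
    "To complete the process",
    "This code is temporary",
    "If this wasn\u2019t requested by you",
    "Warm regards",
)

EMAIL_HTML = """<p>We\u2019ve received a request to change your email address to example@thisexample.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn\u2019t requested by you, your account information will remain unchanged. No further action is required.</p>
<p>Warm regards,<br>
example.com</p>"""


def find_code_by_elimination(html):
    """Return the only paragraph that is not a known constant, or None if ambiguous."""
    # DOTALL lets a paragraph span several lines (the sign-off contains a line break).
    paragraphs = re.findall(r"<p>(.*?)</p>", html, flags=re.DOTALL)
    leftovers = [p.strip() for p in paragraphs
                 if not p.strip().startswith(KNOWN_PREFIXES)]
    return leftovers[0] if len(leftovers) == 1 else None


print(find_code_by_elimination(EMAIL_HTML))
```

Unlike picking by position, this keeps working if paragraphs are reordered, but it returns None as soon as an unknown paragraph shows up.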
