
How would I parse this?

I have an email that looks like this:

We've received a request to change your email address to example@thisexample.com.

To complete the process, please verify your email address by entering the following verification code.

86761G

This code is temporary and will expire in 30 minutes.

If this wasn't requested by you, your account information will remain unchanged. No further action is required.

Warm regards, Example.com

I need to parse out the verification code: 86761G. The catch is that the code is dynamic, meaning it changes every time. What IS static, though, is the layout of the email, so my thought was to grab it by the new line index [2] (even though it looks like there are blank lines in between, it's the third <p> tag in the Div, hence the [2] index when splitting on new lines). My other idea was to do it via the HTML somehow (I don't really want to use HTMLAgilityPack). The HTML for the Div is as follows:

<td colspan="2" style="padding:1.2em 45px 2em 45px;color:#000;font-family:Corbel, 'Trebuchet MS', 'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:.875em;line-height:1.1em;">
<p>We’ve received a request to change your email address to example@thisexample.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>


<p>Warm regards,<br>
example.com</p>
</td>

Any idea how to parse this data out? I was thinking Regex if possible, even though I know that Regex isn't meant for HTML because it's not regular text. If I need HTMLAgilityPack I'll use it, but I'd prefer not to. Thank you guys!

Oh, side note: I'm using Firefox via Selenium, so there's always the option of using its built-in functions to grab it somehow?

Edit: I'm so stupid. Selenium - FindElementByXPath (facepalm)

Contrary to popular (and misinformed, imo) opinion, you can use regular expressions to extract this, because the overarching structure of this document does, in fact, meet the requirements to be considered a regular grammar (http://en.wikipedia.org/wiki/Chomsky_hierarchy).

Here's a regex I would use:

following verification code.</p>\s*<p>(\S+)</p>

Note the lack of any anchors (^ and $); it uses the known text "following verification code" to match just before the code. The verification code is then captured in the single regex group.
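For illustration, here is a minimal Python sketch of that approach. Python is an assumption made only for the example (the thread appears to be using Selenium's .NET bindings, where the same pattern should work with the regex class there). The sample string is trimmed from the HTML in the question, the name `extract_code` is made up, and the period after "code" is escaped so it only matches a literal period.

```python
import re

# Sample body trimmed from the question's HTML (only the relevant paragraphs).
html = """<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>"""

# Known text before the code, optional whitespace, then the code in group 1.
CODE_AFTER_PROMPT = re.compile(r"following verification code\.</p>\s*<p>(\S+)</p>")


def extract_code(body):
    """Return the verification code, or None if the known text is not found."""
    match = CODE_AFTER_PROMPT.search(body)
    return match.group(1) if match else None


print(extract_code(html))
```

Returning None instead of raising keeps the caller in charge of what to do when the email layout changes and the known text is no longer present.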

If you're using Selenium, the easiest way is most likely to match it with the following CSS selector: p:nth-child(3)

Since you've mentioned that only the verification code part is dynamic, I'm assuming the whole markup structure won't change.

If this is true, you could use

<p>(.*?)<\/p>

This will match each <p> element in turn; the captured group of the 3rd match is your verification code.
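A quick sketch of how that could look in Python (an assumption for the example, since the thread seems to be on .NET; the function name `third_paragraph` is hypothetical). It collects the captured text of every match and picks the one at index 2, which only holds as long as the paragraph order in the email never changes.

```python
import re

# The paragraphs from the question's HTML, in their original order.
html = """<p>We’ve received a request to change your email address to example@thisexample.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>"""


def third_paragraph(body):
    """Return the text of the third <p> element, or None if there are fewer than three."""
    # findall returns the captured group of every non-overlapping match, in order.
    paragraphs = re.findall(r"<p>(.*?)</p>", body)
    return paragraphs[2] if len(paragraphs) > 2 else None


print(third_paragraph(html))
```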

You can use the following regular expression if the email is exactly the same every time except for the changing code:

(?<d><p>[^\s.]*</p>)

If it is more complex, you can do this:

(?<d>\<p\>.*</p\>)

This will find all paragraph lines; you can then iterate over them and find the code by eliminating the constant strings, such as:

To complete the process, please verify your email address by entering the following verification code.
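The elimination idea could be sketched roughly like this in Python (chosen only for the example; the thread appears to be .NET-based). The fragment list and the name `find_code_by_elimination` are assumptions: the fragments are pieces of the fixed paragraphs from the question, and matching on fragments rather than whole paragraphs allows for the email address in the first paragraph being different each time.

```python
import re

# Fragments of the paragraphs that never change (assumed from the sample email).
CONSTANT_FRAGMENTS = (
    "received a request to change",
    "To complete the process",
    "This code is temporary",
    "requested by you",
    "Warm regards",
)

html = """<p>We’ve received a request to change your email address to example@thisexample.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>"""


def find_code_by_elimination(body):
    """Return the first paragraph that contains none of the known fragments."""
    for text in re.findall(r"<p>(.*?)</p>", body):
        if not any(fragment in text for fragment in CONSTANT_FRAGMENTS):
            return text.strip()
    return None


print(find_code_by_elimination(html))
```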
