Get Specific Tags from WebPage

I have been trying for hours to get this working in PowerShell (with the help of numerous StackOverflow threads and Google articles), but without success.

I have this PowerShell script:

$URI = "http://example.html"
$HTML = Invoke-WebRequest -Uri $URI

This returns the HTML below. I am only trying to get the "In" and "Out" values that appear under "Average max 5 min values for `Daily' Graph (5 Minute interval):", not the `Weekly' values, which have almost identical tags.

<!-- End Head -->
<!-- Begin `Daily' Graph (5 Minute --><div class="graph">
        <h2>`Daily' Graph (5 Minute Average)</h2>
        <img src="aklsr2_gi0_1-day.png" title="day" alt="day" />
        <table>
            <tr>
                <th></th>
                <th scope="col">Max</th>
                <th scope="col">Average</th>
                <th scope="col">Current</th>
            </tr>
            <tr class="in">
                <th scope="row">In</th>
                <td>9939.4 kb/s (99.4%)</td>
                <td>1908.7 kb/s (19.1%) </td>
                <td>80.8 kb/s (0.8%) </td>
            </tr>
            <tr class="out">
                <th scope="row">Out</th>
                <td>9682.3 kb/s (96.8%) </td>
                <td>344.1 kb/s (3.4%) </td>
                <td>83.8 kb/s (0.8%) </td>
            </tr>
            <tr>
                <td colspan="8">
                    Average max 5 min values for `Daily' Graph (5 Minute interval):
                    <span class="in">In</span> 2264.1 kb/s (22.6%)/
                    <span class="out">Out</span> 451.0 kb/s (4.5%)
                </td>
            </tr>
        </table>
    </div>
<!-- End `Daily' Graph (5 Minute -->

<!-- Begin `Weekly' Graph (30 Minute -->
    <div class="graph">
        <h2>`Weekly' Graph (30 Minute Average)</h2>
        <img src="aklsr2_gi0_1-week.png" title="week" alt="week" />
        <table>
            <tr>
                <th></th>
                <th scope="col">Max</th>
                <th scope="col">Average</th>
                <th scope="col">Current</th>
            </tr>
            <tr class="in">
                <th scope="row">In</th>
                <td>9939.4 kb/s (99.4%)</td>
                <td>1273.3 kb/s (12.7%) </td>
                <td>98.8 kb/s (1.0%) </td>
            </tr>
            <tr class="out">
                <th scope="row">Out</th>
                <td>9775.1 kb/s (97.8%) </td>
                <td>249.9 kb/s (2.5%) </td>
                <td>61.6 kb/s (0.6%) </td>
            </tr>
            <tr>
                <td colspan="8">
                    Average max 5 min values for `Weekly' Graph (30 Minute interval):
                    <span class="in">In</span> 2236.6 kb/s (22.4%)/
                    <span class="out">Out</span> 593.8 kb/s (5.9%)
                </td>
            </tr>
        </table>
    </div>
<!-- End `Weekly' Graph (30 Minute -->

You can use the -match comparison operator. Applied to a single string, -match stops at the first occurrence, and since the `Daily' values come first in the HTML output, it gives you the values you need. You can do something like this:

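# Assumes $HTML is the Invoke-WebRequest result from the question; .Content is the raw HTML string
$htmlText = $HTML.Content
$regexes = @('<span class="in">In</span> (.*)/', '<span class="out">Out</span> (.*)')

$regexes | ForEach-Object {
    # -match itself returns $true/$false, so output only the captured group on success
    if ($htmlText -match $_) {
        $Matches[1].Trim()
    }
}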
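
Relying on the first occurrence works only while the `Daily' block stays ahead of the `Weekly' one. As a hedged alternative, the sketch below anchors the pattern on the graph label itself, so either block can be selected by name. The function name Get-GraphAverage and its parameters are hypothetical, and the here-string is a stand-in for $HTML.Content built from the values in the question; it assumes the page keeps the exact "Average max 5 min values for" wording.

```powershell
# Sample fragment standing in for $HTML.Content (values copied from the question).
$htmlText = @'
<td colspan="8">
    Average max 5 min values for `Daily' Graph (5 Minute interval):
    <span class="in">In</span> 2264.1 kb/s (22.6%)/
    <span class="out">Out</span> 451.0 kb/s (4.5%)
</td>
<td colspan="8">
    Average max 5 min values for `Weekly' Graph (30 Minute interval):
    <span class="in">In</span> 2236.6 kb/s (22.4%)/
    <span class="out">Out</span> 593.8 kb/s (5.9%)
</td>
'@

# Hypothetical helper: returns the In/Out summary values for one named graph.
function Get-GraphAverage {
    param(
        [string]$Text,   # raw HTML
        [string]$Graph   # 'Daily' or 'Weekly'
    )

    # Build the literal label, e.g. `Daily' Graph, and escape it for use in a regex.
    $label = [regex]::Escape('`' + $Graph + "' Graph")

    # (?s) lets '.' cross line breaks; the lazy '.*?' stops at the first span after the label.
    $pattern = '(?s)Average max 5 min values for ' + $label + '.*?' +
               '<span class="in">In</span>\s*(?<In>[^<]*?)\s*/\s*' +
               '<span class="out">Out</span>\s*(?<Out>[^<]*?)\s*<'

    $m = [regex]::Match($Text, $pattern)
    if ($m.Success) {
        [pscustomobject]@{
            Graph = $Graph
            In    = $m.Groups['In'].Value
            Out   = $m.Groups['Out'].Value
        }
    }
}

Get-GraphAverage -Text $htmlText -Graph 'Daily'
```

Passing -Graph 'Weekly' selects the other block with the same pattern. With the real page, $HTML.Content would take the place of the sample string.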
