How to use PowerShell / RegEx to find all HTML files with empty <title> tags?

Question

I am using PowerShell to search thousands of HTML files for ones that contain empty <title> tags. The opening and closing tags may have nothing, whitespace, or line breaks between them. For example, they may look like any of the following:

<title></title>

<title>  </title>

<title>
</title>

So far I have the following code:

Get-ChildItem locationPath *.htm -Recurse |
    Select-String -pattern '<title>[\s]*</title>' |
    group path |
    select name

This gives me a list of all the files that match the first two examples. However, I am struggling to find a way to match the third example, which has a line break and an unknown amount of whitespace. Any help would be greatly appreciated.

Answer 1

Select-String processes the input line by line, so it won't catch your third example. Try this to read each file as a single string:

Get-ChildItem -Filter '*.htm' -Recurse | Where-Object {
    (Get-Content $_.FullName -Raw) -match '<title>\s*</title>'
} | Select-Object -Expand FullName
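
To see the difference between the two approaches, here is a small sketch that writes the three variants from the question (plus one non-empty control) into a scratch directory and runs both searches. The directory name, file names, and variable names are made up for illustration and are not part of the original answer.

```powershell
# Hypothetical scratch directory and file names, used only for this demo.
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ("title-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demoDir | Out-Null

# The three empty-title variants from the question, plus one non-empty control.
Set-Content -Path (Join-Path $demoDir 'a.htm') -Value '<html><title></title></html>'
Set-Content -Path (Join-Path $demoDir 'b.htm') -Value '<html><title>  </title></html>'
Set-Content -Path (Join-Path $demoDir 'c.htm') -Value "<html><title>`n   </title></html>"
Set-Content -Path (Join-Path $demoDir 'd.htm') -Value '<html><title>Home</title></html>'

$pattern = '<title>\s*</title>'

# Line by line: Select-String tests each line separately, so the pattern
# never sees both tags of c.htm at once.
$lineByLine = @(Get-ChildItem -Path $demoDir -Filter '*.htm' |
    Select-String -Pattern $pattern |
    ForEach-Object { Split-Path $_.Path -Leaf } |
    Sort-Object -Unique)

# Whole file: -Raw returns the content as one string, so \s* can span the line break.
$wholeFile = @(Get-ChildItem -Path $demoDir -Filter '*.htm' |
    Where-Object { (Get-Content -LiteralPath $_.FullName -Raw) -match $pattern } |
    ForEach-Object { $_.Name } |
    Sort-Object)

"Line by line: $($lineByLine -join ', ')"
"Whole file:   $($wholeFile -join ', ')"

# Clean up the scratch directory.
Remove-Item -LiteralPath $demoDir -Recurse -Force
```

On PowerShell v3 or later, the first line of output should list only a.htm and b.htm, while the second should also include c.htm.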

Prior to PowerShell v3 you'll need to replace Get-Content -Raw with Get-Content | Out-String, because the parameter -Raw was introduced with v3.
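
A minimal sketch of that pre-v3 fallback might look like the following. The helper name Test-EmptyTitle is hypothetical and only there to keep the pipeline readable; the pattern is the same one used above.

```powershell
# Hypothetical helper; returns $true when the file contains an empty <title>.
function Test-EmptyTitle {
    param([string]$Path)
    # Get-Content emits one string per line; Out-String joins them back
    # into a single string, so \s* can match across line breaks.
    (Get-Content $Path | Out-String) -match '<title>\s*</title>'
}

# Same search as before, run from the current directory.
Get-ChildItem -Filter '*.htm' -Recurse |
    Where-Object { Test-EmptyTitle $_.FullName } |
    Select-Object -ExpandProperty FullName
```

This also works on v3 and later, though -Raw is the more direct option there.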

Answer 1 (accepted), 2018-12-04 20:00:35