How to use PowerShell/RegEx to find all HTML files with an empty <title> tag?

I am using PowerShell to search thousands of HTML files for ones that contain empty <title> tags. The opening and closing tags may have nothing, whitespace, or line breaks between them. For example, they may look like any of the following:

<title></title>
<title>  </title>
<title>
</title>

So far I have the following code:

Get-ChildItem locationPath *.htm -Recurse |
    Select-String -pattern '<title>[\s]*</title>' |
    group path |
    select name

This gives me a list of all the files that match the first two examples. However, I am struggling to find a way to match the third example, which has a line break and an unknown amount of whitespace between the tags. Any help would be greatly appreciated.

Select-String processes the input line by line, so it won't catch your third example. Try this to read each file as a single string:

Get-ChildItem -Filter '*.htm' -Recurse | Where-Object {
    (Get-Content $_.FullName -Raw) -match '<title>\s*</title>'
} | Select-Object -Expand FullName
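To see the difference on a small scale, here is a minimal sketch that compares the two approaches on a single file. The temp path and file name are hypothetical, made up for illustration, and the sketch assumes PowerShell v3 or later because it uses -Raw.

```powershell
# Hypothetical sample file whose <title> tag is split across two lines.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'empty-title-sample.htm'
Set-Content -Path $sample -Value "<html><head><title>", "   </title></head></html>"

# Select-String tests each line separately, so the pattern never sees both tags together.
$lineByLine = [bool](Select-String -Path $sample -Pattern '<title>\s*</title>' -Quiet)

# Get-Content -Raw returns the whole file as one string; \s also matches CR and LF.
$wholeFile = (Get-Content -Path $sample -Raw) -match '<title>\s*</title>'

"Line by line: $lineByLine"
"Whole file:   $wholeFile"

# Clean up the sample file.
Remove-Item -Path $sample
```

The line-by-line check should report no match while the whole-file check reports one, which is the behaviour described above.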

Prior to PowerShell v3 you'll need to replace Get-Content -Raw with Get-Content | Out-String, because the -Raw parameter was introduced in v3.
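For reference, a sketch of the Out-String variant might look like the following. The scratch folder and the two file names are hypothetical and exist only to make the example self-contained; in practice you would point Get-ChildItem at your own folder.

```powershell
# Hypothetical scratch folder and files, used only to make the sketch self-contained.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'empty-title-v2-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'broken.htm') -Value "<title>", "</title>"
Set-Content -Path (Join-Path $dir 'ok.htm') -Value "<title>Home</title>"

# Out-String joins the lines emitted by Get-Content into one newline-separated string,
# so the regex can span the line break without needing -Raw.
$hits = @(Get-ChildItem -Path $dir -Filter '*.htm' -Recurse | Where-Object {
    (Get-Content $_.FullName | Out-String) -match '<title>\s*</title>'
} | Select-Object -ExpandProperty FullName)

# List the files that contain an empty <title> tag.
$hits

# Clean up the scratch folder.
Remove-Item -Path $dir -Recurse
```

Only broken.htm should be listed, since ok.htm has text between its tags.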
