How to use PowerShell/RegEx to find all HTML files with an empty <title> tag?
I am using PowerShell to search thousands of HTML files for files that contain empty <title> tags. These tags may appear in the files with nothing, whitespace, or line breaks between the opening and closing tags. For example, they may look like any of the following:
<title></title>
<title> </title>
<title>
</title>
So far I have the following code:
Get-ChildItem locationPath *.htm -Recurse |
Select-String -pattern '<title>[\s]*</title>' |
group path |
select name
This gives me a list of all the files that match the first two examples. However, I am struggling to find a way to match the third example, which has a line break and an unknown amount of whitespace between the tags. Any help would be greatly appreciated.
Select-String processes the input line by line, so it won't catch your third example. Try this to get the input as a single string:
Get-ChildItem -Filter '*.htm' -Recurse | Where-Object {
(Get-Content $_.FullName -Raw) -match '<title>\s*</title>'
} | Select-Object -Expand FullName
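To see why reading the file as a single string makes the difference, here is a minimal sketch that needs no files on disk. The three sample strings are made up to mirror the examples in the question, and splitting on line breaks is only an approximation of how Select-String sees file content.

```powershell
# Made-up sample documents mirroring the three cases from the question.
$samples = @(
    '<html><head><title></title></head></html>',
    '<html><head><title>   </title></head></html>',
    "<html><head><title>`r`n   </title></head></html>"
)

$pattern = '<title>\s*</title>'

foreach ($text in $samples) {
    # Whole-string match: \s also matches CR and LF, so the tags may sit on different lines.
    $whole = $text -match $pattern

    # Line-by-line match: split on line breaks first, then test each line on its own.
    $byLine = [bool]($text -split "`r?`n" | Where-Object { $_ -match $pattern })

    "whole string: $whole, line by line: $byLine"
}
```

The third sample should match only in the whole-string case, because neither of its two lines contains both the opening and the closing tag.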
Prior to PowerShell v3 you'll need to replace Get-Content -Raw with Get-Content | Out-String, because the parameter -Raw was introduced with v3.
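Written out, the variant for older versions might look like the following. This is a sketch of the same pipeline with only the Get-Content call swapped; it searches the current directory, as the snippet above does.

```powershell
# Out-String joins the lines that Get-Content emits back into one string,
# so \s* can span the line break between the two tags.
Get-ChildItem -Filter '*.htm' -Recurse | Where-Object {
    (Get-Content $_.FullName | Out-String) -match '<title>\s*</title>'
} | Select-Object -ExpandProperty FullName
```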