
PowerShell: delete lines of text from an HTML file

I have some reports in HTML files. I need to bring them into Excel and make some changes, so I thought I could make those changes beforehand using PowerShell. Some of the lines are in fixed places; others are not, so I need to delete them by making the script recognize a pattern.

Fixed lines counting from the top: 12-14, 17, 19, 25-27, 30-32, 40-42
Fixed lines counting from the bottom: 3-13, 48-60

The pattern I need to find and delete is this:

<td align="center">random string</td>
<td align="left">random string</td>
<td align="left">random string</td>
<td align="left">random string</td>
<td align="right">random string</td>

For the fixed lines, I found I can do this:

(gc $maindir\Report23.HTML) | ? {(12..14) -notcontains $_.ReadCount} | out-file $maindir\Report23b.HTML

It works, in that it deletes lines 12-14, but I need to put the rest of the fixed line numbers in the same command and I can't figure out how. Also, the output file is twice the size of the original, which I find weird. I tried using Set-Content, which produces a file size close to the original but breaks the text encoding in certain parts.

I have no idea how to go about recognizing the pattern, though...

Can't you do something like this?

$lines = 12..14
$lines += 17, 19
$lines += 25..27
$lines += 30..32
$lines += 40..42

and then use that array in your where clause:

? {$lines -notcontains $_.ReadCount} 
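
Put together with the original command, the array approach might look like the sketch below. The temporary sample file and its contents are assumptions made up for illustration (a stand-in for Report23.HTML); `ReadCount` is the 1-based line number that Get-Content attaches to each line it emits.

```powershell
# Build a throwaway 45-line sample file (hypothetical stand-in for Report23.HTML).
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'Report23-sample.html'
1..45 | ForEach-Object { "line $_" } | Set-Content -Path $sample

# Collect every fixed line number (counted from the top) in a single array.
$lines = (12..14) + (17, 19) + (25..27) + (30..32) + (40..42)

# Keep only the lines whose 1-based ReadCount is not in the array.
$kept = Get-Content -Path $sample | Where-Object { $lines -notcontains $_.ReadCount }

# 14 of the 45 lines are listed in $lines, so 31 should remain.
$kept.Count
```

Piping `$kept` to Out-File (or replacing the assignment with a pipe) would then write the filtered report.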

The output file is twice the size of the original because the original file was probably ASCII-encoded, whereas in Windows PowerShell Out-File writes Unicode (UTF-16) by default, which uses two bytes per character. Try this:

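# Total number of lines; needed to turn "from the bottom" positions into line numbers.
$length = (gc $maindir\Report23.HTML).length
# Note: this maps line N from the bottom to $length - N (the last line counts as 0).
$rangefrombottom = ($length-60)..($length-48)+($length-13)..($length-3)
$rangefromtop = 12..14+17,19+25..27+30..32+40..42
# Drop both sets of lines and write the result as ASCII to keep the file size down.
(gc $maindir\Report23.HTML) | ? {$rangefromtop -notcontains $_.ReadCount} | ? {$rangefrombottom -notcontains $_.ReadCount} | out-file -encoding ASCII $maindir\Report23b.HTML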
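
For the pattern part of the question, one possible approach is to scan the lines and skip any run of five consecutive `<td>` lines whose `align` values are center, left, left, left, right. The sketch below is only that, a sketch: `Remove-TdBlock` is a hypothetical helper name, and it assumes each `<td>` sits on its own line and that the "random string" never spans a line break.

```powershell
# Hypothetical helper: drops every run of five consecutive <td> lines whose
# align attributes are center, left, left, left, right (the pattern in the question).
function Remove-TdBlock {
    param([string[]]$Lines)

    $aligns = 'center', 'left', 'left', 'left', 'right'
    $result = New-Object 'System.Collections.Generic.List[string]'
    $i = 0
    while ($i -lt $Lines.Count) {
        # A block can only start here if five lines are still available.
        $isBlock = ($i + $aligns.Count) -le $Lines.Count
        for ($k = 0; $isBlock -and $k -lt $aligns.Count; $k++) {
            $rx = '^\s*<td align="' + $aligns[$k] + '">.*</td>\s*$'
            if ($Lines[$i + $k] -notmatch $rx) { $isBlock = $false }
        }
        if ($isBlock) {
            $i += $aligns.Count        # skip the whole five-line block
        } else {
            $result.Add($Lines[$i])    # keep this line
            $i++
        }
    }
    $result.ToArray()
}

# Made-up sample input: one full block plus a stray <td> line that must survive.
$html = @(
    '<tr>'
    '<td align="center">1</td>'
    '<td align="left">a</td>'
    '<td align="left">b</td>'
    '<td align="left">c</td>'
    '<td align="right">9.5</td>'
    '</tr>'
    '<td align="left">kept: not part of a full block</td>'
)
$clean = @(Remove-TdBlock -Lines $html)
$clean
```

Only the five `<td>` lines are removed; any surrounding `<tr>` tags stay in place, so the regular expression or the skip logic may need adjusting to the real report layout. Something like `Remove-TdBlock -Lines (gc $maindir\Report23.HTML) | out-file -encoding ASCII $maindir\Report23b.HTML` should then apply it to the report.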
