What's wrong with this shell/sed script?

I have about 150 HTML files in a given directory that I'd like to make some changes to. Some of the anchor tags have an href along the following lines: index.php?page=something. I'd like all of those to be changed to something.html. Simple regex, simple script. I can't seem to get it right, though. Can somebody weigh in on what I'm doing wrong?

Sample HTML, before and after:

<!-- Before -->
<ul>
    <li><a href="#">Apple</a></li>
    <li><a href="index.php?page=dandelion">Dandelion</a></li>
    <li><a href="index.php?page=elephant">Elephant</a></li>
    <li><a href="index.php?page=resonate">Resonate</a></li>
</ul>

<!-- After -->
<ul>
    <li><a href="#">Apple</a></li>
    <li><a href="dandelion.html">Dandelion</a></li>
    <li><a href="elephant.html">Elephant</a></li>
    <li><a href="resonate.html">Resonate</a></li>
</ul>

Script file:

#! /bin/bash

for f in *.html
do
    sed s/\"index\.php?page=\([.]*\)\"/\1\.html/g < $f >! $f
done

It's your regex, and the fact that the shell is trying to interpret bits of your regex.

First, [.]* matches any number of literal dots. Change it to .* so that it matches any characters.

Secondly, enclose the entire sed expression in single quotes (') to prevent the bash shell from interpreting any of it.

sed 's/"index\.php?page=\(.*\)"/\1\.html/g'
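To see the difference between the two patterns, it can help to run both against a single sample line. This is only a minimal sketch: the variable names and the sample line are made up for illustration, and the replacement still omits the surrounding double quotes, exactly as in the command above.

```shell
#!/bin/bash
# Sample input line (illustrative only).
line='<a href="index.php?page=dandelion">Dandelion</a>'

# [.]* matches only a run of literal dots, so the pattern never matches
# and sed prints the line unchanged.
with_brackets=$(printf '%s\n' "$line" | sed 's/"index\.php?page=\([.]*\)"/\1\.html/g')

# .* matches any characters, so the page name is captured in \1.
with_dot_star=$(printf '%s\n' "$line" | sed 's/"index\.php?page=\(.*\)"/\1\.html/g')

printf '%s\n' "$with_brackets"
printf '%s\n' "$with_dot_star"
```

The single quotes around the sed expression keep bash from touching the backslashes, parentheses and double quotes inside it.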

Also, instead of < $f >! $f you can just pass the -i switch to sed to have it operate in place:

sed -i 's/"index\.php?page=\(.*\)"/"\1\.html"/g' "$f"

(As another point, I think you want double quotes around the \1.html in your replacement so that the new URL is quoted within the HTML. I also changed $f to "$f", because an unquoted file name that contains spaces is split into several arguments.)
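Put together, the loop from the question might look like the sketch below. It runs in a throwaway directory created with mktemp so it cannot touch real files; the file name and its contents are invented for the demo. It assumes GNU sed, where -i takes no argument (BSD/macOS sed expects -i '' instead).

```shell
#!/bin/bash
# Work in a throwaway directory so the demo cannot touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A file name containing a space shows why "$f" must be quoted.
printf '%s\n' '<li><a href="index.php?page=elephant">Elephant</a></li>' > 'my page.html'

for f in *.html
do
    # -i edits the file in place (GNU sed syntax).
    sed -i 's/"index\.php?page=\(.*\)"/"\1\.html"/g' "$f"
done

result=$(cat 'my page.html')
printf '%s\n' "$result"

# Clean up the demo directory.
cd / && rm -rf "$workdir"
```

Without the quotes around $f, sed would be handed my and page.html as two separate file names.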

EDIT: as @TimPote notes, the standard way to match something within quotes is either ".*?" (so that the .* is non-greedy) or "[^"]+". Sed doesn't support the former, so try:

sed -i 's/"index\.php?page=\([^"]\+\)"/"\1\.html"/g' "$f"

This is to prevent (for example) <a href="index.php?page=asdf">"asdf"</a> from being turned into <a href="asdf">"asdf.html"</a> (where the greedy (.*) captured asdf">"asdf).
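The effect is easy to reproduce on that one line. In the sketch below the variable names are illustrative. Note that \+ is a GNU sed extension in basic regular expressions; on other seds, [^"][^"]* should be an equivalent spelling.

```shell
#!/bin/bash
# The problem line from the example above.
line='<a href="index.php?page=asdf">"asdf"</a>'

# Greedy: .* runs on to the last double quote on the line.
greedy=$(printf '%s\n' "$line" | sed 's/"index\.php?page=\(.*\)"/"\1\.html"/g')

# Bounded: [^"]\+ stops at the first double quote after the page name.
bounded=$(printf '%s\n' "$line" | sed 's/"index\.php?page=\([^"]\+\)"/"\1\.html"/g')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```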

Your .* was too greedy. Use [^"]\+ instead. Plus, your quotes were all messed up. Surround the whole expression with single quotes instead; then you can use " without escaping it.

sed -i 's/"index\.php?page=\([^"]\+\)"/"\1\.html"/g'

You can do this whole operation with a single statement using find:

find . -maxdepth 1 -type f -name '*.html' \
 -exec sed -i 's/"index\.php?page=\([^"]\+\)"/"\1\.html"/g' {} \+
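As a rough check of what -maxdepth 1 does, the sketch below runs the same command against a temporary directory that also contains a subdirectory. The directory layout and file names are invented for the demo, and GNU find and GNU sed are assumed.

```shell
#!/bin/bash
# Build a throwaway tree: one HTML file at the top level, one in a subdirectory.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf '%s\n' '<a href="index.php?page=resonate">Resonate</a>' > "$workdir/top.html"
cp "$workdir/top.html" "$workdir/sub/nested.html"

# Same command as above, pointed at the demo directory instead of "."
find "$workdir" -maxdepth 1 -type f -name '*.html' \
    -exec sed -i 's/"index\.php?page=\([^"]\+\)"/"\1\.html"/g' {} +

top=$(cat "$workdir/top.html")
nested=$(cat "$workdir/sub/nested.html")
printf 'top:    %s\n' "$top"
printf 'nested: %s\n' "$nested"

# Clean up the demo directory.
rm -rf "$workdir"
```

Only the top-level file is rewritten; dropping -maxdepth 1 would make find descend into sub as well.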

The following works:

 sed "s/\"index\.php?page=\(.*\)\"/\"\1.html\"/g" < 1.html 

I think it was mostly the square brackets. Not sure why you had them. Oh, and the entire sed command needs to be in quotes.
