What's wrong with this shell / sed script?

Question

I have about 150 HTML files in a given directory that I'd like to make some changes to. Some of the anchor tags have an href along the following lines: index.php?page=something. I'd like all of those to be changed to something.html. Simple regex, simple script. I can't seem to get it correct, though. Can somebody weigh in on what I'm doing wrong?

Sample HTML, before and after:

<!-- Before -->
<ul>
    <li><a href="#">Apple</a></li>
    <li><a href="index.php?page=dandelion">Dandelion</a></li>
    <li><a href="index.php?page=elephant">Elephant</a></li>
    <li><a href="index.php?page=resonate">Resonate</a></li>
</ul>

<!-- After -->
<ul>
    <li><a href="#">Apple</a></li>
    <li><a href="dandelion.html">Dandelion</a></li>
    <li><a href="elephant.html">Elephant</a></li>
    <li><a href="resonate.html">Resonate</a></li>
</ul>

Script file:

#! /bin/bash

for f in *.html
do
    sed s/\"index\.php?page=\([.]*\)\"/\1\.html/g < $f >! $f
done

Answer 1

It's your regex, and the fact that the shell is trying to interpret bits of your regex.

First, the [.]* matches any number of literal dots (.). Change it to .* so that it matches any characters.
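A quick way to see the difference is to run both patterns over one line of sample input. This is only a minimal sketch: the sample line comes from the question's "before" HTML, and the variable names are made up for illustration.

```shell
#!/bin/bash
# Sample anchor taken from the question's "before" HTML.
line='<a href="index.php?page=dandelion">Dandelion</a>'

# [.]* only matches a run of literal dots, so the quoted href never matches
# and the line passes through unchanged.
dots_only=$(printf '%s\n' "$line" | sed 's/"index\.php?page=\([.]*\)"/"\1.html"/g')

# .* matches any characters, so the page name is captured.
any_chars=$(printf '%s\n' "$line" | sed 's/"index\.php?page=\(.*\)"/"\1.html"/g')

printf '%s\n' "$dots_only" "$any_chars"
```

The first output line should be identical to the input, and the second should have the href rewritten.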

Secondly, enclose the entire regex in single quotes ' to prevent the bash shell from interpreting any of it.

sed 's/"index\.php?page=\(.*\)"/\1\.html/g'
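To see what the shell does to the unquoted expression, you can print the argument that sed would actually receive. A minimal sketch, assuming bash with its default globbing options and no path in the current directory that happens to match the unquoted word as a glob pattern:

```shell
#!/bin/bash
# Unquoted: the shell strips the backslashes before sed ever sees them.
unquoted=$(printf '%s\n' s/\"index\.php?page=\([.]*\)\"/\1\.html/g)

# Single-quoted: the expression reaches sed exactly as written.
quoted=$(printf '%s\n' 's/"index\.php?page=\(.*\)"/\1\.html/g')

printf '%s\n' "$unquoted" "$quoted"
```

In the unquoted form the parentheses arrive without their backslashes, so sed treats them as literal characters, and \1 arrives as a plain 1, so the group and the backreference are both lost.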

Also, instead of < $f >! $f you can just pass the -i switch to sed to have it operate in-place:

sed -i 's/"index\.php?page=\(.*\)"/"\1\.html"/g' "$f"

(Also, as another point, I think in your replacement you want double quotes around the \1.html so that the new URL is quoted within the HTML. I also changed your $f to "$f", because if the file name contains spaces, an unquoted $f is split into several arguments.)
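The -i switch is an extension rather than part of POSIX sed, and BSD/macOS sed expects a backup suffix argument after it (for example -i ''). If that matters, a common alternative is to write to a temporary file and copy it back. The following is a sketch under assumptions: it relies on mktemp being available, and the scratch directory and sample file exist only for illustration.

```shell
#!/bin/bash
# Demo setup: a scratch directory with one sample file (illustrative only).
dir=$(mktemp -d)
printf '%s\n' '<li><a href="index.php?page=elephant">Elephant</a></li>' > "$dir/sample.html"

for f in "$dir"/*.html
do
    tmp=$(mktemp) || exit 1
    # Write the edited copy elsewhere; overwrite the original only if sed succeeded.
    if sed 's/"index\.php?page=\(.*\)"/"\1.html"/g' "$f" > "$tmp"; then
        cat "$tmp" > "$f"
    fi
    rm -f "$tmp"
done

result=$(cat "$dir/sample.html")
printf '%s\n' "$result"
```

Copying the contents back with cat, rather than moving the temporary file over the original, keeps the original file's permissions.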

EDIT: as @TimPote notes, the standard way to match something within quotes is either ".*?" (so that the .* is non-greedy) or "[^"]+". Sed doesn't support the former, so try:

sed -i 's/"index\.php?page=\([^"]\+\)"/"\1\.html"/g' "$f"

This is to prevent (for example) <a href="index.php?page=asdf">"asdf"</a> from being turned into <a href="asdf">"asdf.html"</a> (where the \(.*\), being greedy, captured asdf">"asdf).
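The greedy case is easy to reproduce on that one line. A minimal sketch, with illustrative variable names; it spells [^"]\+ as [^"][^"]* only so that the example stays within POSIX basic regular expressions:

```shell
#!/bin/bash
# Sample line with a second pair of double quotes after the href.
line='<a href="index.php?page=asdf">"asdf"</a>'

# Greedy .* runs to the last double quote on the line.
greedy=$(printf '%s\n' "$line" | sed 's/"index\.php?page=\(.*\)"/"\1.html"/g')

# [^"][^"]* (one or more non-quote characters) stops at the first closing quote.
bounded=$(printf '%s\n' "$line" | sed 's/"index\.php?page=\([^"][^"]*\)"/"\1.html"/g')

printf '%s\n' "$greedy" "$bounded"
```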

Answer 2

Your .* was too greedy. Use [^"]\+ instead. Plus your quotes were all messed up. Surround the whole thing with single quotes instead, then you can use " without escaping it.

sed -i 's/"index\.php?page=\([^"]\+\)"/"\1\.html"/g'

You can do this whole operation with a single statement using find:

find . -maxdepth 1 -type f -name '*.html' \
 -exec sed -i 's/"index\.php?page=\([^"]\+\)"/"\1\.html"/g' {} \+
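Before letting find edit every file in place, it may be worth previewing which lines would change. The sketch below is one way to do that, not part of the original answer: it swaps -i for -n plus the p flag, so sed prints only the substituted lines and writes nothing back. The scratch directory and file are assumptions made for the demo.

```shell
#!/bin/bash
# Demo setup: a scratch directory with one sample file (illustrative only).
dir=$(mktemp -d)
printf '%s\n' '<a href="#">Apple</a>' \
    '<a href="index.php?page=resonate">Resonate</a>' > "$dir/a.html"

# -n suppresses normal output; the p flag prints only lines where a
# substitution happened. The files themselves are left untouched.
preview=$(find "$dir" -maxdepth 1 -type f -name '*.html' \
    -exec sed -n 's/"index\.php?page=\([^"][^"]*\)"/"\1.html"/gp' {} +)

printf '%s\n' "$preview"
```

Once the preview looks right, the same expression can be used with -i as in the find command above.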

Answer 3

The following works:

 sed "s/\"index\.php?page=\(.*\)\"/\"\1.html\"/g" < 1.html

I think it was mostly the square brackets. Not sure why you had them. Oh, and the entire sed command needs to be in quotes.

3 answers

Answer 1
4 accepted 2012-05-17 02:44:39

Answer 2
1 2012-05-17 02:44:25

Answer 3
0 2012-05-17 02:42:24