Unterminated address regex: escape characters used incorrectly in a bash sed script?

Question

Just learning sed, and I feel like I'm getting close to doing what I want, just missing something obvious.

The objective is to take a bunch of <tr>...</tr> rows from an HTML table and append them to the single table in another page. So I want to take the initial file, strip everything above the first <tr> and everything from </table> on down, then insert what remains just above the </table> in the other file. It looks like the example below, except that <tr> and </tr> are on their own lines, if it matters.

Input File:                           Target File:
<html><body>                          <html><body>
  <p>Whatever...</p>                    <p>Other whatever...</p>
  <table>                               <table>
    <tr><td>4</td></tr>                   <thead>
    <tr><td>5</td></tr>                     <tr><th>#</th></tr>
    <tr><td>6</td></tr>                   </thead>
  </table>                                <tbody>
</body></html>                              <tr><td>1</td></tr>
                                            <tr><td>2</td></tr>
                                            <tr><td>3</td></tr>
                                          </tbody>
                                        </table>
                                      </body></html>

Becomes:

  Input file                          Target File:
  doesn't matter.                     <html><body>
                                        <p>Other whatever...</p>
                                        <table>
                                          <thead>
                                            <tr><th>#</th></tr>
                                          </thead>
                                          <tbody>
                                            <tr><td>1</td></tr>
                                            <tr><td>2</td></tr>
                                            <tr><td>3</td></tr>
                                            <tr><td>4</td></tr>
                                            <tr><td>5</td></tr>
                                            <tr><td>6</td></tr>
                                          </tbody>
                                        </table>
                                      </body></html>

Here's the code I'm trying to use:

#!/bin/bash
#$1 is the first parameter and $2 is the second parameter being passed when calling the script. The variable filename will be used to refer to this.

input=$1
inserttarget=$2

sed -e '/\<\/thead\>,$input' $input
sed -e '/\<\/table\>,$input' $input
sed -n -i -e '\<\/tbody\>/r' $inserttarget -e 1x -e '2,${x;p}' -e '${x;p}' $input

Pretty sure it's pretty simple and I'm just messing the expression up. Can anyone set me straight?

Answer 1

Here I split the problem in two:
1. Cut the rows from the input
2. Paste those rows into the output file

sed -n '\:<table>:,\:</table>:p' ${input} | sed -n '\:<tr>:p'

This line extracts every line containing <tr> from the block that runs from the first line matching <table> to the next line matching </table>. The extracted lines are printed to standard output; the input file itself is not modified.
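As a rough illustration of the extraction step, the sketch below runs the same two-stage pipeline on a throwaway copy of the Input File from the question. The temporary file and the `rows` variable are assumptions made for the demo, not part of the original answer.

```shell
#!/bin/bash
# Hypothetical sample input, mirroring the Input File shown in the question.
input=$(mktemp)
cat > "$input" <<'EOF'
<html><body>
  <p>Whatever...</p>
  <table>
    <tr><td>4</td></tr>
    <tr><td>5</td></tr>
    <tr><td>6</td></tr>
  </table>
</body></html>
EOF

# Stage 1 keeps only the <table>...</table> block;
# stage 2 keeps only the lines of that block that contain <tr>.
rows=$(sed -n '\:<table>:,\:</table>:p' "$input" | sed -n '\:<tr>:p')
printf '%s\n' "$rows"

rm -f "$input"
```

If <tr> and </tr> sit on their own lines, as the question mentions, the second stage would print only the opening-tag lines. A range address such as `\:<tr>:,\:</tr>:p` is one way to keep whole rows in that case.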

sed -i '\:</tbody>: {
r /dev/stdin
a </tbody>
d}' ${inserttarget}

This multi-line command adds the lines read from stdin after the line matching </tbody>. The </tbody> is then moved by appending a new one after the inserted lines and deleting the old one.
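The insertion step can be tried in isolation with a sketch like the one below. It assumes GNU sed (for `-i` and the one-line `a text` form), and the trimmed target file and the two piped rows are made up for the demo.

```shell
#!/bin/bash
# Hypothetical target file, a trimmed version of the Target File in the question.
inserttarget=$(mktemp)
cat > "$inserttarget" <<'EOF'
<table>
  <tbody>
    <tr><td>3</td></tr>
  </tbody>
</table>
EOF

# The rows to insert arrive on stdin. On the line matching </tbody>:
#   r queues the stdin contents, a queues a fresh </tbody>, d drops the old line.
printf '%s\n' '    <tr><td>4</td></tr>' '    <tr><td>5</td></tr>' |
sed -i '\:</tbody>: {
r /dev/stdin
a </tbody>
d}' "$inserttarget"

result=$(cat "$inserttarget")
printf '%s\n' "$result"

rm -f "$inserttarget"
```

One side effect worth knowing: the re-added </tbody> is written exactly as given to `a`, so it loses whatever indentation the original line had.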

Another trick used here is to replace the default regex delimiter / with : (written \:...: in an address), so that / can appear unescaped in the matching pattern.
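To make the delimiter point concrete, here is a small sketch comparing the two address styles on a two-line sample, and showing why the expression from the question fails. The variable names are illustrative only. It also relies on two facts not spelled out above: in GNU sed `\<` and `\>` are word-boundary operators rather than escaped angle brackets, and `$input` is not expanded inside single quotes.

```shell
#!/bin/bash
sample='<table>
</table>'

# Default delimiter: the slash inside the pattern has to be escaped.
with_slash=$(sed -n '/<\/table>/p' <<<"$sample")

# Custom delimiter: \: opens the regex and the next : closes it,
# so the slash needs no escaping.
with_colon=$(sed -n '\:</table>:p' <<<"$sample")

# The question's form: the only slash after the opening one is escaped,
# so the address regex is never closed and sed exits with an error.
err=$(sed -e '/\<\/table\>,$input' <<<"$sample" 2>&1) && status=0 || status=$?

printf 'slash form: %s\n' "$with_slash"
printf 'colon form: %s\n' "$with_colon"
printf 'question form (exit %s): %s\n' "$status" "$err"
```

Both working forms select the same line; the custom delimiter simply avoids the escaping that tripped up the original script.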

Final solution:

sed -i '\:</tbody>: {
r /dev/stdin
a </tbody>
d}' ${inserttarget} < <(sed -n '\:<table>:,\:</table>:p' ${input} | sed -n '\:<tr>:p')

Et voilà!

Solution 1
0 Accepted 2017-03-13 22:40:16