Unterminated address regex - misapplying escape characters in bash sed script?

I'm just learning sed, and I feel like I'm getting close to doing what I want but am missing something obvious.

The objective is to take a bunch of <tr>...</tr> rows from an HTML table and append them to the single table in another page. So I want to take the initial file, strip everything above the first <tr> and everything from </table> on down, then insert what remains just above the </table> in the other file. It looks like the example below, except that <tr> and </tr> are on their own lines, if it matters.

Input File:                           Target File:
<html><body>                          <html><body>
  <p>Whatever...</p>                    <p>Other whatever...</p>
  <table>                               <table>
    <tr><td>4</td></tr>                   <thead>
    <tr><td>5</td></tr>                     <tr><th>#</th></tr>
    <tr><td>6</td></tr>                   </thead>
  </table>                                <tbody>
</body></html>                              <tr><td>1</td></tr>
                                            <tr><td>2</td></tr>
                                            <tr><td>3</td></tr>
                                          </tbody>
                                        </table>
                                      </body></html>

Becomes:

  Input file                          Target File:
  doesn't matter.                     <html><body>
                                        <p>Other whatever...</p>
                                        <table>
                                          <thead>
                                            <tr><th>#</th></tr>
                                          </thead>
                                          <tbody>
                                            <tr><td>1</td></tr>
                                            <tr><td>2</td></tr>
                                            <tr><td>3</td></tr>
                                            <tr><td>4</td></tr>
                                            <tr><td>5</td></tr>
                                            <tr><td>6</td></tr>
                                          </tbody>
                                        </table>
                                      </body></html>

Here's the code I'm trying to use:

#!/bin/bash
#$1 is the first parameter and $2 is the second parameter being passed when calling the script. The variable filename will be used to refer to this.

input=$1
inserttarget=$2

sed -e '/\<\/thead\>,$input' $input
sed -e '/\<\/table\>,$input' $input
sed -n -i -e '\<\/tbody\>/r' $inserttarget -e 1x -e '2,${x;p}' -e '${x;p}' $input

I'm pretty sure it's simple and I'm just messing up the expression. Can anyone set me straight?

Here I split the problem in two:

  1. Cut the rows from the input.
  2. Paste those rows into the output file.

  1. sed -n '\:<table>:,\:</table>:p' ${input} | sed -n '\:<tr>:p'

This line prints every line containing <tr> within the block that runs from the first line matching <table> to the first line matching </table>. The extracted lines are written to standard output; the input file itself is not modified.

  2. sed -i '\:</tbody>: {
     r /dev/stdin
     a </tbody>
     d}' ${inserttarget}

This multi-line command adds the lines read from stdin after the line matching </tbody>. We then move the </tbody> by appending a new one after the inserted lines and deleting the old one.
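The two steps can be tried in isolation on throwaway files. The following is a minimal sketch, assuming GNU sed (the one-line `a text` form and `r /dev/stdin` are relied on here); the sample file contents and the `rows` and `result` variables are made up for illustration and are not part of the original answer.

```shell
# Hypothetical sample files, created only for this illustration.
input=$(mktemp)
inserttarget=$(mktemp)
cat > "$input" <<'EOF'
<table>
<tr><td>4</td></tr>
<tr><td>5</td></tr>
</table>
<p>not a row</p>
EOF
cat > "$inserttarget" <<'EOF'
<table>
<tbody>
<tr><td>3</td></tr>
</tbody>
</table>
EOF

# Step 1: print the <table>...</table> block, then keep only lines with <tr>.
rows=$(sed -n '\:<table>:,\:</table>:p' "$input" | sed -n '\:<tr>:p')

# Step 2: feed those rows to stdin; sed queues them after the matching line,
# queues a fresh </tbody> behind them, then deletes the original </tbody>.
printf '%s\n' "$rows" |
sed -i '\:</tbody>: {
r /dev/stdin
a </tbody>
d}' "$inserttarget"

result=$(cat "$inserttarget")
printf '%s\n' "$result"
rm -f "$input" "$inserttarget"
```

Note that the re-appended </tbody> loses any indentation the original line had, because the one-line `a` form drops leading whitespace from its text.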

Another trick used here is to replace the default regex delimiter / with : so that we can use / unescaped in the matching pattern. When a custom delimiter is used in an address, its first occurrence must be preceded by a backslash, as in \:</tbody>:.
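To make the delimiter point concrete, here is a small sketch comparing the two spellings of the same address. It also runs the first expression from the question to show that sed refuses it: every slash after the opening one is escaped, so the address is never closed, which appears to be the source of the error in the title. The `sample`, `with_slash`, `with_colon`, and `status` names are illustrative only. As a side note, in GNU sed `\<` and `\>` are word-boundary anchors rather than literal angle brackets, and `$input` is not expanded inside single quotes.

```shell
sample='<table>
<tr><td>1</td></tr>
</table>'

# Default delimiter: the slash inside the pattern has to be escaped.
with_slash=$(printf '%s\n' "$sample" | sed -n '/<\/table>/p')

# Custom delimiter: \: opens the address, : closes it, / is an ordinary character.
with_colon=$(printf '%s\n' "$sample" | sed -n '\:</table>:p')

printf '%s\n' "$with_slash" "$with_colon"

# The first address from the question has no unescaped closing slash,
# so sed rejects the script before reading any input.
if sed -e '/\<\/thead\>,$input' /dev/null 2>/dev/null; then
  status=accepted
else
  status=rejected
fi
printf '%s\n' "$status"
```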

Final solution:

sed -i '\:</tbody>: {
r /dev/stdin
a </tbody>
d}' ${inserttarget} < <(sed -n '\:<table>:,\:</table>:p' ${input} | sed -n '\:<tr>:p')

Et voilà!
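Since `sed -i` overwrites the target, it can be useful to preview the merge first. The sketch below is one possible variant, assuming GNU sed: it drops `-i` so the merged document goes to stdout, uses a plain pipe instead of process substitution, and quotes the file names. `merge_rows` is a hypothetical helper name and the sample files are invented for the demonstration.

```shell
# merge_rows is a hypothetical helper name, not part of the original answer.
# It prints the merged document to stdout and leaves both files untouched.
merge_rows() {
  sed -n '\:<table>:,\:</table>:p' "$1" | sed -n '\:<tr>:p' |
  sed '\:</tbody>: {
r /dev/stdin
a </tbody>
d}' "$2"
}

# Throwaway sample files, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/input.html" <<'EOF'
<html><body>
  <table>
    <tr><td>4</td></tr>
    <tr><td>5</td></tr>
  </table>
</body></html>
EOF
cat > "$workdir/target.html" <<'EOF'
<html><body>
  <table>
    <thead>
      <tr><th>#</th></tr>
    </thead>
    <tbody>
      <tr><td>1</td></tr>
    </tbody>
  </table>
</body></html>
EOF

before=$(cat "$workdir/target.html")
merged=$(merge_rows "$workdir/input.html" "$workdir/target.html")
after=$(cat "$workdir/target.html")
printf '%s\n' "$merged"
rm -rf "$workdir"
```

Once the preview looks right, the output can be redirected to a new file. Redirecting it straight onto the target would truncate the target before sed reads it, so write to a different name and rename afterwards.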
