

Bash Scripting: Find strings on one line of code and insert each on its own line

Posted 2013-02-10 00:40:30. Tags: string, bash, sed, awk

I'm trying to write a small bash script that:

- downloads an HTML file from the web with wget every [x] minutes
- uses some Linux utility to find the differences in the file between the last two downloads
- uses sed to modify the lines on which new text was detected

The problem I am running into is that the HTML file uses inline CSS to format a table, but all of the page's code is stored on one long line.

Effectively, I need a Linux utility that can scan through a single line of code, find every piece of text between tags, and put each piece on its own line. That should make scanning the text easier. Every tool I've tried works line by line, which can't do what I need, since the entire page is stored on a single line.
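The three steps listed above could be sketched roughly as follows. This is a minimal, non-authoritative sketch that assumes bash, wget, diff and GNU sed (the usual sed on Linux, where \n in a replacement inserts a newline). The function names split_tags, new_lines and poll, the page.old.html/page.new.html file names, the 300-second default interval and the "NEW: " marker are all hypothetical; the marker only stands in for whatever sed edit the real script should make. Each snapshot is split at every > before diff runs, because diff compares whole lines and would otherwise report the entire one-line page as changed.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: poll a page, split tags onto lines, report changed lines.

# Put every tag on its own line (GNU sed). The second expression removes the
# empty line that would otherwise follow a '>' at the very end of a line.
split_tags() {
  sed -e 's/>/>\n/g' -e 's/\n$//' "$@"
}

# Print the lines that were added or changed in snapshot $2 relative to $1.
new_lines() {
  local old_split new_split
  old_split=$(mktemp)
  new_split=$(mktemp)
  split_tags "$1" > "$old_split"
  split_tags "$2" > "$new_split"
  # In diff's default output, lines taken from the second file start with "> ".
  diff "$old_split" "$new_split" | sed -n 's/^> //p'
  rm -f "$old_split" "$new_split"
}

# Download $1 every $2 seconds (hypothetical default: 300) and print each new
# line prefixed with "NEW: " (a placeholder for the real sed modification).
poll() {
  local url=$1 interval=${2:-300}
  wget -q -O page.old.html "$url" || return 1
  while sleep "$interval"; do
    wget -q -O page.new.html "$url" || continue
    new_lines page.old.html page.new.html | sed 's/^/NEW: /'
    mv page.new.html page.old.html
  done
}

# Example (placeholder URL; runs until interrupted):
# poll 'https://example.com/table.html' 600
```

Sourcing the file only defines the functions; nothing is downloaded until poll is called.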

1 Answer

You could first split the content into lines by substituting (say) every > with >\n (a > followed by a newline). That breaks the document up at the end of each HTML tag.
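A minimal sketch of that substitution, assuming GNU sed (BSD/macOS sed does not understand \n in the replacement and needs a literal backslash followed by a newline instead). The split_tags name and the sample HTML are made up for illustration.

```bash
#!/usr/bin/env bash
# Replace every ">" with ">" plus a newline, so each tag ends its own line.
# Assumes GNU sed. The second expression removes the empty line that would
# otherwise follow a '>' at the very end of the input line.
split_tags() {
  sed -e 's/>/>\n/g' -e 's/\n$//' "$@"
}

# Made-up sample: a table stored on a single line.
html='<table><tr><td style="color:red">Alpha</td><td>Beta</td></tr></table>'
printf '%s\n' "$html" | split_tags
# Prints:
# <table>
# <tr>
# <td style="color:red">
# Alpha</td>
# <td>
# Beta</td>
# </tr>
# </table>
```

Once the page is in this shape, line-oriented tools such as diff, grep and sed can be applied to it as usual. Note that a > inside an attribute value would also be split, which is acceptable for a quick change check but not for real HTML parsing.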

Maybe you don't even need to do that: you can use awk's RS variable to set the record separator to ">" instead of newline. See this page for an example of using RS: http://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr/
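As a rough sketch of the RS approach: with RS set to ">", each record is everything up to the next >, for example Alpha</td, so deleting from the first < onward leaves only the text between tags. The function name text_between_tags and the sample input are hypothetical. A single-character RS like this is standard awk; multi-character RS values are a gawk extension.

```bash
#!/usr/bin/env bash
# Print the text found between tags, one piece per line.
text_between_tags() {
  awk 'BEGIN { RS = ">" }             # a record ends at each ">"
       {
         sub(/<.*/, "")               # drop the tag part of the record
         if ($0 ~ /[^ \t\n]/)         # skip records that are empty or only blanks
           print
       }' "$@"
}

# Made-up sample: a table stored on a single line.
html='<table><tr><td style="color:red">Alpha</td><td>Beta</td></tr></table>'
printf '%s\n' "$html" | text_between_tags
# Prints:
# Alpha
# Beta
```

This skips the separate splitting step entirely, at the cost of discarding the tags themselves; if the tags are needed for context (for example, to know which table cell changed), splitting with sed first keeps them.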