Bash Scripting: Find strings on one line of code and insert each on its own line

I'm trying to write a small bash script that:

  • wgets an HTML file from the web every [x] minutes
  • uses some Linux utility to find differences in the file between the last two updates
  • uses sed to modify the lines on which new text was detected
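For reference, the three steps above could be sketched roughly as follows. This is only an illustration: the URL, the snapshot file names, the interval, and the function names `new_lines` and `poll` are all placeholders, not part of the real script.

```shell
#!/usr/bin/env bash
# Placeholder values: the URL, file names and interval are assumptions.
url="http://example.com/status.html"
interval_minutes=5

# Print the lines that appear in the new snapshot ($2) but not in the old one ($1).
# In diff's default output format, such lines are prefixed with "> ".
new_lines() {
  diff "$1" "$2" | sed -n 's/^> //p'
}

# Fetch the page repeatedly, keeping the previous copy for comparison.
poll() {
  while true; do
    if [ -f current.html ]; then
      mv current.html previous.html
    fi
    wget -q -O current.html "$url"
    if [ -f previous.html ]; then
      new_lines previous.html current.html
    fi
    sleep "$((interval_minutes * 60))"
  done
}
```

Calling `poll` starts the loop; `new_lines old.html new.html` can also be run by hand on two saved copies.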

The problem I am running into is that the HTML file uses inline CSS to format a table, but the actual code for the page is stored on one long line.

Effectively I need a Linux utility that can scan through a single line of code, find every instance of text between each pair of tags, and put those instances on their own lines. That should make scanning the text easier. Every tool I've tried searches on a per-line basis, which can't do what I need since the entire code is stored on a single line.

You could first split the content into lines by substituting (say) > with >\n. That will break up the document at the end of each HTML tag.
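A minimal sketch of that substitution with sed, assuming the page is piped in on standard input. The function name `split_after_tags` and the sample markup are made up for illustration.

```shell
# Insert a newline after every '>' so that each tag ends its own line.
# The backslash-newline form is portable; GNU sed also accepts 's/>/>\n/g'.
split_after_tags() {
  sed 's/>/>\
/g'
}

# Hypothetical sample standing in for the one-line page.
printf '%s\n' '<tr><td style="color:red">alpha</td><td>beta</td></tr>' | split_after_tags
```

After the split, any text that followed a tag starts a line of its own (for example `alpha</td>`), so line-oriented tools such as diff and grep have something to work with.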

Maybe you don't even need to do that: you can use awk's RS variable to define the record separator as ">" instead of newline. See this page for an example of using RS: http://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr/
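As a rough sketch of the RS approach (the function name `extract_text` is made up, and this assumes no attribute value contains a literal ">"): with ">" as the separator, each record is whatever text preceded a tag plus the start of that tag, so cutting everything from "<" onward leaves just the text.

```shell
# Treat '>' as the record separator, so awk sees one record per tag.
# Each record then looks like "text<tag"; the '>' itself is consumed.
extract_text() {
  awk 'BEGIN { RS = ">" }
       {
         sub(/<.*/, "")                        # drop the tag that ends this record
         gsub(/^[ \t\r\n]+|[ \t\r\n]+$/, "")   # trim surrounding whitespace
         if ($0 != "") print                   # keep only records that had text
       }'
}

# Hypothetical sample standing in for the one-line page.
printf '%s\n' '<tr><td style="color:red">alpha</td><td>beta</td></tr>' | extract_text
```

This prints each piece of text between tags on its own line, which can then be saved per snapshot and compared with diff.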
