awk, sed, grep: extracting specific strings from a file in Linux
How to get values from a file using sed, awk or grep in a Linux command or script?
I have file1 with the following content:
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
I want to extract the contents of the file using a Linux script or command (sed, grep or awk). Example output:
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%/20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/%2/20Buck%20%282019%30
My code:
grep -oP 'href="([^".]*)">([^</.]*)' file1
Please help, I am a newbie :)
This
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
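does indeed look like an HTML file. If you are allowed to install utilities on your system, I suggest trying hxselect (part of the html-xml-utils package), which is useful when you want to extract content that can be described with CSS selectors. For example, to get the content of every column whose label is referensi from file.html: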
cat file.html | hxselect -i -c -s '\n' 'column[label=referensi]'
$ awk -v RS='<[^>]+>' 'NF{printf "%s", $0 (++c%2?" |":ORS)}' file
stick man (2020)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/
python easy (2019)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/
Note that the forward slashes are in your original data. The command requires multi-character RS support (GNU awk).
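If the trailing slashes are not wanted, as in the example output in the question, they could be removed in a second step. The following is only a sketch: strip_trailing_slashes is a hypothetical helper name, and it assumes the input lines have the "book/ | url/" shape printed above.

```shell
# Hypothetical helper: reads "book/ | url/" lines on stdin and
# drops the slash before the " |" separator and the one at end of line.
strip_trailing_slashes() {
    sed -e 's:/ |: |:' -e 's:/$::'
}

# Possible usage with the GNU awk command shown above:
#   awk -v RS='<[^>]+>' 'NF{printf "%s", $0 (++c%2?" |":ORS)}' file | strip_trailing_slashes
```

Only the first "/ |" on each line and a single slash at the very end are touched, so the slashes inside the URL are left alone.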
Using awk you can try:
awk -F'>|/<' '{ORS= (NR == 3 || NR == 7) ? " |" : "\n"} $2 != "" {print $2}' file
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30
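Or, without hard-coding the line numbers (this assumes the book lines are odd-numbered and the referensi lines even-numbered, as in the sample):
awk -F'>|/<' '{ORS = (NR%2) ? " |" : RS} $2 != "" {print $2}' file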
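The awk commands in this answer depend on which line numbers the column elements fall on. A variant keyed on the name attribute might be less sensitive to extra lines in the file. This is a minimal sketch, not a general XML parser: pair_columns is a hypothetical helper name, and it assumes each column element sits on a single line, as in file1, and that every book column is followed by its referensi column.

```shell
# Hypothetical helper: prints "book | referensi" for each row of the file
# given as $1. Uses only POSIX awk features (no multi-character RS).
pair_columns() {
    awk '
        # Remove tags, surrounding blanks and one trailing slash.
        function clean(s) {
            gsub(/<[^>]*>/, "", s)
            sub(/^[ \t]+/, "", s)
            sub(/[ \t]+$/, "", s)
            sub(/\/$/, "", s)
            return s
        }
        # Remember the book title until its referensi line arrives.
        /<column name="book"/      { book = clean($0) }
        /<column name="referensi"/ { print book " | " clean($0) }
    ' "$1"
}

# Possible usage:
#   pair_columns file1
```

Matching on the attribute rather than on NR means blank lines or additional elements between rows should not shift the pairing.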