Break down a text file into an organized list using AWK or SED

What's the best way to turn this text file into an organized, readable list?

After deleting all lines that do NOT contain the strings JUNIOR or SENIOR, the text file I'm working with is in the following format:

<tr><td><a href="campers_SENIOR/head_unit">head_unit_1</a></td></tr>
<tr><td><a href="campers_JUNIOR/head_unit">head_unit_2</a></td></tr>
<tr><td><a href="campers_SENIOR/secondary_unit">secondary_unit_1</a></td></tr>
<tr><td><a href="campers_JUNIOR/secondary_unit">secondary_unit_2</a></td></tr>

I want the output to be:

Unit Type: SENIOR
Unit Tier: head_unit
File Name: head_unit_1

Unit Type: SENIOR
Unit Tier: secondary_unit
File Name: secondary_unit_1

Unit Type: JUNIOR
Unit Tier: head_unit
File Name: head_unit_2

Unit Type: JUNIOR
Unit Tier: secondary_unit
File Name: secondary_unit_2

I've been trying to use a mixture of SED and AWK to achieve this. My problem is that I'm not sure how to narrow this down into JUNIOR and SENIOR sections so that I can get at the file names and unit tiers. Please stick to SED and AWK solutions, as these make the most sense here and won't be too involved.

If your input is relatively well formed*, then setting the field separator to [/"<> ]+ will pull out the information you need. With that separator, any run of slashes, double quotes, angle brackets or spaces counts as a single delimiter:

$ awk -F'[/"<> ]+' '{sub("campers_", "", $6); print $6, $7, $8}' file
SENIOR head_unit head_unit_1
JUNIOR head_unit head_unit_2
SENIOR secondary_unit secondary_unit_1
JUNIOR secondary_unit secondary_unit_2
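
To see why fields 6 to 8 are the ones to print, it can help to dump every field with its index for a single line. The following is only a debugging sketch; the helper name show_fields is made up for illustration and is not part of the original answer.

```shell
# One sample line copied from the question.
line='<tr><td><a href="campers_SENIOR/head_unit">head_unit_1</a></td></tr>'

# show_fields is a hypothetical helper: it prints each non-empty field
# together with its index, using the same field separator as above.
show_fields() {
  awk -F'[/"<> ]+' '{ for (i = 1; i <= NF; i++) if ($i != "") print i ": " $i }'
}

printf '%s\n' "$line" | show_fields
```

Because the line starts with a separator character, field 1 is empty, which is why the useful values only begin at field 6 (campers_SENIOR), followed by field 7 (head_unit) and field 8 (head_unit_1).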

From there it is trivial to form each record as required.
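
One possible way to form the records, sketched under the assumption that all SENIOR records should come before all JUNIOR ones (as in the desired output), is to collect the formatted text per unit type and print both blocks at the end. The helper name format_units and the temporary input file are illustrative choices, not part of the original answer.

```shell
# Sample input from the question, written to a temporary file for the demo.
input=$(mktemp)
cat > "$input" <<'EOF'
<tr><td><a href="campers_SENIOR/head_unit">head_unit_1</a></td></tr>
<tr><td><a href="campers_JUNIOR/head_unit">head_unit_2</a></td></tr>
<tr><td><a href="campers_SENIOR/secondary_unit">secondary_unit_1</a></td></tr>
<tr><td><a href="campers_JUNIOR/secondary_unit">secondary_unit_2</a></td></tr>
EOF

# format_units is a hypothetical helper name used only for this sketch.
format_units() {
  awk -F'[/"<> ]+' '
    {
      type = $6
      sub(/^campers_/, "", type)   # campers_SENIOR -> SENIOR
      # Append the formatted record to the block kept for its unit type.
      block[type] = block[type] sprintf("Unit Type: %s\nUnit Tier: %s\nFile Name: %s\n\n", type, $7, $8)
    }
    # Print the SENIOR block first, then the JUNIOR block.
    END { printf "%s%s", block["SENIOR"], block["JUNIOR"] }
  ' "$1"
}

format_units "$input"
```

Within each block the records keep the order in which they appear in the input. If more unit types exist than SENIOR and JUNIOR, the END rule would need to be extended accordingly.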


*If your actual input is not as well formed as your excerpt, you will need to use a proper HTML parser.
