Break down a text file into an organized list using AWK or SED

What's the best way to split this text file into an organized and readable file?

The text file I'm working with is in the following format after deleting all lines that do NOT contain the strings JUNIOR or SENIOR:

<tr><td><a href="campers_SENIOR/head_unit">head_unit_1</a></td></tr>
<tr><td><a href="campers_JUNIOR/head_unit">head_unit_2</a></td></tr>
<tr><td><a href="campers_SENIOR/secondary_unit">secondary_unit_1</a></td></tr>
<tr><td><a href="campers_JUNIOR/secondary_unit">secondary_unit_2</a></td></tr>

I want the output as:

Unit Type: SENIOR
Unit Tier: head_unit
File Name: head_unit_1

Unit Type: SENIOR
Unit Tier: secondary_unit
File Name: secondary_unit_1

Unit Type: JUNIOR
Unit Tier: head_unit
File Name: head_unit_2

Unit Type: JUNIOR
Unit Tier: secondary_unit
File Name: secondary_unit_2

I've been trying to use a mixture of SED and AWK to achieve this. My problem is that I'm not sure how to scope this down into JUNIOR and SENIOR sections to better get at the file names and unit tiers. Please try to stick to SED and AWK solutions as these will make the most sense and won't be too involved.

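If your input is relatively well formed*, then setting the field separator to [/"<> ]+ will pull out the information you need. Any run of those characters counts as a single separator, so fields 6, 7 and 8 hold the unit type (still prefixed with campers_), the unit tier and the file name: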

$ awk -F'[/"<> ]+' '{sub("campers_", "", $6); print $6, $7, $8}' file
SENIOR head_unit head_unit_1
JUNIOR head_unit head_unit_2
SENIOR secondary_unit secondary_unit_1
JUNIOR secondary_unit secondary_unit_2

From there it is trivial to form each record as required.
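
As a minimal sketch of that last step, and not necessarily what the answer's author had in mind, the records can be built in the same awk pass and grouped so that SENIOR entries print before JUNIOR ones, as in the requested output. The function name format_units and the variables rec, senior and junior are made up for illustration. It assumes every input line has the same shape as the excerpt and that any unit type other than SENIOR belongs in the second group.

```shell
# Hypothetical helper: reads the filtered HTML rows (from the files given as
# arguments, or from stdin) and prints one three-line record per row,
# SENIOR records first, then everything else (JUNIOR in the sample).
format_units() {
    awk -F'[/"<> ]+' '
        {
            sub("campers_", "", $6)   # $6: campers_SENIOR -> SENIOR
            # $7 is the unit tier, $8 is the link text (the file name)
            rec = "Unit Type: " $6 "\nUnit Tier: " $7 "\nFile Name: " $8 "\n\n"
            if ($6 == "SENIOR") senior = senior rec
            else                junior = junior rec
        }
        END { printf "%s%s", senior, junior }
    ' "$@"
}

# Sample input copied from the question.
format_units <<'EOF'
<tr><td><a href="campers_SENIOR/head_unit">head_unit_1</a></td></tr>
<tr><td><a href="campers_JUNIOR/head_unit">head_unit_2</a></td></tr>
<tr><td><a href="campers_SENIOR/secondary_unit">secondary_unit_1</a></td></tr>
<tr><td><a href="campers_JUNIOR/secondary_unit">secondary_unit_2</a></td></tr>
EOF
```

Rows keep their input order within each group, and every record is followed by a blank line. To read from a file instead of a here-document, pass the file name as an argument, for example format_units file.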


*If your actual input is not as well formed as in your excerpt you will need to use a proper HTML parser.

