
Extracting whole string between two words with special characters using sed

I have a file in which each line has the format:

<tr><td>20456712 </td><td>Alin Smith </td><td.....(and so on).

I want to return all students' names and their IDs, one student per line. The output should be:

20456712 Alin Smith

..... .....

How can I do this with sed or grep?

I've tried many things to capture what lies between <tr><td> and </td><td>, but nothing worked, I think because of the special characters.

I have been trying for a few days with no results.

I've tried:

sed -r 's/.*[<]+tr+[>]+[<]+td+[>](\S+).* <\/td><td>(\S+).*/\1 \2/'

but it only gave me the ID plus the first name: 20456712 Alin

You could try this RegEx:

<tr><td>([\d\s]+)<\/td><td>([\w\s]+)<\/td>

The ID is captured in Group 1 and the full name in Group 2. You can see this in the demo by hovering over the match and checking the data in both groups.

Live Demo on RegExr


How it works:

<tr>         # Opening <tr>
<td>         # Opening <td>
([\d\s]+)    # ID
<\/td>       # Closing </td>
<td>         # Opening <td>
([\w\s]+)    # Full Name
<\/td>       # Closing </td>
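Note that \d is PCRE shorthand and is not understood by sed, so the pattern above cannot be pasted into sed unchanged. A minimal sketch of the same idea in POSIX extended syntax follows. It assumes a sed that accepts -E (GNU and BSD sed both do), and the function name extract_students and the sample rows are hypothetical, made up for illustration. The trailing space inside each cell is matched outside the capture groups so that it does not end up in the output.

```shell
# Hypothetical helper: reads HTML table rows on stdin and prints "ID Full Name".
# -n plus the trailing 'p' flag prints only lines where the substitution matched.
extract_students() {
  sed -n -E 's#.*<tr><td>([0-9]+) *</td><td>([^<]*[^< ]) *</td>.*#\1 \2#p'
}

# Made-up sample rows in the format described in the question.
sample='<tr><td>20456712 </td><td>Alin Smith </td><td>x</td></tr>
<tr><td>20456713 </td><td>Mary Ann Lee </td><td>y</td></tr>'

result=$(printf '%s\n' "$sample" | extract_students)
printf '%s\n' "$result"
```

Using # as the delimiter avoids escaping the slashes in </td>. The name group ([^<]*[^< ]) takes everything up to the next tag but must end on a non-space character, which is why names of two or more words are kept whole.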

This should also help you out:

sed 's/.*\([0-9]\{8\}\) <\/td><td>\([^<]*\) .*/\1 \2/'

View test on the command line
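As a quick check, the sed command above can be run against one line in the question's format. This is only a sketch: the variable names line and result are hypothetical, and the cell after the name is a made-up placeholder for the part the question elides.

```shell
# Sample line modeled on the question; the third cell is a placeholder.
line='<tr><td>20456712 </td><td>Alin Smith </td><td>more</td></tr>'

# \([0-9]\{8\}\) captures the 8-digit ID; \([^<]*\) captures the name up to
# the space that precedes the closing </td>.
result=$(printf '%s\n' "$line" | sed 's/.*\([0-9]\{8\}\) <\/td><td>\([^<]*\) .*/\1 \2/')

printf '%s\n' "$result"   # prints: 20456712 Alin Smith
```

The attempt in the question stops at the first name because \S+ cannot cross the space inside "Alin Smith", whereas [^<]* runs until the next tag. This command does assume that every ID has exactly eight digits and that each cell ends with a single space, as in the sample.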
