Is there a way to remove everything before and including a tab (or space) for each line of a file using sed?

I have a file where I want to remove everything before and including the first space for each line. For example, if my file looks like this:

>JQ907469.1 Gracilariopsis mclachlanii voucher BG0072 23S ribosomal RNA gene, partial sequence; plastid
>JQ907467.1 Gracilariopsis longissima voucher BG0052 23S ribosomal RNA gene, partial sequence; plastid
>JQ907456.1 Hydropuntia rangiferina voucher BG0092 23S ribosomal RNA gene, partial sequence; plastid
>JQ907428.1 Gracilaria cornea voucher BG0112 23S ribosomal RNA gene, partial sequence; plastid
>JQ952662.1 Gracilariopsis tenuifrons voucher BG0042 23S ribosomal RNA gene, partial sequence; plastid

I want it to look like this:

Gracilariopsis mclachlanii voucher BG0072 23S ribosomal RNA gene, partial sequence; plastid
Gracilariopsis longissima voucher BG0052 23S ribosomal RNA gene, partial sequence; plastid
Hydropuntia rangiferina voucher BG0092 23S ribosomal RNA gene, partial sequence; plastid
Gracilaria cornea voucher BG0112 23S ribosomal RNA gene, partial sequence; plastid
Gracilariopsis tenuifrons voucher BG0042 23S ribosomal RNA gene, partial sequence; plastid

I assume I can use sed to achieve my goal, but I'm not familiar enough with its notation and syntax yet to experiment. In that spirit, if someone has a solution, I'd love it if they could explain why the code works the way it does.

Cheers

Employing a regex, and assuming you're using a reasonably current GNU sed:

sed -r 's/^[^ \t]+[ \t]//' yourfile

If you're happy with how the output looks, make the change in place:

sed -i -r 's/^[^ \t]+[ \t]//' yourfile
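Since -i overwrites the file, it may be worth keeping a copy of the original. GNU sed accepts a suffix directly after -i and saves the untouched file under that name. The following is only a sketch: the temporary directory and the single sample line are assumptions made for illustration, so that no real data is touched.

```shell
# Work in a throwaway directory (hypothetical setup for this demo only).
tmpdir=$(mktemp -d)
printf '>JQ907428.1 Gracilaria cornea voucher BG0112\n' > "$tmpdir/yourfile"

# -i.bak edits yourfile in place and keeps the original as yourfile.bak.
# ^ anchors the match at the start of the line.
sed -i.bak -r 's/^[^ \t]+[ \t]//' "$tmpdir/yourfile"

edited=$(cat "$tmpdir/yourfile")
backup=$(cat "$tmpdir/yourfile.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

# Remove the throwaway directory again.
rm -rf "$tmpdir"
```

If the edited file looks wrong, the .bak copy can simply be moved back over it.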

How does it work?

s/ starts a search & replace.

^[^ \t]+[ \t] is a regular expression that translates to: from the beginning of the line, match one or more characters that are not a space or TAB, followed by the first space (or TAB).

// the slashes, together with the one in s/ at the start of the command, are separators. The bit between the first two is the search pattern; the bit between the second two is the replacement (in your case, nothing).

-r tells GNU sed to use extended regular expression syntax, which is what lets + be written without a backslash.

-i tells it to modify the file in place.
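To see the substitution at work without touching a file, the command can be fed a couple of lines on standard input. This is a minimal sketch assuming GNU sed; the two lines are shortened from the question, and the tab in the second one is an assumption added here to exercise the \t part of the pattern.

```shell
# First line: a space follows the ID. Second line: a tab follows the ID.
# printf turns \t into a real tab and \n into a newline.
result=$(printf '>JQ907469.1 Gracilariopsis mclachlanii voucher BG0072\n>JQ907467.1\tGracilariopsis longissima voucher BG0052\n' |
  sed -r 's/^[^ \t]+[ \t]//')

# Print what is left of each line after the substitution.
printf '%s\n' "$result"
```

Because the s command has no g flag, only the first match on each line is replaced, so the spaces later in the description are left alone.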
