
Find consecutive space-delimited single characters in each line in bash

Let's say I have the following file:

Y M C A
cambridge m a
d m v office
t mobile

and want to convert it to:

YMCA
cambridge ma
dmv office
t mobile

That is, detect every run of consecutive single characters separated by single spaces and join them; the runs can be of different lengths (two or more). For example, in the item 'd m v office' we should detect 'd m v' and convert it to 'dmv', but we would leave 't mobile store' intact (it contains only one single character).
Is it possible to do this in bash, or do I have to use a program like Python?

Perl one-liner:

echo 'Y M C A' | perl -ple's/\b\w\K\s(?=\w\b)//g'
==> YMCA

echo 't mobile' | perl -ple's/\b\w\K\s(?=\w\b)//g'
==> t mobile

This removes a space when it sits between two single word characters. You can replace \w with [a-zA-Z] if that is more convenient for you.

This sed one-liner works for the given example:

sed -r 's/ (\S\S)/_\1/g;s/(\S\S) /\1_/g;s/ //g;s/_/ /g' file

Test with your data:

kent$  sed -r 's/ (\S\S)/_\1/g;s/(\S\S) /\1_/g;s/ //g;s/_/ /g' f   
YMCA
cambridge ma
dmv office
t mobile

I used a placeholder, the _ , in the line above. If your text already contains _ , you can use an invisible character such as \x99 instead.
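The four substitutions can be spelled out one per line to show what the placeholder is doing. This is only a sketch: the function name join_singles_sed is hypothetical, it assumes GNU sed (for -r, \S and \xHH escapes), and it swaps the placeholder for \x01, a control character that is a valid single byte in both ASCII and UTF-8 and is unlikely to appear in ordinary text.

```shell
# join_singles_sed: hypothetical wrapper name; assumes GNU sed.
# \x01 is an assumed placeholder choice, used here instead of _ or \x99.
join_singles_sed() {
  sed -r '
    # 1. mark a space that is followed by a token of two or more characters
    s/ (\S\S)/\x01\1/g
    # 2. mark a space that follows a token of two or more characters
    s/(\S\S) /\1\x01/g
    # 3. every space still left sits between two single characters: delete it
    s/ //g
    # 4. turn the marks back into ordinary spaces
    s/\x01/ /g
  '
}

printf '%s\n' 'Y M C A' 'cambridge m a' 'd m v office' 't mobile' | join_singles_sed
# YMCA
# cambridge ma
# dmv office
# t mobile
```

For 'd m v office', step 1 marks only the space before 'office', step 3 collapses 'd m v' into 'dmv', and step 4 restores the marked space.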

With any awk in any shell on any UNIX system:

$ awk '{out=$1; for (i=2;i<=NF;i++) {out = out (length($(i-1)$i)==2 ? "" : OFS) $i} print out}' file
YMCA
cambridge ma
dmv office
t mobile
