
cut command in bash terminating on quotation marks

I am trying to read a file in which each line contains an email address followed by a nickname. I want to extract the nickname, which is surrounded by parentheses, as in the line below:

email@somewhere.com (Tom)

My thought was just to use cut to get at the word Tom, but this is foiled when I end up with something like the following:

email2@somewhereElse.com ("Bob")

Because Bob has quotes around it, the cut command fails as follows:

cut: <file>: Illegal byte sequence

Does anyone know of a better way of doing this, or a way to solve this problem?

Reset your locale to C (raw, uninterpreted byte sequences) to avoid Illegal byte sequence errors:

locale charmap
LC_ALL=C cut ... | LC_ALL=C sort ...
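As a rough sketch of how this might look for the nickname problem, the snippet below builds a hypothetical two-line sample file and runs cut twice under LC_ALL=C. It assumes the quotes in the real file are non-UTF-8 bytes (here Windows-1252 curly quotes, written as the octal escapes \223 and \224); the actual file may contain something else.

```shell
# Hypothetical sample data; \223 and \224 are Windows-1252 curly quotes,
# which are not valid UTF-8 (an assumption about the real file).
tmp=$(mktemp)
printf 'email@somewhere.com (Tom)\nemail2@somewhereElse.com (\223Bob\224)\n' > "$tmp"

# First cut keeps the text after "(", second cut keeps the text before ")".
# LC_ALL=C makes cut treat the input as plain bytes.
nicknames=$(LC_ALL=C cut -d'(' -f2 "$tmp" | LC_ALL=C cut -d')' -f1)
printf '%s\n' "$nicknames"
rm -f "$tmp"
```

The quote bytes are still present in the second nickname; this only gets cut past the error, it does not clean the data.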

I think that

grep -o '(.*)' emailFile 

should do it. It means: "Go through all lines in the file. Look for a sequence that starts with an open parenthesis, then any characters, up to a close parenthesis. Echo the part that matches to stdout."

This preserves the quotes around the nickname, as well as the parentheses. If you don't want those, you can strip them:

grep -o '(.*)' emailFile | sed 's/[(")]//g'

("replace any of the characters between the square brackets with nothing, everywhere")
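To see the two steps together, here is a minimal sketch that runs the pipeline on a hypothetical sample file built from the two lines in the question. It assumes plain ASCII double quotes; the temporary file merely stands in for emailFile.

```shell
# Hypothetical sample file standing in for emailFile.
emailFile=$(mktemp)
printf 'email@somewhere.com (Tom)\nemail2@somewhereElse.com ("Bob")\n' > "$emailFile"

# Step 1: grep -o prints only the parenthesized part of each line.
# Step 2: sed deletes every "(", double quote, and ")" character.
nicknames=$(grep -o '(.*)' "$emailFile" | sed 's/[(")]//g')
printf '%s\n' "$nicknames"
rm -f "$emailFile"
```

With this sample input the pipeline prints Tom and Bob, one per line. Note that the sed expression only removes ASCII double quotes, so other quote characters would pass through unchanged.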

perl -lne 'print $1 if /\(([^)]*)\)/'

