
How can I extract the first two characters of a string in shell scripting?

For example, given:

USCAGoleta9311734.5021-120.1287855805

I want to extract just:

US

Probably the most efficient method, if you're using the bash shell (and you appear to be, based on your comments), is to use the sub-string variant of parameter expansion:

pax> long="USCAGol.blah.blah.blah"
pax> short="${long:0:2}" ; echo "${short}"
US

This will set short to be the first two characters of long. If long is shorter than two characters, short will be identical to it.

This in-shell method is usually better if you're going to be doing it a lot (like 50,000 times per report as you mention) since there's no process creation overhead. All solutions which use external programs will suffer from that overhead.
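As a rough sketch of that bulk case, the loop below pulls the two-character prefix from every input line using only bash built-ins, so no process is created per record. The function name prefixes and the second sample record are illustrative assumptions, not part of the question.

```shell
# Requires bash: ${line:0:2} is not POSIX sh.
# Hypothetical helper: print the first two characters of each input line.
prefixes() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the || clause also handles a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line:0:2}"
    done
}

printf '%s\n' 'USCAGoleta9311734.5021-120.1287855805' 'DEBEBerlin' | prefixes
```

Redirecting a file into the function (prefixes < report.txt, where report.txt is a placeholder name) would process a whole report in a single shell process.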

If you also wanted to ensure a minimum length, you could pad it out beforehand with something like:

pax> long="A"
pax> tmpstr="${long}.."
pax> short="${tmpstr:0:2}" ; echo "${short}"
A.

This would ensure that anything less than two characters in length was padded on the right with periods (or something else, just by changing the character used when creating tmpstr). It's not clear that you need this but I thought I'd put it in for completeness.


Having said that, there are any number of ways to do this with external programs (such as if you don't have bash available to you), some of which are:

short=$(echo "${long}" | cut -c1-2)
short=$(echo "${long}" | head -c2)
short=$(echo "${long}" | awk '{print substr($0, 1, 2)}')
short=$(echo "${long}" | sed 's/^\(..\).*/\1/')

The first two (cut and head) are identical for a single-line string - they basically both just give you back the first two characters. They differ in that cut will give you the first two characters of each line and head will give you the first two characters of the entire input.

The third one uses the awk sub-string function to extract the first two characters and the fourth uses sed capture groups (using () and \1) to capture the first two characters and replace the entire line with them. They're both similar to cut - they deliver the first two characters of each line in the input.

None of that matters if you are sure your input is a single line; in that case they all have an identical effect.
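The per-line versus whole-input difference can be seen with a two-line value. This is only a small sketch; the variable names input, per_line and whole and the sample data are made up for illustration, and head -c is assumed to be available (it is in GNU and BSD userlands).

```shell
# Two records, one per line (illustrative data).
input=$(printf 'USCAGoleta\nDEBEBerlin')

# cut works line by line, so each line contributes its first two characters.
per_line=$(printf '%s\n' "$input" | cut -c1-2)

# head -c counts from the start of the whole stream and then stops.
whole=$(printf '%s\n' "$input" | head -c2)

printf 'cut:\n%s\n' "$per_line"
printf 'head:\n%s\n' "$whole"
```

Here cut yields US and DE on separate lines, while head yields only US.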

The easiest way is:

${string:position:length}

This extracts a substring of $length characters from $string, starting at the zero-based offset $position.

This is a Bash builtin, so awk or sed is not required.

You've gotten several good answers and I'd go with the Bash builtin myself, but since you asked about sed and awk and (almost) no one else offered solutions based on them, I offer you these:

echo "USCAGoleta9311734.5021-120.1287855805" | awk '{print substr($0,1,2)}'

and

echo "USCAGoleta9311734.5021-120.1287855805" | sed 's/\(^..\).*/\1/'

The awk one ought to be fairly obvious, but here's an explanation of the sed one:

  • substitute "s/"
  • the group "()" of two of any characters ".." starting at the beginning of the line "^" and followed by any character "." repeated zero or more times "*" (the backslashes are needed to escape some of the special characters)
  • by "/" the contents of the first (and only, in this case) group (here the backslash is a special escape referring to a matching sub-expression)
  • done "/"

Just use grep:

echo 'abcdef' | grep -Po "^.."        # ab

If you're in bash, you can say:

bash-3.2$ var=abcd
bash-3.2$ echo ${var:0:2}
ab

This may be just what you need…

You can use printf:

$ original='USCAGoleta9311734.5021-120.1287855805'
$ printf '%-.2s' "$original"
US
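To keep the result in a variable rather than print it, the same format can be wrapped in a command substitution. This is a minimal sketch: the variable name short is borrowed from the earlier answers, the - flag is dropped because it only matters when a field width is given, and in some shells the precision counts bytes rather than characters.

```shell
original='USCAGoleta9311734.5021-120.1287855805'

# %.2s prints at most the first two characters of its argument.
short=$(printf '%.2s' "$original")

printf '%s\n' "$short"
```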

If you want to use shell scripting and not rely on non-POSIX extensions (such as so-called bashisms), you can use techniques that do not require forking external tools such as grep, sed, cut, awk, etc., which make your script less efficient. Maybe efficiency and POSIX portability are not important in your use case. But in case they are (or just as a good habit), you can use the following parameter expansion to extract the first two characters of a shell variable:

$ sh -c 'var=abcde; echo "${var%${var#??}}"'
ab

This uses "smallest prefix" parameter expansion to remove the first two characters (this is the ${var#??} part), then "smallest suffix" parameter expansion (the ${var% part) to remove that all-but-the-first-two-characters string from the original value.

This method was previously described in this answer to the "Shell = Check if variable begins with #" question. That answer also describes a couple of similar parameter expansion methods that can be used in a slightly different context than the one that applies to the original question here.
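A slightly more defensive variant of the same idea, wrapped in a function, might look like the sketch below. The function name first_two is made up for illustration. Two details differ from the one-liner: the inner expansion is quoted so that glob characters in the value are matched literally, and values shorter than two characters are returned unchanged (the one-liner yields an empty string for them, because ${var#??} then removes nothing).

```shell
# Hypothetical helper: print the first two characters of "$1" using only POSIX sh.
first_two() {
    case $1 in
        ??*)
            # Quoting the inner expansion makes the remainder match literally,
            # so characters such as * or ? in the value are not treated as patterns.
            printf '%s\n' "${1%"${1#??}"}"
            ;;
        *)
            # Shorter than two characters: return the value unchanged.
            printf '%s\n' "$1"
            ;;
    esac
}

first_two 'abcde'
first_two 'ab*cd'
first_two 'a'
```

The three calls print ab, ab and a respectively.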

colrm: remove columns from a file

To keep only the first two characters, just remove the columns starting from column 3:

cat file | colrm 3

Use:

sed 's/.//3g'

Or

awk NF=1 FPAT=..

Or

perl -pe '$_=unpack a2'

Just for the sake of fun, I'll add a few that, although overcomplicated and useless, were not mentioned:

head -c 2 <( echo 'USCAGoleta9311734.5021-120.1287855805')

echo 'USCAGoleta9311734.5021-120.1287855805' | dd bs=2 count=1 status=none

sed -e 's/^\(.\{2\}\).*/\1/;' <( echo 'USCAGoleta9311734.5021-120.1287855805')

cut -c 1-2 <( echo 'USCAGoleta9311734.5021-120.1287855805')

python -c "print(r'USCAGoleta9311734.5021-120.1287855805'[0:2])"

ruby -e 'puts "USCAGoleta9311734.5021-120.1287855805"[0..1]'

If your system is using a different shell (not bash), but your system has bash, then you can still use the inherent string manipulation of bash by invoking bash with a variable:

strEcho='echo ${str:0:2}' # '${str:2}' if you want to skip the first two characters and keep the rest
bash -c "str=\"$strFull\";$strEcho;"
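One caveat with building the command string this way is that the value is pasted into code, so quotes or $ signs inside it can break or alter the command. A possible alternative, sketched below under the assumption that bash is on the PATH, passes the string as a positional parameter instead. The function name first_two_via_bash is hypothetical.

```shell
# Hypothetical helper: callable from any POSIX shell, delegates the substring to bash.
first_two_via_bash() {
    # The single-quoted script is fixed; "$1" reaches bash as data, never as code.
    # The "_" fills $0 of the inner script, so the value becomes its $1.
    bash -c 'printf "%s\n" "${1:0:2}"' _ "$1"
}

first_two_via_bash 'USCAGoleta9311734.5021-120.1287855805'
```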

This may be what you're after (in Perl):

my $string = 'USCAGoleta9311734.5021-120.1287855805';

my $first_two_chars = substr $string, 0, 2;

Reference: substr

How to consider Unicode + UTF-8

Let's do a quick test for those interested in Unicode characters rather than just bytes. Each character of áéíóú (acute accented vowels) is made up of two bytes in UTF-8. With:

printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 awk '{print substr($0,1,3);exit}'
printf 'áéíóú' | LC_CTYPE=C awk '{print substr($0,1,3);exit}'
printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 head -c3
echo
printf 'áéíóú' | LC_CTYPE=C head -c3

we get:

áéí
á
á
á

so we see that only awk + LC_CTYPE=en_US.UTF-8 considered the UTF-8 characters. The other approaches took only three bytes. We can confirm that with:

printf 'áéíóú' | LC_CTYPE=C head -c3 | hd

which gives:

00000000  c3 a1 c3                                          |...|
00000003

and the trailing c3 by itself is an incomplete UTF-8 sequence that does not show up on the terminal, so we saw only á.

awk + LC_CTYPE=en_US.UTF-8, however, actually returns 6 bytes (three two-byte characters).

We could also have equivalently tested with:

printf '\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba' | LC_CTYPE=en_US.UTF-8 awk '{print substr($0,1,3);exit}'

and if you want to parameterize the number of characters:

n=3
printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 awk "{print substr(\$0,1,$n);exit}"
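An alternative to interpolating $n into the awk program is to hand it over with -v, which keeps the program single-quoted. This is only a sketch: the function name first_n_chars is made up, and whether characters or bytes are counted still depends on the awk implementation and on LC_CTYPE, as in the tests above.

```shell
# Hypothetical helper: print the first N characters of the first input line.
first_n_chars() {
    # -v passes the count to awk as a variable, so the program text needs no escaping.
    awk -v n="$1" '{ print substr($0, 1, n); exit }'
}

printf 'USCAGoleta9311734.5021-120.1287855805\n' | first_n_chars 2
```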

A more specific question about Unicode + UTF-8: https://superuser.com/questions/450303/unix-tool-to-output-first-n-characters-in-an-utf-8-encoded-file

Related: https://unix.stackexchange.com/questions/3454/grabbing-the-first-x-characters-for-a-string-from-a-pipe

Tested on Ubuntu 21.04.

The code

if mystring = USCAGoleta9311734.5021-120.1287855805

    print substr(mystring,0,2)

would print US.

Where 0 is the start position and 2 is how many characters to read.

perl -ple 's/^(..).*/$1/'


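Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.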
