Comparing strings for alphabetical order in Bash, test vs. double bracket syntax

I am working on a Bash scripting project in which I need to delete one of two files if they have identical content. I should delete the one that comes last in an alphabetical sort, and in the example output my professor has provided, apple.dat is deleted when the choices are apple.dat and Apple.dat.

if [[ "apple" > "Apple" ]]; then
    echo apple
else
    echo Apple
fi

prints Apple

echo $(echo -e "Apple\napple" | sort | tail -n1)

prints Apple

The ASCII value of a is 97 and A is 65, so why is the test saying A is greater?

The weird thing is that I get the opposite result with the older syntax:

if [ "apple" \> "Apple" ]; then
    echo apple
else
    echo Apple
fi

prints apple

and if we try to use \> in the [[ ]] syntax, it is a syntax error.

How can we correct this for the double bracket syntax? I have tested this on the school Debian server, my local machine, and my Digital Ocean droplet server. On my local Ubuntu 20.04 and on the school server I get the output described above. Interestingly, on my Digital Ocean droplet, which is an Ubuntu 20.04 server, I get "apple" with both the double and the single bracket syntax. We are allowed to use either syntax, the double bracket or the single bracket (the actual test call); however, I prefer the newer double bracket syntax and would rather learn how to make this work than convert my mostly finished script to the older, more POSIX-compliant syntax.

Hints:

$ (LC_COLLATE=C; if [ "apple" \> "Apple" ]; then echo apple; else echo Apple; fi)
apple
$ (LC_COLLATE=en_US; if [ "apple" \> "Apple" ]; then echo apple; else echo Apple; fi)
apple

but:

$ (LC_COLLATE=C; if [[ "apple" > "Apple" ]]; then echo apple; else echo Apple; fi)
apple
$ (LC_COLLATE=en_US; if [[ "apple" > "Apple" ]]; then echo apple; else echo Apple; fi)
Apple

The difference is that the Bash-specific test [[ ]] uses the locale's collation rules to compare strings, whereas the POSIX test [ ] uses the ASCII value.
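Because [[ ]] depends on the locale, differing locale settings are one plausible reason for two machines to disagree. The sketch below reports which setting decides the collation order, following the POSIX precedence of LC_ALL, then LC_COLLATE, then LANG. The function name effective_collate is hypothetical, and the fallback to "C" when nothing is set is an assumption about the platform default.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not from the original post): print the locale name
# that governs string collation in the current environment.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"          # LC_ALL overrides everything else
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"      # category-specific setting
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"            # general default
    else
        printf '%s\n' "C"                # assumption: platform falls back to C
    fi
}

effective_collate
```

Running this on each machine (or running the locale utility) shows whether they are actually comparing under the same rules.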

From the bash man page:

When used with [[, the < and > operators sort lexicographically using the current locale.

When used with test or [, the < and > operators sort lexicographically using ASCII ordering.
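One way to keep the double bracket syntax and still get ASCII ordering is to force the C locale around the comparison. This is a minimal sketch, not the original poster's method: last_bytewise is a hypothetical name, and the function body is a subshell so the locale change stays contained.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print whichever argument sorts last in byte order,
# using [[ ]] with the locale forced to C.
# The parentheses make the body a subshell, so LC_ALL is not changed
# for the rest of the script.
last_bytewise() (
    LC_ALL=C    # LC_ALL takes precedence over LC_COLLATE and LANG
    if [[ "$1" > "$2" ]]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
)

last_bytewise "apple" "Apple"     # prints: apple
last_bytewise "apple" "Banana"    # prints: apple
```

Note the second call: in byte order every capital letter precedes every lower-case letter, so apple also sorts after Banana.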

I have come up with my own solution to the problem; however, I must first thank @GordonDavisson and @LéaGris for their help and for what I have learned from them, as that is invaluable to me.

Whether a computer or a human ordering is used, if apple comes after Apple in an alphabetical sort, then it also comes after Banana, and if Banana comes after apple, then Apple comes after apple. So I have come up with the following:

# A function which sorts two words alphabetically with lower case coming after upper case.
# The last word in the sort will be printed twice to demonstrate that this works for both
# the POSIX compliant single bracket test call and the newer double bracket condition
# syntax.
# arg 1: One of two words to sort
# arg 2: One of two words to sort
# Return: 0 upon completion, 1 if incorrect number of args is given
sort_alphabetically() {
    [ $# -ne 2 ] && return 1

    word_1_val=0
    word_2_val=0

    while read -n1 letter; do
        (( word_1_val += $(printf '%d' "'$letter") ))
    done < <(echo -n "$1")

    while read -n1 letter; do
        (( word_2_val += $(printf '%d' "'$letter") ))
    done < <(echo -n "$2")

    if [ $word_1_val -gt $word_2_val ]; then
        echo $1
    else
        echo $2
    fi

    if [[ $word_1_val -gt $word_2_val ]]; then
        echo $1
    else
        echo $2
    fi

    return 0
}

sort_alphabetically "apple" "Apple"
sort_alphabetically "Banana" "apple"
sort_alphabetically "aPPle" "applE"

prints:

apple
apple
Banana
Banana
applE
applE

This works by using process substitution to redirect the output of echo into the while loop, which reads one character at a time, and then using printf to get the decimal ASCII value of each character. It is like creating a temporary file from the string, which is automatically destroyed, and then reading it one character at a time. The -n option stops echo from appending a trailing newline, so only the characters of the word itself are read.

From the bash man page:

Process Substitution

Process substitution allows a process's input or output to be referred to using a filename. It takes the form of <(list) or >(list). The process list is run asynchronously, and its input or output appears as a filename. This filename is passed as an argument to the current command as the result of the expansion. If the >(list) form is used, writing to the file will provide input for list. If the <(list) form is used, the file passed as an argument should be read to obtain the output of list. Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming open files.

When available, process substitution is performed simultaneously with parameter and variable expansion, command substitution, and arithmetic expansion.
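A practical reason to feed a while loop through process substitution rather than a pipe is that, with default shell options, each part of a pipeline runs in a subshell, so variables updated in the loop are lost. The sketch below contrasts the two; both function names are hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: count characters, feeding the loop with < <(...).
# The loop runs in the current shell, so n keeps its value afterwards.
count_with_procsub() {
    local n=0 ch
    while IFS= read -r -n1 ch; do
        (( n += 1 ))
    done < <(printf '%s' "$1")
    printf '%d\n' "$n"
}

# Hypothetical helper: same loop, fed by a pipe instead.
# The loop runs in a subshell, so the outer n is never updated.
count_with_pipe() {
    local n=0 ch
    printf '%s' "$1" | while IFS= read -r -n1 ch; do
        (( n += 1 ))
    done
    printf '%d\n' "$n"
}

count_with_procsub "apple"    # prints: 5
count_with_pipe "apple"       # prints: 0
```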

From a Stack Overflow post about printf:

If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote.
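The rule quoted above can be tried directly. In this small sketch, char_code is a hypothetical wrapper name; the leading single quote inside the argument is what makes printf return the character's numeric value.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the numeric code of the first character of $1.
char_code() {
    printf '%d\n' "'$1"
}

char_code "a"    # prints: 97
char_code "A"    # prints: 65
```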

Note: process substitution is not POSIX compliant, but it is supported by Bash in the way stated in the bash man page.


UPDATE: The above does not work in all cases!


The above solution works in many cases; however, we get some anomalies.

first word    second word    last alphabetically (function output)
apple         Apple          apple     (correct)
Apple         apple          apple     (correct)
apPLE         Apple          Apple     (incorrect)
apple         Banana         Banana    (correct)
apple         BANANA         apple     (incorrect)
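The anomalies follow from adding the character codes together: a sum ignores where each character sits in the word, and it grows with the number of lower-case letters. The sketch below reproduces the totals; word_sum is a hypothetical name, and it indexes the string directly instead of using the read loop from the function above.

```bash
#!/usr/bin/env bash

# Hypothetical helper: add up the character codes of a word, which is the
# quantity the first sort_alphabetically compares.
word_sum() {
    local word=$1 sum=0 i code
    for (( i = 0; i < ${#word}; i++ )); do
        printf -v code '%d' "'${word:i:1}"   # numeric code of one character
        (( sum += code ))
    done
    printf '%d\n' "$sum"
}

word_sum "apple"     # prints: 530
word_sum "BANANA"    # prints: 417
word_sum "apPLE"     # prints: 434
word_sum "Apple"     # prints: 498
word_sum "ab"        # prints: 195
word_sum "ba"        # prints: 195
```

apple outscores BANANA and Apple outscores apPLE, which matches the two incorrect rows, and ab and ba tie even though they are different words.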

The following solution gets the results that are needed:

#!/bin/bash

sort_alphabetically() {
    [ $# -ne 2 ] && return 1

    local WORD_1="$1"
    local WORD_2="$2"
    local WORD_1_LOWERED="$(echo -n "$1" | tr '[:upper:]' '[:lower:]')"
    local WORD_2_LOWERED="$(echo -n "$2" | tr '[:upper:]' '[:lower:]')"

    # Quote the command substitutions so that [ ] always receives exactly one word.
    if [ "$(echo -e "$WORD_1\n$WORD_2" | sort | tail -n1)" = "$WORD_1" ] ||\
       [ "$(echo -e "$WORD_1_LOWERED\n$WORD_2_LOWERED" | sort | tail -n1)" =\
         "$WORD_1_LOWERED" ]; then

        if [ "$WORD_1_LOWERED" = "$WORD_2_LOWERED" ]; then

            ASCII_VAL_WORD_1=0
            ASCII_VAL_WORD_2=0
            read -n1 FIRST_CHAR_1 < <(echo -n "$WORD_1")
            read -n1 FIRST_CHAR_2 < <(echo -n "$WORD_2")

            while read -n1 character; do
                (( ASCII_VAL_WORD_1 += $(printf '%d' "'$character") ))
            done < <(echo -n "$WORD_1")
            
            while read -n1 character; do
                (( ASCII_VAL_WORD_2 += $(printf '%d' "'$character") ))
            done < <(echo -n "$WORD_2")
            
            if [ $ASCII_VAL_WORD_1 -gt $ASCII_VAL_WORD_2 ] &&\
               [ "$FIRST_CHAR_1" \> "$FIRST_CHAR_2" ]; then

                echo "$WORD_1"
            elif [ $ASCII_VAL_WORD_2 -gt $ASCII_VAL_WORD_1 ] &&\
                 [ "$FIRST_CHAR_2" \> "$FIRST_CHAR_1" ]; then

                echo "$WORD_2"
            elif [ "$FIRST_CHAR_1" \> "$FIRST_CHAR_2" ]; then
                echo "$WORD_1"
            else
                echo "$WORD_2"
            fi
        else
            echo "$WORD_1"
        fi
    else
        echo "$WORD_2"
    fi

    return 0
}

sort_alphabetically "apple" "Apple"
sort_alphabetically "Apple" "apple"
sort_alphabetically "apPLE" "Apple"
sort_alphabetically "Apple" "apPLE"
sort_alphabetically "apple" "Banana"
sort_alphabetically "apple" "BANANA"

exit 0

prints:

apple
apple
apPLE
apPLE
Banana
BANANA
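The same results can be obtained with less machinery by comparing the words case-insensitively first and falling back to byte order only when they differ in case alone. This is an alternative sketch, not the function above: last_alphabetically is a hypothetical name, the ${var,,} expansion needs Bash 4 or later, and the C locale is forced in a subshell so that the result does not depend on the machine's settings.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the word that sorts last, ignoring case first
# and letting lower case come after upper case on a tie.
last_alphabetically() (
    [ $# -ne 2 ] && return 1
    LC_ALL=C                  # byte-order comparisons, same on every machine
    a=${1,,}                  # lower-cased copies (Bash 4+)
    b=${2,,}
    if [[ "$a" > "$b" ]]; then
        printf '%s\n' "$1"
    elif [[ "$a" < "$b" ]]; then
        printf '%s\n' "$2"
    elif [[ "$1" > "$2" ]]; then
        printf '%s\n' "$1"    # same letters: lower case sorts last
    else
        printf '%s\n' "$2"
    fi
)

last_alphabetically "apple" "Apple"     # prints: apple
last_alphabetically "apPLE" "Apple"     # prints: apPLE
last_alphabetically "apple" "BANANA"    # prints: BANANA
```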

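Another suggestion was to change the syntax to if [[ "Apple" -gt "apple" ]]. Be careful with this: -gt is an integer comparison, so inside [[ ]] both operands are evaluated as arithmetic expressions. Apple and apple are read as names of unset variables, each evaluates to 0, and the test is false whichever way round the words are given. It does not compare the strings.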
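The -gt suggestion is easy to check by swapping the operands. In this sketch, which assumes no variables named Apple or apple are set, the variable names first and second are only for illustration.

```bash
#!/usr/bin/env bash

# -gt compares integers. Inside [[ ]] each operand is evaluated as an
# arithmetic expression, so an unset name such as Apple evaluates to 0.
if [[ "Apple" -gt "apple" ]]; then first=true; else first=false; fi
if [[ "apple" -gt "Apple" ]]; then second=true; else second=false; fi

printf '%s %s\n' "$first" "$second"    # prints: false false
```

Since neither ordering is reported as greater, -gt cannot be used to decide which string sorts last.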
