How can I assign the match of my regular expression to a variable?

I have a text file with various entries in it. Each entry ends with a line consisting entirely of asterisks.

I'd like to use shell commands to parse this file and assign each entry to a variable. How can I do this?

Here's an example input file:

***********
Field1
***********
Lorem ipsum
Data to match
***********
More data
Still more data
***********

Here is what my solution looks like so far:

#!/bin/bash
for error in `python example.py | sed -n '/.*/,/^\**$/p'`
do
    echo -e $error
    echo -e "\n"
done

However, this just assigns each word in the matched text to $error, rather than a whole block.

I'm surprised not to see a native bash solution here. Yes, bash has regular expressions. You can find plenty of documentation online, particularly if you include "BASH_REMATCH" in your query, or just look at the man page. Here's a silly example, adapted from an online example, which prints the whole match and each captured group for a regular expression.

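# $str holds the text to test and $regex the pattern; both must be set beforehand.
if [[ $str =~ $regex ]]; then
    echo "$str matches"
    echo "matching substring: ${BASH_REMATCH[0]}"
    # Element 0 is the whole match; elements 1..n-1 are the capture groups.
    n=${#BASH_REMATCH[@]}
    for (( i = 1; i < n; i++ )); do
        echo "  capture[$i]: ${BASH_REMATCH[$i]}"
    done
else
    echo "$str does not match"
fi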

The important bit is that the extended test [[ ... ]], with its regex comparison operator =~, stores the entire match in ${BASH_REMATCH[0]} and the captured groups in ${BASH_REMATCH[i]} for i >= 1.
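To tie this back to the question of assigning a match to a variable, here is a minimal sketch with concrete values. The sample string, the pattern, and the variable names `whole`, `code`, and `message` are made up for illustration and do not come from the original post.

```shell
#!/bin/bash
# Hypothetical sample data, chosen only to show the mechanics.
str='error 42: disk full'
regex='^error ([0-9]+): (.*)$'

if [[ $str =~ $regex ]]; then
    whole=${BASH_REMATCH[0]}    # the entire matched text
    code=${BASH_REMATCH[1]}     # first capture group
    message=${BASH_REMATCH[2]}  # second capture group
    echo "code=$code message=$message"
fi
```

Keeping the pattern in a variable and expanding it unquoted on the right of =~ avoids having to escape it inline, since a quoted right-hand side is matched as a literal string.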

If you want to do it in Bash, you could do something like the following. It uses globbing instead of regexps. (The extglob shell option enables extended pattern matching, so that we can match a line consisting only of asterisks.)

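#!/bin/bash
shopt -s extglob
entry=""
# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line
do
    case $line in
        +(\*))
            # a line of asterisks ends the entry: do something with $entry here
            entry=""
            ;;
        *)
            # append the line, followed by a newline, to the current entry
            entry+="$line"$'\n'
            ;;
    esac
done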
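One way to fill in the "do something with $entry" step is to collect each entry in a bash array, so that every block ends up in its own variable slot. The sketch below assumes the sample input from the question, fed through a here-document. The array name `entries` is hypothetical, and empty entries (such as the one before the first asterisk line) are skipped by choice.

```shell
#!/bin/bash
shopt -s extglob

entries=()   # hypothetical array: one element per entry
entry=""
while IFS= read -r line; do
    case $line in
        +(\*))
            # A line of asterisks closes the current entry; skip empty ones.
            if [[ -n $entry ]]; then
                entries+=("$entry")
            fi
            entry=""
            ;;
        *)
            entry+="$line"$'\n'
            ;;
    esac
done <<'EOF'
***********
Field1
***********
Lorem ipsum
Data to match
***********
More data
Still more data
***********
EOF

echo "found ${#entries[@]} entries"
printf 'entry: %s' "${entries[@]}"
```

Feeding the loop by redirection rather than through a pipe matters here: in a pipeline the loop would run in a subshell and the array would be lost when the loop ends.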

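Try putting double quotes around the command substitution, so that its output is not split into words. Note that the loop then runs only once, with the whole output in $error.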

#!/bin/bash
for error in "$(python example.py | sed -n '/.*/,/^\**$/p')"
do
    # quote the variable as well, or echo re-splits it and collapses the newlines
    echo -e "$error"
    echo -e "\n"
done
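The effect of the quotes can be seen by counting loop iterations. This is a small sketch in which a `printf` call stands in for the `python example.py | sed ...` pipeline; the sample text and the counter names `unquoted` and `quoted` are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for the output of the real pipeline.
output=$(printf 'Lorem ipsum\nData to match\n')

unquoted=0
for error in $output; do        # word splitting: one iteration per word
    unquoted=$((unquoted + 1))
done

quoted=0
for error in "$output"; do      # no splitting: a single iteration
    quoted=$((quoted + 1))
done

echo "unquoted: $unquoted iterations, quoted: $quoted iteration"
```

The quoted form therefore keeps a block intact, but it does not separate one block from the next; that still needs one of the record-splitting approaches.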

The right approach depends on what you want to do with the variables. For example, with awk:

awk '
f && /\*/{print "variable:"s;f=0}
/\*/{ f=1 ;s="";next}
f{
   s=s" "$0
}' file

Output:

# ./test.sh
variable: Field1
variable: Lorem ipsum Data to match
variable: More data Still more data

The above just prints them out. If you want, store them in an array for later use, e.g. array[++d]=s.
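As a sketch of the array idea, the awk program can save each block with array[++d]=s and print the saved blocks at the end, and bash can then read those lines into its own array. This assumes bash 4 or later for `mapfile`. The names `input` and `blocks` are hypothetical, and the sample data is the input file from the question.

```shell
#!/bin/bash
# Hypothetical variable holding the sample input from the question.
input='***********
Field1
***********
Lorem ipsum
Data to match
***********
More data
Still more data
***********'

# Same matching logic as the awk answer, but blocks are stored, then printed in END.
mapfile -t blocks < <(awk '
f && /\*/ { array[++d] = s; f = 0 }   # closing asterisk line: save the block
/\*/      { f = 1; s = ""; next }     # asterisk line: start a new block
f         { s = s " " $0 }            # inside a block: append the line
END       { for (i = 1; i <= d; i++) print array[i] }
' <<<"$input")

for b in "${blocks[@]}"; do
    echo "block:$b"
done
```

Because the awk program joins lines with spaces, each block is a single line and `mapfile -t` can map one output line to one array element. Note that the pattern /\*/ matches any line containing an asterisk, not only lines made up entirely of asterisks.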

Splitting records in (ba)sh is not so easy, but it can be done using IFS to split on single characters: simply set IFS='*' before your for loop. However, this generates multiple empty records and is problematic if any record contains a '*'. The obvious solution is to use perl or awk and set the record separator (RS in awk, $/ in perl) to split your records, since those tools provide better mechanisms for splitting records. A hybrid solution is to use perl to do the record splitting, and have perl call your bash function with the record you want. For example:

#!/bin/bash

foo() {
    echo record start:
    echo "$@"
    echo record end
}
export -f foo

perl -e "$/='********'; while(<>){chomp;system( \"foo '\$_'\" )}" << 'EOF'
this is a 2-line
record
********
the 2nd record
is 3 lines
long
********
a 3rd * record
EOF

This gives the following output:

record start:
this is a 2-line
record

record end
record start:

the 2nd record
is 3 lines
long

record end
record start:

a 3rd * record

record end
