Matching a pattern in a file returns multiple identical outputs in bash

Question

I'm trying to extract a list of files defined in my .gitattributes file in bash.

The .gitattributes file looks like this:

#
# Exclude these files from release archives.
# This will also make them unavailable when using Composer with `--prefer-dist`.
# https://blog.madewithlove.be/post/gitattributes/
#
/.git export-ignore
/.github export-ignore
/bin export-ignore
/wp-content/themes/**/.storybook export-ignore
/wp-content/themes/**/assets export-ignore
/wp-content/themes/**/storybook export-ignore
/wp-content/themes/**/tests export-ignore
/wp-content/themes/**/.editorconfig export-ignore
/wp-content/themes/**/.env.testing export-ignore
/wp-content/themes/**/.eslintignore export-ignore
/wp-content/themes/**/.eslintrc export-ignore
/wp-content/themes/**/.gitignore export-ignore
/wp-content/themes/**/.stylelintrc export-ignore
/wp-content/themes/**/babel.config.js export-ignore
/wp-content/themes/**/composer.json export-ignore
/wp-content/themes/**/composer.lock export-ignore
/wp-content/themes/**/package.json export-ignore
/wp-content/themes/**/package-lock.json export-ignore
/wp-content/themes/**/phpcs.xml.dist export-ignore
/wp-content/themes/**/phpstan.neon export-ignore
/wp-content/themes/**/phpstan.neon.dist export-ignore
/wp-content/themes/**/postcss.config.js export-ignore
/wp-content/themes/**/webpack.config.js export-ignore
/wp-content/themes/**/CODE_OF_CONDUCT.md export-ignore

composer.lock -diff
yarn.lock -diff
package.lock -diff

#
# Auto detect text files and perform LF normalization
# http://davidlaing.com/2012/09/19/customise-your-gitattributes-to-become-a-git-ninja/
#
* text=auto

#
# The above will handle all files NOT found below
#
*.md text
*.php text
*.inc text

My bash script is inside the bin/ folder, and my .gitattributes is at the root of the project.

sh bin/test.sh path

The script looks like this:

#!/bin/bash

#$1 - current_path variable (root)
file_list=()

while read -r line; do
  if [[ "$line" =~ (\/wp-content\/themes\/\*\*/) ]]; then
    newline=$(echo "$line" | sed 's/ export-ignore//p' | sed 's/\/wp-content\/themes\/\*\*\///p')
    file_list+=("$newline")
  fi
done <"$1"/.gitattributes

echo "${file_list[@]}"

But this returns every file name multiple times (four times each). When I run it I get:

.storybook
.storybook
.storybook
.storybook assets
assets
assets
assets storybook
storybook
storybook
storybook tests
tests
tests
tests .editorconfig
.editorconfig
.editorconfig
.editorconfig .env.testing
.env.testing
.env.testing
.env.testing .eslintignore
.eslintignore
.eslintignore
.eslintignore .eslintrc
.eslintrc
.eslintrc
.eslintrc .gitignore
.gitignore
.gitignore
.gitignore .stylelintrc
.stylelintrc
.stylelintrc
.stylelintrc babel.config.js
babel.config.js
babel.config.js
babel.config.js composer.json
composer.json
composer.json
composer.json composer.lock
composer.lock
composer.lock
composer.lock package.json
package.json
package.json
package.json package-lock.json
package-lock.json
package-lock.json
package-lock.json phpcs.xml.dist
phpcs.xml.dist
phpcs.xml.dist
phpcs.xml.dist phpstan.neon
phpstan.neon
phpstan.neon
phpstan.neon phpstan.neon.dist
phpstan.neon.dist
phpstan.neon.dist
phpstan.neon.dist postcss.config.js
postcss.config.js
postcss.config.js
postcss.config.js webpack.config.js
webpack.config.js
webpack.config.js
webpack.config.js CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md

Expected output:

.storybook
assets
storybook
tests
.editorconfig
.env.testing
.eslintignore
.eslintrc
.gitignore
.stylelintrc
babel.config.js
composer.json
composer.lock
package.json
package-lock.json
phpcs.xml.dist
phpstan.neon
phpstan.neon.dist
postcss.config.js
webpack.config.js
CODE_OF_CONDUCT.md

What am I doing wrong?

Answer 1

As others will likely point out, there are other (simpler, more efficient) ways to do what the OP is looking to do; the objective of this answer is to address the behavior of the OP's current sed code.

By default sed prints every input line to stdout after applying its commands. Consider:

$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed 's/ export-ignore//'
/wp-content/themes/**/.storybook

By adding the p flag to the s command you are telling sed to print the line an additional time whenever the substitution succeeds. Consider:

$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed 's/ export-ignore//p'
/wp-content/themes/**/.storybook
/wp-content/themes/**/.storybook

As you can see, we get 2 copies of the output: one from the normal behavior of sed and one from the additional p flag.

If you want to keep the p flag and eliminate the 'duplicate' output, you can add the -n (aka --quiet / --silent) option, which disables sed's default behavior of passing input through to stdout. Consider:

$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed -n 's/ export-ignore//p'
/wp-content/themes/**/.storybook

Because you have 2 sed commands using the p flag, and neither uses the -n option, you end up with a total of 4 copies of each matching input line (the first sed generates 2 lines of output; the second sed then doubles the output again).
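The doubling can be checked by counting the lines each stage produces. This is only a minimal sketch using the same sample line as above; the variable names after_first and after_second are illustrative and not part of the original script.

```shell
#!/bin/bash
line='/wp-content/themes/**/.storybook export-ignore'

# One sed with the p flag and no -n: default print plus the explicit print.
after_first=$(printf '%s\n' "$line" | sed 's/ export-ignore//p' | wc -l)

# Two such sed commands chained: each of the 2 lines is printed twice again.
after_second=$(printf '%s\n' "$line" \
  | sed 's/ export-ignore//p' \
  | sed 's/\/wp-content\/themes\/\*\*\///p' \
  | wc -l)

# $(( )) strips the padding some wc implementations add around the count.
echo "after first sed:  $((after_first))"
echo "after second sed: $((after_second))"
```

The first count should be 2 and the second 4, matching the four copies seen in the question.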

To remove the 'duplicates' there are a couple of options:

remove the p flag from both sed commands, or
add the -n option to both sed commands
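Applied to the loop from the question, the first option might look like the sketch below. The function name extract_theme_entries and the pattern variable are assumptions introduced here for illustration; the two sed substitutions are the OP's, minus the p flag.

```shell
#!/bin/bash
# Hypothetical helper: fills the global array file_list from a
# .gitattributes-style file, using sed without the p flag.
extract_theme_entries() {
  local attributes_file=$1 line entry
  # Keeping the regex in a variable avoids quoting surprises inside [[ =~ ]].
  local pattern='^/wp-content/themes/\*\*/'
  file_list=()
  while read -r line; do
    if [[ $line =~ $pattern ]]; then
      entry=$(printf '%s\n' "$line" \
        | sed 's/ export-ignore//' \
        | sed 's/\/wp-content\/themes\/\*\*\///')
      file_list+=("$entry")
    fi
  done < "$attributes_file"
}

# Usage, assuming the project root is passed as $1 as in the question:
#   extract_theme_entries "$1/.gitattributes"
#   printf '%s\n' "${file_list[@]}"
```

Printing with printf '%s\n' puts each array element on its own line, whereas echo "${file_list[@]}" joins the elements with spaces.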

Answer 2

This can be done using a simple awk:

awk -F/ 'index($0, "/wp-content/themes/") == 1 {sub(/ .*/, "", $NF); print $NF}' .gitattributes

.storybook
assets
storybook
tests
.editorconfig
.env.testing
.eslintignore
.eslintrc
.gitignore
.stylelintrc
babel.config.js
composer.json
composer.lock
package.json
package-lock.json
phpcs.xml.dist
phpstan.neon
phpstan.neon.dist
postcss.config.js
webpack.config.js
CODE_OF_CONDUCT.md

awk explanation:

-F/ : Use / as the input field separator
index($0, "/wp-content/themes/") == 1 : Match only lines that start with /wp-content/themes/
sub(/ .*/, "", $NF) : Remove the first space and everything after it from the last field
print $NF : Print the last field
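To see the field splitting at work on a small input, the same awk program can be run against a throwaway file. This is a sketch only: the temporary directory and the four sample lines (a subset of the question's .gitattributes) are assumptions made for the demonstration.

```shell
#!/bin/bash
# Build a small sample file in a temporary directory.
sample_dir=$(mktemp -d)
cat > "$sample_dir/.gitattributes" <<'EOF'
/bin export-ignore
/wp-content/themes/**/.storybook export-ignore
/wp-content/themes/**/composer.json export-ignore
composer.lock -diff
EOF

# Same awk program as above, pointed at the sample file.
awk_result=$(awk -F/ 'index($0, "/wp-content/themes/") == 1 {sub(/ .*/, "", $NF); print $NF}' \
  "$sample_dir/.gitattributes")

printf '%s\n' "$awk_result"
rm -rf "$sample_dir"
```

For the second line, the last /-separated field is ".storybook export-ignore", which sub() trims to ".storybook"; the /bin and composer.lock lines do not start with the prefix and are skipped.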

Answer 3

The quick fix would be: just pipe the output through sort -u :-)

The root cause is your use of the 'p' flag in the sed substitute command. It prints the extra copies. You can just leave it out (see the sed manual at gnu.org).

If you need the results as one filename per line, I would write the script as

while read -r line; do
  if [[ "$line" =~ (\/wp-content\/themes\/\*\*/) ]]; then
    echo "$line" | sed 's/ export-ignore//' | sed 's/\/wp-content\/themes\/\*\*\///'
  fi
done <"$1"/.gitattributes

or, even better, with awk

< "$1/.gitattributes" awk '
/\/wp-content\/themes\/\*\*\// {
    gsub(/\/wp-content\/themes\/\*\*\//,"");
    gsub(/ export-ignore/,"");
    print $0;
}'
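For comparison, the same filtering can be done without sed or awk by using bash parameter expansion. This is a different technique from the ones in the answers above, shown only as a hedged sketch; the names prefix, suffix and list_theme_entries are hypothetical.

```shell
#!/bin/bash
prefix='/wp-content/themes/**/'
suffix=' export-ignore'

# Hypothetical helper: reads lines on stdin, prints the trimmed entries.
list_theme_entries() {
  local line
  while IFS= read -r line; do
    # Quoting "$prefix" makes the asterisks literal instead of glob characters.
    if [[ $line == "$prefix"* ]]; then
      line=${line#"$prefix"}   # strip the leading path
      line=${line%"$suffix"}   # strip the trailing attribute
      printf '%s\n' "$line"
    fi
  done
}

demo_output=$(printf '%s\n' \
  '/bin export-ignore' \
  '/wp-content/themes/**/assets export-ignore' \
  '/wp-content/themes/**/phpstan.neon.dist export-ignore' | list_theme_entries)
printf '%s\n' "$demo_output"
```

With the question's layout, it could be fed as list_theme_entries < "$1/.gitattributes". This avoids starting two sed processes per line, at the cost of hard-coding the exact prefix and suffix strings.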


Answer 1: score 4, accepted, 2021-03-27 14:43:57

Answer 2: score 3, 2021-03-27 14:46:50

Answer 3: score 2, 2021-03-27 14:49:30
