Matching patterns from a file returns duplicate outputs in bash
I'm trying to extract a list of files defined in my .gitattributes file in bash.
The .gitattributes file looks like this:
#
# Exclude these files from release archives.
# This will also make them unavailable when using Composer with `--prefer-dist`.
# https://blog.madewithlove.be/post/gitattributes/
#
/.git export-ignore
/.github export-ignore
/bin export-ignore
/wp-content/themes/**/.storybook export-ignore
/wp-content/themes/**/assets export-ignore
/wp-content/themes/**/storybook export-ignore
/wp-content/themes/**/tests export-ignore
/wp-content/themes/**/.editorconfig export-ignore
/wp-content/themes/**/.env.testing export-ignore
/wp-content/themes/**/.eslintignore export-ignore
/wp-content/themes/**/.eslintrc export-ignore
/wp-content/themes/**/.gitignore export-ignore
/wp-content/themes/**/.stylelintrc export-ignore
/wp-content/themes/**/babel.config.js export-ignore
/wp-content/themes/**/composer.json export-ignore
/wp-content/themes/**/composer.lock export-ignore
/wp-content/themes/**/package.json export-ignore
/wp-content/themes/**/package-lock.json export-ignore
/wp-content/themes/**/phpcs.xml.dist export-ignore
/wp-content/themes/**/phpstan.neon export-ignore
/wp-content/themes/**/phpstan.neon.dist export-ignore
/wp-content/themes/**/postcss.config.js export-ignore
/wp-content/themes/**/webpack.config.js export-ignore
/wp-content/themes/**/CODE_OF_CONDUCT.md export-ignore
composer.lock -diff
yarn.lock -diff
package.lock -diff
#
# Auto detect text files and perform LF normalization
# http://davidlaing.com/2012/09/19/customise-your-gitattributes-to-become-a-git-ninja/
#
* text=auto
#
# The above will handle all files NOT found below
#
*.md text
*.php text
*.inc text
My bash script is inside the bin/ folder, and my .gitattributes is at the root of the project. I run it like this:
sh bin/test.sh path
The script looks like this:
#!/bin/bash
#$1 - current_path variable (root)
file_list=()
while read -r line; do
if [[ "$line" =~ (\/wp-content\/themes\/\*\*/) ]]; then
newline=$(echo "$line" | sed 's/ export-ignore//p' | sed 's/\/wp-content\/themes\/\*\*\///p')
file_list+=("$newline")
fi
done <"$1"/.gitattributes
echo "${file_list[@]}"
But this returns every file multiple times (four times each). When I run it, I get:
.storybook
.storybook
.storybook
.storybook assets
assets
assets
assets storybook
storybook
storybook
storybook tests
tests
tests
tests .editorconfig
.editorconfig
.editorconfig
.editorconfig .env.testing
.env.testing
.env.testing
.env.testing .eslintignore
.eslintignore
.eslintignore
.eslintignore .eslintrc
.eslintrc
.eslintrc
.eslintrc .gitignore
.gitignore
.gitignore
.gitignore .stylelintrc
.stylelintrc
.stylelintrc
.stylelintrc babel.config.js
babel.config.js
babel.config.js
babel.config.js composer.json
composer.json
composer.json
composer.json composer.lock
composer.lock
composer.lock
composer.lock package.json
package.json
package.json
package.json package-lock.json
package-lock.json
package-lock.json
package-lock.json phpcs.xml.dist
phpcs.xml.dist
phpcs.xml.dist
phpcs.xml.dist phpstan.neon
phpstan.neon
phpstan.neon
phpstan.neon phpstan.neon.dist
phpstan.neon.dist
phpstan.neon.dist
phpstan.neon.dist postcss.config.js
postcss.config.js
postcss.config.js
postcss.config.js webpack.config.js
webpack.config.js
webpack.config.js
webpack.config.js CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md
Expected output:
.storybook
assets
storybook
tests
.editorconfig
.env.testing
.eslintignore
.eslintrc
.gitignore
.stylelintrc
babel.config.js
composer.json
composer.lock
package.json
package-lock.json
phpcs.xml.dist
phpstan.neon
phpstan.neon.dist
postcss.config.js
webpack.config.js
CODE_OF_CONDUCT.md
What am I doing wrong?
As others will likely point out, there are other (simpler, more efficient) ways to do what the OP is looking to do; the objective of this answer is to address the behavior of the OP's current sed code.
By default, sed prints each input line (after applying any edits) to stdout. Consider:
$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed 's/ export-ignore//'
/wp-content/themes/**/.storybook
By adding the p directive to the sed substitution, you are telling sed to print the result to stdout an extra time whenever the substitution succeeds. Consider:
$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed 's/ export-ignore//p'
/wp-content/themes/**/.storybook
/wp-content/themes/**/.storybook
As you can see, we get 2 sets of output: one due to the normal behavior of sed, and one due to the additional p directive.
If you want to use the p directive and eliminate the 'duplicate' output, you can add the -n (aka --quiet / --silent) flag, which disables sed's default behavior of passing input through to stdout. Consider:
$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed -n 's/ export-ignore//p'
/wp-content/themes/**/.storybook
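Because -n combined with p prints only the lines on which the substitution succeeded, a single sed -n command can do both the filtering (the job of the [[ =~ ]] test in the question) and the stripping. A minimal sketch, assuming every relevant line has the exact form /wp-content/themes/**/NAME export-ignore; the function name extract_theme_ignores is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print NAME for every line of the form
#   /wp-content/themes/**/NAME export-ignore
# -n turns off sed's automatic printing; the trailing p prints the pattern
# space only when the substitution succeeded, so all other lines are dropped.
extract_theme_ignores() {
  sed -n 's|^/wp-content/themes/\*\*/\(.*\) export-ignore$|\1|p' "$1"
}

# Usage example with a throwaway file.
demo_file=$(mktemp)
printf '%s\n' \
  '/.git export-ignore' \
  '/wp-content/themes/**/.storybook export-ignore' \
  '/wp-content/themes/**/assets export-ignore' \
  'composer.lock -diff' > "$demo_file"
extract_theme_ignores "$demo_file"   # prints .storybook and assets, one per line
rm -f "$demo_file"
```

Using | as the s delimiter avoids escaping every / in the path; only the literal ** needs escaping.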
Because you have 2 sed commands using the p directive, while not using the -n flag, you end up with a total of 4 copies of each matching input (the first sed generates 2 lines of output; the second sed then doubles the output again).
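To check the arithmetic, a small sketch can count the lines each pipeline produces for a single input line. buggy reproduces the question's two sed commands; no_p and with_n are the two ways of getting back to one copy (drop p from both commands, or add -n to both). The function names are hypothetical:

```shell
#!/usr/bin/env bash
line='/wp-content/themes/**/.storybook export-ignore'

# The question's pipeline: p on both commands, no -n.
buggy() {
  echo "$1" | sed 's/ export-ignore//p' | sed 's/\/wp-content\/themes\/\*\*\///p'
}

# Fix 1: drop the p directive from both commands.
no_p() {
  echo "$1" | sed 's/ export-ignore//' | sed 's/\/wp-content\/themes\/\*\*\///'
}

# Fix 2: keep p, but add -n to both commands.
with_n() {
  echo "$1" | sed -n 's/ export-ignore//p' | sed -n 's/\/wp-content\/themes\/\*\*\///p'
}

for f in buggy no_p with_n; do
  # wc -l counts output lines, i.e. copies; tr strips the padding some wc versions add
  printf '%s: %s line(s)\n' "$f" "$("$f" "$line" | wc -l | tr -d ' ')"
done
```

It should report 4 lines for buggy and 1 line for each of the other two.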
To remove the 'duplicates' there are a couple of options: remove the p directive from both sed commands, or add the -n flag to both sed commands.
This can be done using a simple awk:
awk -F/ 'index($0, "/wp-content/themes/") == 1 {sub(/ .*/, "", $NF); print $NF}' .gitattributes
.storybook
assets
storybook
tests
.editorconfig
.env.testing
.eslintignore
.eslintrc
.gitignore
.stylelintrc
babel.config.js
composer.json
composer.lock
package.json
package-lock.json
phpcs.xml.dist
phpstan.neon
phpstan.neon.dist
postcss.config.js
webpack.config.js
CODE_OF_CONDUCT.md
awk explanation:
-F/: Use / as the input field separator
index($0, "/wp-content/themes/") == 1: Only process lines that start with /wp-content/themes/
sub(/ .*/, "", $NF): Remove the first space and everything after it from the last field
print $NF: Print the last field
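For readers who want to keep the question's while-read loop, the same steps the awk explanation lists (keep lines starting with /wp-content/themes/, take the last /-separated field, cut it at the first space) can be sketched with bash parameter expansion. This swaps awk for pure bash rather than reproducing the answer's command, and the function name theme_ignores is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the awk logic; $1 is the .gitattributes path.
theme_ignores() {
  local line name
  while IFS= read -r line; do
    # like index($0, "/wp-content/themes/") == 1: skip other lines
    [[ $line == /wp-content/themes/* ]] || continue
    name=${line##*/}   # like $NF with -F/: drop everything up to the last "/"
    name=${name%% *}   # like sub(/ .*/, "", $NF): drop the first space and what follows
    printf '%s\n' "$name"
  done < "$1"
}

# Usage (from the project root):
#   theme_ignores .gitattributes
```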
The quick fix would be to just pipe the output through sort -u :-)
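sort -u removes duplicates line by line, so it only helps if each name is on its own line; in the question's script, echo "${file_list[@]}" joins the array elements with spaces, producing lines such as ".storybook assets" that sort -u treats as distinct entries. A sketch assuming one name per line; the helper names are hypothetical, and the awk '!seen[$0]++' variant is a swapped-in, order-preserving alternative rather than part of this answer:

```shell
#!/usr/bin/env bash
# sort -u keeps one copy of each distinct line, but also sorts the lines
# (the exact order depends on the locale).
dedupe_sorted() {
  sort -u
}

# Order-preserving alternative: awk prints a line only the first time it is
# seen (seen[$0]++ evaluates to 0, i.e. false, on the first visit).
dedupe_in_order() {
  awk '!seen[$0]++'
}

printf '%s\n' .storybook .storybook assets assets tests tests | dedupe_sorted
printf '%s\n' .storybook .storybook assets assets tests tests | dedupe_in_order
```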
The root cause is your use of the p flag in the sed substitution commands: it prints out the extra copies. You can just leave it out (gnu.org).
If you need the results one filename per line, I would write the script like this:
while read -r line; do
if [[ "$line" =~ (\/wp-content\/themes\/\*\*/) ]]; then
echo "$line" | sed 's/ export-ignore//' | sed 's/\/wp-content\/themes\/\*\*\///'
fi
done <"$1"/.gitattributes
or, even better, with awk:
< "$1/.gitattributes" awk '
/\/wp-content\/themes\/\*\*\// {
gsub(/\/wp-content\/themes\/\*\*\//,"");
gsub(/ export-ignore/,"");
print $0;
}'
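If you'd rather keep the file_list array from the question, the awk command above can fill it directly. A sketch assuming bash 4 or later (for mapfile) and that the script is run with bash rather than sh: sh bin/test.sh ignores the #!/bin/bash line, and where sh is dash (as on Debian/Ubuntu) arrays and process substitution are unavailable. The function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: fill the global file_list array from <root>/.gitattributes.
# $1 is the project root, as in the question's script.
load_theme_ignores() {
  mapfile -t file_list < <(
    awk '/\/wp-content\/themes\/\*\*\// {
      gsub(/\/wp-content\/themes\/\*\*\//, "")
      gsub(/ export-ignore/, "")
      print
    }' "$1/.gitattributes"
  )
}

# printf reuses its format for every argument, so this prints one name per
# line (echo "${file_list[@]}" would join them with spaces instead).
print_theme_ignores() {
  printf '%s\n' "${file_list[@]}"
}

# Usage:
#   load_theme_ignores "$1" && print_theme_ignores
```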
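Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.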