Bash: Find and replace lines in a file using the lines of another file
I have two files: masterlist.txt, which has hundreds of lines of URLs, and toupdate.txt, which has a smaller number of updated versions of lines from masterlist.txt that need to be replaced.
I'd like to automate this process using Bash, since the creation and utilisation of these lists already happens in a bash script.
The server part of the URL is the part that changes, so we could match using the unique part: /whatever/whatever_user.xml, but how do I find and replace those lines in masterlist.txt? That is, how do I go through each line of toupdate.txt and, since it ends in /f_SomeName/f_SomeName_user.xml, find that ending in masterlist.txt and replace that whole line with the new one?
So https://123456url.domain.com/26/path/f_SomeName/f_SomeName_user.xml becomes https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml for example.
The rest of masterlist.txt needs to stay intact, so we must only find and replace lines that have different servers for the same line endings (IDs).
masterlist.txt looks like this:
https://123456url.domain.com/26/path/f_SomeName/f_SomeName_user.xml
https://456789url.domain.com/32/path/f_AnotherName/f_AnotherName_user.xml
https://101112url.domain.com/1/path/g_SomethingElse/g_SomethingElse_user.xml
https://222blah11.domain.com/19/path/e_BlahBlah/e_BlahBlah_user.xml
[...]
toupdate.txt looks like this:
https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml
https://foo-254.domain.com/8/path/g_SomethingElse/g_SomethingElse_user.xml
Make masterlist.txt look like:
https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml
https://456789url.domain.com/32/path/f_AnotherName/f_AnotherName_user.xml
https://foo-254.domain.com/8/path/g_SomethingElse/g_SomethingElse_user.xml
https://222blah11.domain.com/19/path/e_BlahBlah/e_BlahBlah_user.xml
[...]
I've looked at sed, but I don't know how to do the find and replace using lines from the two files.
Here's what I have so far, doing the file handling at least:
#!/bin/bash
#...
while read -r line; do
    # there's a new link on each line
    link="${line}"
    # extract the unique part from the end of each line
    grabXML="${link##*/}"
    grabID="${grabXML%_user.xml}"
    # if we cannot grab the ID, then just set it to use the full link so we don't have an empty string
    if [ -n "${grabID}" ]; then
        identifier=${grabID}
    else
        identifier="${line}"
    fi
    ## the find and replace here? ##
    # we're done when we've reached the end of the file
done < "masterlist.txt"
Would you please try the following:
#!/bin/bash
declare -A map
while IFS= read -r line; do
    if [[ $line =~ (/[^/]+/[^/]*\.xml)$ ]]; then
        uniq_part="${BASH_REMATCH[1]}"
        map[$uniq_part]=$line
    fi
done < "toupdate.txt"
while IFS= read -r line; do
    if [[ $line =~ (/[^/]+/[^/]*\.xml)$ ]]; then
        uniq_part="${BASH_REMATCH[1]}"
        if [[ -n ${map[$uniq_part]} ]]; then
            line=${map[$uniq_part]}
        fi
    fi
    echo "$line"
done < "masterlist.txt" > "masterlist_tmp.txt"
# if the result of "masterlist_tmp.txt" is good enough, uncomment the line below
# mv -f -- "masterlist_tmp.txt" "masterlist.txt"
Result:
https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml
https://456789url.domain.com/32/path/f_AnotherName/f_AnotherName_user.xml
https://foo-254.domain.com/8/path/g_SomethingElse/g_SomethingElse_user.xml
https://222blah11.domain.com/19/path/e_BlahBlah/e_BlahBlah_user.xml
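The regex capture that the script above relies on can be checked in isolation. The following is only a minimal sketch: the function name unique_part is a hypothetical helper introduced here for illustration and is not part of the answer's script.

```bash
#!/bin/bash
# Hypothetical helper (not in the original answer): print the "unique part"
# of a URL, i.e. everything from the second-rightmost slash to ".xml".
unique_part() {
    local url=$1
    # Same pattern as in the script above, kept in a variable for readability.
    local re='(/[^/]+/[^/]*\.xml)$'
    if [[ $url =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1    # no match: leave the decision to the caller
    fi
}

unique_part "https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml"
# prints: /f_SomeName/f_SomeName_user.xml
```

Because both the old and the new URL yield the same string here, that string can serve as the key of the associative array.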
[Explanations]
The associative array map maps the "unique part" such as /f_SomeName/f_SomeName_user.xml to the "full path" such as https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml.
The regex (/[^/]+/[^/]*\.xml)$, if matched, stores in the shell variable BASH_REMATCH[1] the substring from the second rightmost slash to the extension ".xml" at the end of the string.
[Alternative]
If the text files are large, bash may not be fast enough. In such a case, an awk script will work more efficiently:
awk 'NR==FNR {
    if (match($0, "/[^/]+/[^/]*\\.xml$")) {
        map[substr($0, RSTART, RLENGTH)] = $0
    }
    next
}
{
    if (match($0, "/[^/]+/[^/]*\\.xml$")) {
        full_path = map[substr($0, RSTART, RLENGTH)]
        if (full_path != "") {
            $0 = full_path
        }
    }
    print
}' "toupdate.txt" "masterlist.txt" > "masterlist_tmp.txt"
[Explanations]
The NR==FNR { BLOCK1; next } { BLOCK2 } syntax is a common idiom to switch the processing individually for each file. As the NR==FNR condition is met only for the 1st file in the argument list and the next statement skips the following block, BLOCK1 processes the file "toupdate.txt" only. Similarly, BLOCK2 processes the file "masterlist.txt" only.
If match($0, pattern) succeeds, it sets the awk variable RSTART to the start position of the matched substring within $0, the current record read from the file, then sets the variable RLENGTH to the length of the matched substring. Now we can extract the matched substring such as /f_SomeName/f_SomeName_user.xml by using the substr() function.
In BLOCK1, the substring is used as an index into the array map so that the substring (the unique part) is mapped to the whole url in "toupdate.txt".
In BLOCK2, if a value corresponding to the key is found in map, then the record ($0) is replaced with the value of the array indexed by the key.
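The NR==FNR idiom described above can be tried on its own. This is only an illustrative sketch: the file names first.txt and second.txt and the variable idiom_demo are placeholders invented for the demo, not part of the answer.

```bash
#!/bin/bash
# Throwaway demo files; the names are placeholders, not from the answer.
tmpdir=$(mktemp -d)
printf '%s\n' 'a' 'b' > "$tmpdir/first.txt"
printf '%s\n' 'c' 'd' 'e' > "$tmpdir/second.txt"

# NR counts records across all input files; FNR restarts at 1 for each file,
# so NR==FNR holds only while the first file is being read.
idiom_demo=$(awk 'NR==FNR { print "BLOCK1:", $0; next }
                          { print "BLOCK2:", $0 }' \
    "$tmpdir/first.txt" "$tmpdir/second.txt")
printf '%s\n' "$idiom_demo"

rm -rf -- "$tmpdir"
```

This should print the two lines of the first file prefixed with BLOCK1: and the three lines of the second file prefixed with BLOCK2:. One caveat worth keeping in mind: if the first file is empty, NR==FNR also holds while the second file is read, so the idiom assumes a non-empty first file.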
Why not have sed write its own script, producing the desired output:
sed -e "$(sed -e 's<^\(http[s]*://[^/]*/[^/]*/\)\(.*\)<\\|\2\$| s|.*|\1\2|<' toupdate.txt)" masterlist.txt
where
the inner sed command (the one inside $(...)) has an outer and an inner substitution command
the outer s ( s<...<...< ) captures scheme://domain/N/ as \1 and rest-of-path \(.*\) as \2 and inserts them into a script for the outer sed command
the generated sed script ( \|\2$| s|.*|\1\2| ) finds URLs in masterlist.txt ending in rest-of-path, substituting (inner s) the new URL from toupdate.txt
< and | are used as delimiters for the two s commands, and \|...| is used for /.../
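To see what the outer sed actually receives, the inner sed can be run by itself. The sketch below assumes the two sample lines of toupdate.txt from the question, written to a scratch directory; the variable name generated is made up for this demo.

```bash
#!/bin/bash
# Scratch copy of the sample update list from the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/toupdate.txt" <<'EOF'
https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml
https://foo-254.domain.com/8/path/g_SomethingElse/g_SomethingElse_user.xml
EOF

# Run only the inner sed to print the script it generates for the outer sed.
generated=$(sed -e 's<^\(http[s]*://[^/]*/[^/]*/\)\(.*\)<\\|\2\$| s|.*|\1\2|<' \
    "$tmpdir/toupdate.txt")
printf '%s\n' "$generated"

rm -rf -- "$tmpdir"
```

With that input, the first generated line should be \|path/f_SomeName/f_SomeName_user.xml$| s|.*|https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml|, that is, one address-plus-substitution command per line of toupdate.txt. Note that the rest-of-path is used as a regular expression, so a dot in it matches any character, and that the full command writes to standard output rather than modifying masterlist.txt.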