Bash: Find and replace lines in a file using lines from another file

Question

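I have two files: masterlist.txt, which has hundreds of lines of URLs, and toupdate.txt, which contains a smaller number of updated versions of lines in masterlist.txt that need replacing.

I'd like to automate this process with Bash, since the creation and use of these lists already happens in a bash script.

The server part of the URL is the part that changes, so we can match on the unique part: /whatever/whatever_user.xml. But how do I find and replace those lines in masterlist.txt? That is, how do I go through each line of toupdate.txt and, given that it ends in /f_SomeName/f_SomeName_user.xml, find the line in masterlist.txt with that ending and replace the whole line with the new one?

So https://123456url.domain.com/26/path/f_SomeName/f_SomeName_user.xml becomes https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml, for example.

The rest of masterlist.txt needs to stay intact, so we must only find and replace lines that have a different server for the same line ending (ID).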

Structure

masterlist.txt looks like this:

https://123456url.domain.com/26/path/f_SomeName/f_SomeName_user.xml
https://456789url.domain.com/32/path/f_AnotherName/f_AnotherName_user.xml
https://101112url.domain.com/1/path/g_SomethingElse/g_SomethingElse_user.xml
https://222blah11.domain.com/19/path/e_BlahBlah/e_BlahBlah_user.xml
[...]

toupdate.txt looks like this:

https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml
https://foo-254.domain.com/8/path/g_SomethingElse/g_SomethingElse_user.xml

Desired result

Make masterlist.txt look like:

https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml
https://456789url.domain.com/32/path/f_AnotherName/f_AnotherName_user.xml
https://foo-254.domain.com/8/path/g_SomethingElse/g_SomethingElse_user.xml
https://222blah11.domain.com/19/path/e_BlahBlah/e_BlahBlah_user.xml
[...]

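Initial attempt

I've looked at sed, but I don't know how to use lines from the two files for the find and replace.

Here's what I have so far, which at least does the file handling: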

#!/bin/bash

#...

while read -r line; do
    # there's a new link on each line
    link="${line}"
    # extract the unique part from the end of each line
    grabXML="${link##*/}"
    grabID="${grabXML%_user.xml}"
    # if we cannot grab the ID, then just set it to use the full link so we don't have an empty string
    if [ -n "${grabID}" ]; then
        identifier=${grabID}
    else
        identifier="${line}"
    fi
    
    ## the find and replace here? ##    

# we're done when we've reached the end of the file
done < "masterlist.txt"

Answer 1

Would you please try the following:

#!/bin/bash

declare -A map
while IFS= read -r line; do
    if [[ $line =~ (/[^/]+/[^/]*\.xml)$ ]]; then
        uniq_part="${BASH_REMATCH[1]}"
        map[$uniq_part]=$line
    fi
done < "toupdate.txt"

while IFS= read -r line; do
    if [[ $line =~ (/[^/]+/[^/]*\.xml)$ ]]; then
        uniq_part="${BASH_REMATCH[1]}"
        if [[ -n ${map[$uniq_part]} ]]; then
            line=${map[$uniq_part]}
        fi
    fi
    echo "$line"
done < "masterlist.txt" > "masterlist_tmp.txt"

# if the result of "masterlist_tmp.txt" is good enough, uncomment the line below
# mv -f -- "masterlist_tmp.txt" "masterlist.txt"

Result:

https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml
https://456789url.domain.com/32/path/f_AnotherName/f_AnotherName_user.xml
https://foo-254.domain.com/8/path/g_SomethingElse/g_SomethingElse_user.xml
https://222blah11.domain.com/19/path/e_BlahBlah/e_BlahBlah_user.xml

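[Explanation]

The associative array map maps the "unique part", such as /f_SomeName/f_SomeName_user.xml, to the "full path", such as https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml.
The regex (/[^/]+/[^/]*\.xml)$, if it matches, sets the shell variable BASH_REMATCH[1] to the substring running from the second rightmost slash to the ".xml" extension at the end of the string.
The first loop, over the file "toupdate.txt", stores each "unique part" and "full path" pair as a key-value pair in the associative array.
The second loop, over the file "masterlist.txt", tests whether the extracted "unique part" has an associated value. If so, the line is replaced with that value, which is the line from the "toupdate.txt" file.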
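
To see what the regex captures on its own, the extraction step can be tried in isolation. This is only a minimal sketch: the function name unique_part and the example.invalid URL are made up for illustration and are not part of the answer's script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print the "unique part"
# of a URL, or print nothing when the line does not end in /dir/file.xml.
unique_part() {
    # Keep the regex in a variable so bash does not need extra quoting.
    local re='(/[^/]+/[^/]*\.xml)$'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

# Prints /f_SomeName/f_SomeName_user.xml
unique_part "https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml"

# Prints nothing: the line does not end in .xml
unique_part "https://example.invalid/some/path/file.txt"
```

Because the pattern is anchored with $ and each [^/] class cannot cross a slash, only the last two path components can match, which is why the server part and leading path are ignored.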

[Alternative]
If the text files are large, bash may not be fast enough. In that case, an awk script will work more efficiently:

awk 'NR==FNR {
    if (match($0, "/[^/]+/[^/]*\\.xml$")) {
        map[substr($0, RSTART, RLENGTH)] = $0
    }
    next
}
{
    if (match($0, "/[^/]+/[^/]*\\.xml$")) {
        full_path = map[substr($0, RSTART, RLENGTH)]
        if (full_path != "") {
            $0 = full_path
        }
    }
    print
}' "toupdate.txt" "masterlist.txt" > "masterlist_tmp.txt"

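[Explanation]

The NR==FNR { BLOCK1; next } { BLOCK2 } syntax is a common idiom for switching the processing per file. Because the NR==FNR condition holds only for the first file in the argument list, and the next statement skips the following block, BLOCK1 processes the file "toupdate.txt" only. Similarly, BLOCK2 processes the file "masterlist.txt" only.
If the function match($0, pattern) succeeds, it sets the awk variable RSTART to the start position of the matched substring within $0, the current record read from the file, and sets the variable RLENGTH to the length of the matched substring. We can then use the substr() function to extract the matched substring, such as /f_SomeName/f_SomeName_user.xml.
Then we assign to the array map so that the substring (the unique part) maps to the whole URL in "toupdate.txt".
The second block works mostly like the first one. If a value corresponding to the key is found in the array map, the record ($0) is replaced with the value of the array indexed by that key.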
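
The per-file switching can be seen with two toy files. This is a small sketch, not part of the answer: the file names first.txt and second.txt and the temporary directory are assumptions made for the demonstration.

```shell
#!/bin/bash
# Build two throwaway input files in a temporary directory (hypothetical names).
workdir=$(mktemp -d)
printf '%s\n' "a" "b" > "$workdir/first.txt"
printf '%s\n' "c" > "$workdir/second.txt"

# NR counts records across all files, FNR restarts at 1 for each file,
# so NR==FNR is true only while the first file is being read.
result=$(awk 'NR==FNR { print "BLOCK1: " $0; next }
                      { print "BLOCK2: " $0 }' \
    "$workdir/first.txt" "$workdir/second.txt")

printf '%s\n' "$result"
# Prints:
# BLOCK1: a
# BLOCK1: b
# BLOCK2: c

rm -rf -- "$workdir"
```

One caveat worth keeping in mind: if the first file is empty, NR==FNR also holds while reading the second file, so the idiom assumes "toupdate.txt" has at least one line.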

Answer 2

Why not let sed write its own script, generating the desired output:

sed -e "$(sed -e 's<^\(http[s]*://[^/]*/[^/]*/\)\(.*\)<\\|\2\$| s|.*|\1\2|<' toupdate.txt)" masterlist.txt

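where

the inner sed command contains an outer and an inner s (substitute) command
the outer s (s<...<...<) captures scheme://domain/N/ as \1 and the rest of the path (.*) as \2, and inserts them into the script for the outer sed command
the script for the outer sed (\|\2$| s|.*|\1\2|) looks for URLs in masterlist.txt that end with the rest of the path and replaces the whole line (the inner s) with the new URL from toupdate.txt
to avoid a lot of backslash escaping, < and | are used as the delimiters of the two s commands, and \|...| is used in place of /.../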
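
Running the inner sed by itself shows the script it generates for the outer sed. This is an illustrative sketch that assumes a one-line toupdate.txt written to a temporary directory; the variable names workdir and generated are made up for the demonstration.

```shell
#!/bin/bash
# Create a one-line toupdate.txt in a temporary directory (demo data only).
workdir=$(mktemp -d)
printf '%s\n' \
    "https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml" \
    > "$workdir/toupdate.txt"

# Same inner command as in the answer: each input line is turned into
# one "address + substitute" command for the outer sed.
generated=$(sed -e 's<^\(http[s]*://[^/]*/[^/]*/\)\(.*\)<\\|\2\$| s|.*|\1\2|<' \
    "$workdir/toupdate.txt")

printf '%s\n' "$generated"
# Prints:
# \|path/f_SomeName/f_SomeName_user.xml$| s|.*|https://new-123.domain.com/1/path/f_SomeName/f_SomeName_user.xml|

rm -rf -- "$workdir"
```

Note that the path is inserted into the generated address as a regex, so characters such as . are not escaped and match any character. For URLs like the ones shown here that is unlikely to matter, but it is an assumption to check against real data.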

2 solutions

Solution 1
2 Accepted 2021-01-24 02:45:49

Solution 2
2 2021-01-24 10:47:01