Bash script to loop through MySQL rows and use curl and grep

Question

I have a MySQL database with a table: url | words

and data such as:

url                      | words
-------------------------+-------------------------
www.firstwebsite.com     | hello, hi
www.secondwebsite.com    | someword, someotherword

I want to loop through the table and check whether the words appear in the content of the website given by url.

I have something like this:

#!/bin/bash

mysql --user=USERNAME --password=PASSWORD DATABASE --skip-column-names -e "SELECT url, keyword FROM things" | while read -r url keyword; do
    content=$(curl -sL "$url")
    echo "$content" | egrep -q "$keyword"
    status=$?

    if test $status -eq 0 ; then
        : # Found... (bash needs at least one command per branch)
    else
        : # Not found...
    fi
done

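A couple of questions:

It is very slow: how can I set up curl to reduce the load time for each website, e.g. by not loading images and the like?

Also, is it a good idea to put something like this in a shell script at all, or would it be better to write a PHP script that uses curl?

Thanks!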

Answer 1

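As it stands, the script will not work correctly when a row contains several keywords, as in your example. When you pass "hello, hi" to egrep (grep -E), it searches its input for the exact string "hello, hi", not for "hello" or "hi". You can fix this without changing anything in the database by converting each keyword list into an egrep-compatible regular expression with sed, so that "hello, hi" becomes "(hello|hi)". You also need to strip the | from mysql's output, for example with awk.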
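A minimal sketch of both preprocessing steps, run on the sample rows from the question instead of a live mysql call. It assumes the rows arrive in the "url | words" layout shown above; the helper name to_regex is hypothetical:

```shell
#!/bin/bash

# Hypothetical helper: turn a comma-separated keyword list such as
# "hello, hi" into an extended regex such as "(hello|hi)".
to_regex() {
    echo "$1" | sed -e 's/, /|/g;s/^/(/;s/$/)/;'
}

# The raw list is treated as one literal phrase, so this does not match:
echo "say hi there" | grep -Eq 'hello, hi' || echo "raw list: no match"

# The converted pattern matches either keyword:
echo "say hi there" | grep -Eq "$(to_regex 'hello, hi')" && echo "regex: match"

# Sample rows in the assumed "url | words" layout, split on | by awk;
# read then trims the surrounding whitespace from both fields.
printf '%s\n' \
    'www.firstwebsite.com    |   hello, hi' \
    'www.secondwebsite.com   |   someword, someotherword' |
    awk -F '|' '{ print $1, $2 }' |
    while read -r url keywords; do
        echo "$url $(to_regex "$keywords")"
    done
```

This prints "raw list: no match", "regex: match", then each URL followed by its pattern, e.g. "www.firstwebsite.com (hello|hi)". Note that (hello|hi) also matches inside longer words such as "this"; grep -w restricts matches to whole words if that matters.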

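curl does not retrieve images when it downloads a page's HTML. If the order in which the URLs are queried does not matter to you, you can speed up the whole process by running each request asynchronously with &.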
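A minimal sketch of the & pattern, with a wait at the end so the script does not exit before the background jobs finish. The fake_fetch and run_all functions are hypothetical; fake_fetch stands in for handle_url and sleeps instead of calling curl, so the sketch runs offline:

```shell
#!/bin/bash

# Hypothetical stand-in for handle_url: pretends to fetch a page.
fake_fetch() {
    sleep 1                 # simulate network latency
    echo "done: $1"
}

# Start one background job per URL, then wait for all of them.
run_all() {
    local url
    for url in "$@"; do
        fake_fetch "$url" &     # runs concurrently with the loop
    done
    wait                        # block until every background job exits
}

start=$SECONDS
run_all www.firstwebsite.com www.secondwebsite.com www.thirdwebsite.com
echo "elapsed: $((SECONDS - start))s"   # less than 3s because the sleeps overlap
```

The "done:" lines can appear in any order. When the loop sits at the end of a pipeline, as in the script here, bash runs it in a subshell, so a wait placed after done in the parent shell would not see those jobs; put it inside the same subshell instead, e.g. mysql ... | { while read -r ...; do ...; done; wait; }.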

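#!/bin/bash

# Fetch the page at $1 and report whether it matches the regex in $2
handle_url() {
    if curl -sL "$1" | grep -Eq "$2"; then
        echo "1 $1" # Found...
    else
        echo "0 $1" # Not found...
    fi
}

mysql --user=USERNAME --password=PASSWORD DATABASE --skip-column-names -e "SELECT url, keyword FROM things" | awk -F '|' '{ print $1, $2 }' | {
    while read -r url keywords; do
        # Turn "hello, hi" into "(hello|hi)"
        keywords=$(echo "$keywords" | sed -e 's/, /|/g;s/^/(/;s/$/)/;')
        handle_url "$url" "$keywords" &
    done
    wait # the loop runs in a subshell, so wait for its background jobs here
}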

Answer 1: score 0, 2014-03-06 22:26:47