Get all URLs in a string using PHP

Question

I'm trying to figure out a way to get an array of URLs from a string of text. The text will be formatted roughly like this:

Here is some random text

http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphones-bezel-a-massive-notification-light/?grcc=88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2=835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033fdeed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~

http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tickets-for-disrupt-sf/

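Obviously, these links could be anything (and there can be many of them; these are just the ones I'm testing with right now). If I use a simple URL, my regular expression works fine.

I'm using: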

preg_match_all('((https?|ftp|gopher|telnet|file|notes|ms-help):'.
    '((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)',
    $bodyMessage, $matches, PREG_PATTERN_ORDER);

When I do a print_r($matches); the result I get is:

Array ( [0] => Array (
    [0] => http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon=
    [1] => http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= 
    [2] => http://techcrunch.co=
    [3] => http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-ip= 
    [4] => http://techcrunch.com/2012/07/20/last-day-to-purc=
    [5] => http://tec=
)
...

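None of the items in that array is the full link from the text above.

Does anyone know a good way to get what I need? I've found a bunch of regex snippets for getting links with PHP, but none of them work.

Thanks!

Edit:

OK, so I'm pulling these links from an email. The script parses the email, grabs the message body, and then tries to get the links from it. After looking into the email, it seems a space is being added in the middle of the URL for some reason. Here is the output of the message body as my PHP script sees it.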

 --00248c711bb99ca36d04c54ba5c6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2= =3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f= deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~ http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= ets-for-disrupt-sf/ --00248c711bb99ca36d04c54ba5c6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable

Any suggestions on how to keep it from breaking the URLs?

Edit 2

Following Laurnet's suggestion, I ran this code:

 $bodyMessage = str_replace("= ", "",$bodyMessage);

However, when I echo it, it doesn't seem to want to replace the "= ".

 --00248c711bb99ca36d04c54ba5c6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2= =3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f= deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~ http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= ets-for-disrupt-sf/ --00248c711bb99ca36d04c54ba5c6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
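The headers in the dump above say the part is `Content-Transfer-Encoding: quoted-printable`. In that encoding a trailing `=` marks a soft line break and `=3D` stands for a literal `=`, so the "space" after each `=` is quite possibly a line break that only looks like a space when echoed, which would also explain why `str_replace("= ", ...)` finds nothing. A minimal sketch, assuming the raw part is available as a string (`$rawPart` and its contents are made-up sample data, and the URL pattern is a simplified stand-in, not the one from the question):

```php
<?php
// Hypothetical sample: a quoted-printable body part with a soft line break
// ("=" followed by CRLF) in the middle of a URL, plus "=3D" for "=".
$rawPart = "http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick=\r\n"
         . "ets-for-disrupt-sf/\r\n"
         . "http://example.com/?a=3D1&b=3D2\r\n";

// Decode first: this removes soft line breaks and turns "=3D" back into "=".
$decoded = quoted_printable_decode($rawPart);

// Then extract URLs from the decoded text (simplified pattern, an assumption).
preg_match_all('~https?://[^\s"<>]+~i', $decoded, $matches);
$urls = $matches[0];

print_r($urls);
```

Decoding before matching means the regex never sees the encoding artifacts, so it does not need to know anything about email transport.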

Answer 1

    /**
     * Get all URLs from a string (the string itself may be a URL).
     *
     * @param string $string Text to search.
     * @return array List of matched URLs.
     */
    function getUrls($string) {
        // Match http(s):// followed by everything up to the next space or double quote.
        $regex = '/https?:\/\/[^" ]+/i';
        preg_match_all($regex, $string, $matches);
        return $matches[0];
    }
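As a usage sketch of the same idea, here is a hypothetical variant (`extractUrls` is not part of the answer above) that stops at any whitespace, quote, or angle bracket, trims trailing punctuation, and removes duplicates. It assumes PHP 7.4 or later for the arrow function, and the trimming is a heuristic: it would also cut a URL that legitimately ends in `)` or `.`.

```php
<?php
/**
 * Hypothetical variant of getUrls(): same approach, slightly stricter pattern,
 * with trailing punctuation trimmed and duplicates removed.
 */
function extractUrls(string $text): array
{
    // Stop at whitespace, quotes, or angle brackets.
    preg_match_all('~https?://[^\s"\'<>]+~i', $text, $matches);

    // Heuristic: drop punctuation that usually belongs to the sentence, not the URL.
    $urls = array_map(
        static fn(string $url): string => rtrim($url, '.,;:!?)'),
        $matches[0]
    );

    // Remove duplicates and reindex from 0.
    return array_values(array_unique($urls));
}

$text = 'See http://example.com/a, then http://example.com/b?x=1. '
      . 'Again: http://example.com/a';
$found = extractUrls($text);

print_r($found);
```

For the email case in the question, the body would still need to be decoded before being passed in, since any pattern of this kind stops at the first whitespace character.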

Answer 2

Use the following regular expression instead.

$regex = "~(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))~";

Hope that helps.
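A short sketch of how this expression could be applied with `preg_match_all`. The `~` delimiters are an addition here, since PHP's `preg_*` functions expect a delimited pattern, and the sample text is made up. Note how a balanced pair of parentheses stays inside the match while the sentence punctuation after it does not.

```php
<?php
// The expression from this answer, wrapped in "~" delimiters (an assumption
// added for this sketch; "~" does not occur inside the pattern itself).
$regex = "~(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))~";

// Hypothetical sample text.
$text = 'Docs: http://example.com/foo_(bar). Also www.example.org/path, thanks.';

preg_match_all($regex, $text, $matches);
$urls = $matches[0]; // index 0 holds the full matches

print_r($urls);
```

Like any pattern-based approach, this still expects clean input: an encoded email body would need decoding first, because the match ends at the first whitespace character.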

Solution 1
9 2012-07-21 00:48:55

Solution 2
0 2013-07-04 15:42:22
