PHP preg_match_all regex to extract only the number from a string

Question

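I can't seem to figure out the right regular expression to extract just a specific number from a string. I have an HTML string that contains many img tags, and I want to extract part of a value from each of them. They follow this format: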

<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.jpg" class="something" />
<img src="http://domain.com/images/1249.jpg" class="something" />
<img src="http://domain.com/images/6.jpg" class="something" />

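So the number before ".jpg" varies in length (and the extension could be .gif, .png, or something else). I only want to extract that number from the string.

The second part is that I want to use that number to look up an entry in the database and fetch the alt/title text for that particular ID. Finally, I want to add the returned database value to the tag and put it back into the HTML string.

Any ideas on how to proceed would be great.

So far, I have tried: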

$pattern = '/img src="http://domain.com/images/[0-9]+\/.jpg';
preg_match_all($pattern, $body, $matches);
var_dump($matches);

Answer 1

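I think this is the best approach:

1. Use an HTML parser to extract the image tags
2. Use a regular expression (or string manipulation) to extract the ID
3. Query the data
4. Use the HTML parser to insert the returned data

Here is an example. I can think of some improvements, such as using string manipulation instead of a regular expression.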

$html = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.jpg" class="something" />
<img src="http://domain.com/images/1249.jpg" class="something" />
<img src="http://domain.com/images/6.jpg" class="something" />';
$doc = new DOMDocument;
$doc->loadHtml( $html);

foreach( $doc->getElementsByTagName('img') as $img)
{
    $src = $img->getAttribute('src');
    // Skip images whose src does not have the /images/<number>.<ext> form
    if (!preg_match( '#/images/([0-9]+)\.#i', $src, $matches)) {
        continue;
    }
    $id = $matches[1];
    echo 'Fetching info for image ID ' . $id . "\n";

    // Query stuff here
    $result = 'Got this from the DB';

    $img->setAttribute( 'title', $result);
    $img->setAttribute( 'alt', $result);
}

$newHTML = $doc->saveHtml();
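
The string-manipulation alternative mentioned above could look roughly like the following. This is only a sketch: the helper name imageIdFromSrc is hypothetical (it is not part of the original answer), and it assumes the file name without its extension is exactly the numeric ID.

```php
<?php
// Hypothetical helper (not from the original answer): returns the numeric
// image ID from a src URL, or null when the file name is not purely digits.
function imageIdFromSrc(string $src): ?int
{
    $path = parse_url($src, PHP_URL_PATH);      // e.g. "/images/549.jpg"
    if (!is_string($path)) {
        return null;                            // malformed URL or no path
    }
    $name = pathinfo($path, PATHINFO_FILENAME); // e.g. "549"
    return ctype_digit($name) ? (int) $name : null;
}

$ids = [];
foreach ([
    'http://domain.com/images/59.jpg',
    'http://domain.com/images/1249.png',
    'http://domain.com/images/logo.gif',
] as $src) {
    $ids[] = imageIdFromSrc($src);
}
var_dump($ids);
```

Inside the foreach loop of the example, the helper would replace the preg_match call, with a null result treated as "skip this image".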

Answer 2

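With a regular expression you can get the numbers quite easily. The third argument of preg_match_all is an array passed by reference, which is filled with the matches that were found.

preg_match_all('/<img src="http:\/\/domain\.com\/images\/(\d+)\.[a-zA-Z]+"/', $html, $matches);
print_r($matches);

$matches[0] will contain every full match, and $matches[1] will contain the captured numbers.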
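
As a small illustration of how the result array is laid out, the sketch below runs the pattern on two sample tags and converts the captured strings to integers. The sample HTML and the variable names $count and $ids are assumptions made for the example.

```php
<?php
$html = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.png" class="something" />';

// preg_match_all returns the number of full matches it found.
$count = preg_match_all(
    '/<img src="http:\/\/domain\.com\/images\/(\d+)\.[a-zA-Z]+"/',
    $html,
    $matches
);

// $matches[0] holds the full matched text, $matches[1] the first capture group.
$ids = array_map('intval', $matches[1]);
print_r($ids);
```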

Answer 3

Using preg_match_all:

preg_match_all('#<img.*?/(\d+)\.#', $str, $m);
print_r($m);

Output:

Array
(
    [0] => Array
        (
            [0] => <img src="http://domain.com/images/59.
            [1] => <img src="http://domain.com/images/549.
            [2] => <img src="http://domain.com/images/1249.
            [3] => <img src="http://domain.com/images/6.
        )

    [1] => Array
        (
            [0] => 59
            [1] => 549
            [2] => 1249
            [3] => 6
        )

)

Answer 4

Consider using preg_replace_callback.

Use this regular expression: (images/([0-9]+)[^"]+")

Then pass an anonymous function as the callback argument. The result:

$output = preg_replace_callback(
    "(images/([0-9]+)[^\"]+\")",
    function($m) {
        // $m[1] is the number.
        $t = getTitleFromDatabase($m[1]); // do whatever you have to do to get the title
        return $m[0]." title=\"".$t."\"";
    },
    $input
);
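
To try the callback approach without a database, the lookup can be stubbed out. In the sketch below, getTitleFromDatabase is a stand-in that reads from a hard-coded array with invented titles, "#" is used as the pattern delimiter, and htmlspecialchars is added as a precaution so that a title containing quotes cannot break the attribute; none of this is meant as the answer's exact implementation.

```php
<?php
// Stand-in for the real database lookup; the titles are invented sample data.
function getTitleFromDatabase(int $id): string
{
    $titles = [59 => 'Red bike', 549 => 'Mountain lake'];
    return $titles[$id] ?? '';
}

$input = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.jpg" class="something" />';

$output = preg_replace_callback(
    '#images/([0-9]+)[^"]+"#',
    function (array $m): string {
        // $m[0] is the matched text up to the closing quote, $m[1] the number.
        $title = getTitleFromDatabase((int) $m[1]);
        return $m[0] . ' title="' . htmlspecialchars($title, ENT_QUOTES) . '"';
    },
    $input
);
echo $output, "\n";
```

Because the replacement is appended right after the closing quote of src, the title attribute ends up between src and class in each tag.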

Answer 5

This regular expression should match the numeric part:

\/images\/(?P<digits>[0-9]+)\.[a-z]+

With preg_match_all, $matches['digits'] should then hold all the numbers you want, as an array.
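
A minimal sketch of the named group in use, assuming preg_match_all and "#" as the delimiter (the answer gives only the bare pattern, so both are choices made for this example):

```php
<?php
$html = '<img src="http://domain.com/images/1249.jpg" class="something" />
<img src="http://domain.com/images/6.gif" class="something" />';

// "#" is used as the delimiter so the slashes need no escaping.
preg_match_all('#/images/(?P<digits>[0-9]+)\.[a-z]+#', $html, $matches);

// The named group is available under both 'digits' and the numeric index 1.
print_r($matches['digits']);
```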

Answer 6

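$matches = array();
// The POSIX digit class must be written inside a bracket expression: [[:digit:]]
preg_match_all('/[[:digit:]]+/', $htmlString, $matches);

Then loop over the matches array to look up each ID in the database and rebuild the HTML.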
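
One thing to watch with a bare digit pattern is that it matches every run of digits in the markup, not only the image IDs. The sketch below uses the POSIX class [[:digit:]] on a made-up tag with a width attribute (an assumption for illustration) and compares it with a pattern anchored to "/images/".

```php
<?php
$htmlString = '<img src="http://domain.com/images/59.jpg" width="100" />';

// Unanchored: picks up the ID and the width value alike.
preg_match_all('/[[:digit:]]+/', $htmlString, $all);

// Anchored: only digits that directly follow "/images/" and precede a dot.
preg_match_all('#/images/([[:digit:]]+)\.#', $htmlString, $anchored);

print_r($all[0]);
print_r($anchored[1]);
```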

Answer 7

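Regular expressions alone are on shaky ground when it comes to parsing messy HTML. DOMDocument's HTML handling copes well with tag soup, XPath can select the image src attributes, and a simple sscanf extracts the number: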

$ids = array();
$doc = new DOMDocument();
$doc->loadHTML($html);
foreach(simplexml_import_dom($doc)->xpath('//img/@src[contains(., "/images/")]') as $src) {
    if (sscanf($src, '%*[^0-9]%d', $number)) {
        $ids[] = $number;
    }
}

Since that only gives you an array, why not encapsulate it?

$html = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.jpg" class="something" />
<img src="http://domain.com/images/1249.jpg" class="something" />
<img src="http://domain.com/images/6.jpg" class="something" />';

$imageNumbers = new ImageNumbers($html);

var_dump((array) $imageNumbers);

Which gives you:

array(4) {
  [0]=>
  int(59)
  [1]=>
  int(549)
  [2]=>
  int(1249)
  [3]=>
  int(6)
}

The function above can be wrapped nicely in a class that extends ArrayObject:

class ImageNumbers extends ArrayObject
{
    public function __construct($html) {
        parent::__construct($this->extractFromHTML($html));
    }
    private function extractFromHTML($html) {
        $numbers = array();
        $doc = new DOMDocument();
        $preserve = libxml_use_internal_errors(TRUE);
        $doc->loadHTML($html);
        foreach(simplexml_import_dom($doc)->xpath('//img/@src[contains(., "/images/")]') as $src) {
            if (sscanf($src, '%*[^0-9]%d', $number)) {
                $numbers[] = $number;
            }
        }
        libxml_use_internal_errors($preserve);
        return $numbers;
    }
}

If your HTML is so malformed that even DOMDocument::loadHTML() cannot handle it, you only need to deal with that in one place, inside the ImageNumbers class.
