
PHP, Regular expression to parse data

I have data in the format:

Football - 101 Carolina Panthers +15 -110 for Game

Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game

Football - 102 Pittsburgh Steelers -9 -120 for 1st Half


How can I transform this into a PHP array:

$game_data[] = array( 'sport_type'  => 'Football',
                      'game_number' => 101,
                      'game_name'   => 'Carolina Panthers',
                      'runline_odd' => '+15 -110',
                      'total_odd'   => '',
                      'odd_type'    => 'runline',
                      'period'      => 'Game' );

$game_data[] = array( 'sport_type'  => 'Football',
                      'game_number' => 101,
                      'game_name'   => 'Carolina Panthers/Pittsburgh Steelers',
                      'runline_odd' => '',
                      'total_odd'   => 'under 36½ -110',
                      'odd_type'    => 'total_odd',
                      'period'      => 'Game' );

$game_data[] = array( 'sport_type'  => 'Football',
                      'game_number' => 102,
                      'game_name'   => 'Pittsburgh Steelers',
                      'runline_odd' => '-9 -120',
                      'total_odd'   => '',
                      'odd_type'    => 'runline',
                      'period'      => '1st Half' );

The following works except for the case where there is an under after the game name:

/([^-]+)\s*-\s*(\d+)\s*([^\d+-]+)\s*((?:under\s*)?[\d\s+-]+)\s*for\s*(.+)/

Explanation:

([^-]+): Matches anything other than -, which separates the sport type from the other details.
\s*-\s*: - surrounded by optional spaces
(\d+)  : Game number
([^\d+-]+): Anything other than +, -, or a digit. Matches the game name.
((?:under\s*)?[\d\s+-]+): Runline odd or total odd.
\s*for\s*(.+): The literal for, followed by the period.

PS:

  1. Take care of the cases where there is 'under'. The regex above lumps it in with game_name.
  2. Take care of Unicode characters.
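
Both points can be reproduced in isolation. The following is only a minimal sketch: the variable names $rest, $name and $odds are illustrative and do not come from the original post. It applies the game-name class and the odds class from the pattern above to the second sample line, without the u modifier and assuming the file is saved as UTF-8.

```php
<?php
// Text that follows the game number in the second sample line.
$rest = 'Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game';

// The game-name class stops only at a digit, + or -,
// so it runs straight through "under ".
preg_match('/^[^\d+-]+/', $rest, $name);

// The odds class accepts digits, whitespace, + and -,
// so it stops as soon as it reaches the ½.
preg_match('/^[\d\s+-]+/', '36½ -110', $odds);

var_dump($name[0], $odds[0]);
```

Here $name[0] ends in "Steelers under " and $odds[0] is just "36", which is why the word under lands in the game name and why the ½ needs explicit handling.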

Normally I wouldn't solve the whole problem for someone, but the ½ character made it interesting enough. Now, I'm not a super expert on regexes, so this might not be the most optimized or elegant solution, but it seems to get the job done, at least with the provided sample input.

EDIT: Oops. Didn't catch that under was actually part of the runline_odd data. So this does not actually get the job done yet. I'll be back.

EDIT2: Revised the regex slightly and it now correctly distinguishes between runline_odd and runline_total.

<?php
$input = array(
'Football - 101 Carolina Panthers +15 -110 for Game',
'Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game',
'Football - 102 Pittsburgh Steelers -9 -120 for 1st Half'
);

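// Named groups; the two alternatives differ only in whether "under" precedes the odds.
$regex = '^(?<sport_type>[[:alpha:]]*) - '.
         '(?<game_number>[0-9]*) '.
         '('.
            // Totals: lazy game name, then "under <total> <price>" (\x{00BD} is the ½ character).
            '(?<game_nameb>[[:alpha:]\/ ]*?) '.
            '(?<runline_total>(under ([0-9\x{00BD}]+){1}) ((-|\+)?([-+0-9\x{00BD}]+){1})) for '.
         '|'.
            // Run line: game name, then "<line> <price>".
            '(?<game_namea>[[:alpha:]\/ ]*) '.
            '(?<runline_odd>((-|\+)?([0-9\x{00BD}]+){1}) ((-|\+)?([-+0-9\x{00BD}]+){1})) for '.
         ')'.
         '(?<period>.*)$';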


$game_data = array();

foreach ($input as $in) {
    $matches = array();
    // u: treat pattern and subject as UTF-8 (needed for ½); i: case-insensitive.
    $cnt = preg_match('/' . $regex . '/ui', $in, $matches);

    if ($cnt && is_array($matches) && count($matches)) {
        if (empty($matches['game_nameb'])) {
            // The run line alternative matched.
            $game_name = $matches['game_namea'];
            $runline_odd = $matches['runline_odd'];
            $total_odd = '';
        } else {
            // The "under" (totals) alternative matched.
            $game_name = $matches['game_nameb'];
            $runline_odd = '';
            $total_odd = $matches['runline_total'];
        }


        $result = array(
            'sport_type' => $matches['sport_type'],
            'game_number' => $matches['game_number'],
            'game_name' => $game_name,
            'runline_odd' => $runline_odd,
            'total_odd' => $total_odd,
            'period' => $matches['period']
        );

        array_push($game_data, $result);
    }
}

var_dump($game_data);

This produces the following:

$ /usr/local/bin/php preg-match.php 
array(3) {
  [0]=>
  array(6) {
    ["sport_type"]=>
    string(8) "Football"
    ["game_number"]=>
    string(3) "101"
    ["game_name"]=>
    string(17) "Carolina Panthers"
    ["runline_odd"]=>
    string(8) "+15 -110"
    ["total_odd"]=>
    string(0) ""
    ["period"]=>
    string(4) "Game"
  }
  [1]=>
  array(6) {
    ["sport_type"]=>
    string(8) "Football"
    ["game_number"]=>
    string(3) "101"
    ["game_name"]=>
    string(37) "Carolina Panthers/Pittsburgh Steelers"
    ["runline_odd"]=>
    string(0) ""
    ["total_odd"]=>
    string(15) "under 36½ -110"
    ["period"]=>
    string(4) "Game"
  }
  [2]=>
  array(6) {
    ["sport_type"]=>
    string(8) "Football"
    ["game_number"]=>
    string(3) "102"
    ["game_name"]=>
    string(19) "Pittsburgh Steelers"
    ["runline_odd"]=>
    string(7) "-9 -120"
    ["total_odd"]=>
    string(0) ""
    ["period"]=>
    string(8) "1st Half"
  }
}
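
The output above does not include the odd_type key that the question asked for. As a possible variation, here is a sketch of a single-pattern version that also derives odd_type. The function name parse_bet_line() and the odd and under group names are hypothetical and not part of the original answer. The sketch assumes PHP 7.1 or later, UTF-8 input, and lines that look like the three samples; it has not been checked against other formats such as "over".

```php
<?php
// Hypothetical helper (not from the original answer): parses one line into
// the array shape shown in the question, including odd_type.
function parse_bet_line(string $line): ?array
{
    $pattern = '/^(?<sport_type>[[:alpha:]]+) - '
             . '(?<game_number>[0-9]+) '
             . '(?<game_name>[[:alpha:]\/ ]+?) '               // lazy, so it stops before the odds
             . '(?<odd>(?<under>under )?[-+]?[0-9\x{00BD}]+ '  // optional "under", then line or total
             . '[-+][0-9]+) '                                  // then the price, e.g. -110
             . 'for (?<period>.+)$/u';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // the line does not follow the expected format
    }

    // The "under" group is empty when the run line form matched.
    $is_total = !empty($m['under']);

    return array(
        'sport_type'  => $m['sport_type'],
        'game_number' => (int) $m['game_number'],
        'game_name'   => $m['game_name'],
        'runline_odd' => $is_total ? '' : $m['odd'],
        'total_odd'   => $is_total ? $m['odd'] : '',
        'odd_type'    => $is_total ? 'total_odd' : 'runline',
        'period'      => $m['period'],
    );
}
```

Calling parse_bet_line() on each element of $input and collecting the non-null results gives arrays in the shape shown in the question. The lazy game_name group is what keeps the word under out of the name: it grows one character at a time until the rest of the pattern can match.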

