php: better way to split string into associative array

I have a string like this:

"ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999"

My goal is to split it into an associative array:

Array
(
    [ALARM_ID/I4] => 1010001
    [ALARM_STATE/U4] => eventcode
    [ALARM_TEXT/A] => WMR_MAP_EXPORT
    [LOTS/A[1]] => [ STEFANO ]
    [ALARM_STATE/U1] => 1
    [WAFER/U4] => 1
    [VI_KLARF_MAP/A] => /test/klarf.map
    [KLARF_STEPID/A] => StepID
    [KLARF_DEVICEID/A] => DeviceID
    [KLARF_EQUIPMENTID/A] => EquipmentID
    [KLARF_SETUP_ID/A] => SetupID
    [RULE_ID/U4] => 1234
    [RULE_FORMULA_EXPRESSION/A] => a < b && c > d
    [RULE_FORMULA_TEXT/A] => 1 < 0 && 2 > 3
    [RULE_FORMULA_RESULT/A] => FAIL
    [TIMESTAMP/A] => 10-Nov-2020 09:10:11 99999999
)

The only (but perhaps dirty) way I have found is this script:

<?php
$msg = "ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999";
$split = explode("=", $msg);
foreach($split as $k => $s) {
    $s = explode(" ", $s);
    $keys[] = array_pop($s);
    if ($s) $values[] = implode(" ", $s);
}
/*
 * this is needed if last parameter TIMESTAMP does not have ' ' (spaces) into value
 */
if (count($values) + 2 == count($keys)) array_push($values, array_pop($keys));
else                                    $values[ count($values) - 1 ] .= " " . array_pop($keys);
$params = array_combine($keys, $values);
print_r($params);
?>

Do you see a better way to split it, perhaps using a regular expression or a different (more elegant?) approach?

The key to maintaining accuracy is ensuring that the keys are matched properly.

Key strings will never contain a space or an equals sign. Value strings may contain either. A value string either runs to the end of the string or is followed by a space and then the next key (which contains no spaces or equals signs).

The key string can be matched "greedily" up to the first = encountered.

The value string must not be matched greedily. This ensures that the value does not extend into the next key-value pair.

The lookahead after the value string ensures that a following key, if there is one, is not damaged or consumed.

Pattern breakdown:

([^ =]+)     #capture one or more non-space, non-equals characters (greedily; excluding the space keeps the separator out of the key) and store as capture group #1
=            #match but do not capture an equals sign
(.+?)        #capture one or more of any non-newline character (giving back when possible / non-greedy) and store as capture group #2
(?=          #start lookahead
  $          #match the end of the string
  |          #OR operator
   [^ =]+=   #match space, then one or more non-space and non-equals characters, then match equals sign
)            #end lookahead

Code:

$msg = "ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999";

preg_match_all('~([^ =]+)=(.+?)(?=$| [^ =]+=)~', $msg, $out);
var_export(array_combine($out[1], $out[2]));

Output:

array (
  'ALARM_ID/I4' => '1010001',
  'ALARM_STATE/U4' => 'eventcode',
  'ALARM_TEXT/A' => 'WMR_MAP_EXPORT',
  'LOTS/A[1]' => '[ STEFANO ]',
  'ALARM_STATE/U1' => '1',
  'WAFER/U4' => '1',
  'VI_KLARF_MAP/A' => '/test/klarf.map',
  'KLARF_STEPID/A' => 'StepID',
  'KLARF_DEVICEID/A' => 'DeviceID',
  'KLARF_EQUIPMENTID/A' => 'EquipmentID',
  'KLARF_SETUP_ID/A' => 'SetupID',
  'RULE_ID/U4' => '1234',
  'RULE_FORMULA_EXPRESSION/A' => 'a < b && c > d',
  'RULE_FORMULA_TEXT/A' => '1 < 0 && 2 > 3',
  'RULE_FORMULA_RESULT/A' => 'FAIL',
  'TIMESTAMP/A' => '10-Nov-2020 09:10:11 99999999',
)
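
To see why the value has to be matched lazily, here is a minimal sketch on a shortened, made-up message (the string in $sample is an assumption for illustration, not taken from the question). It runs the same pattern twice, once with a lazy value group and once with a greedy one, with the key class written as [^ =]+ because keys contain no spaces.

```php
<?php
// Shortened, hypothetical message in the same KEY=value format.
$sample = 'A/I4=1 B/A=x y C/U4=2';

// Lazy value (.+?): stops at the first position where the lookahead succeeds.
preg_match_all('~([^ =]+)=(.+?)(?=$| [^ =]+=)~', $sample, $m);
$lazy = array_combine($m[1], $m[2]);

// Greedy value (.+): runs to the end of the string and swallows the later pairs.
preg_match_all('~([^ =]+)=(.+)(?=$| [^ =]+=)~', $sample, $m);
$greedy = array_combine($m[1], $m[2]);

var_export($lazy);   // three pairs: A/I4, B/A, C/U4
var_export($greedy); // a single pair whose value is '1 B/A=x y C/U4=2'
```

With the greedy quantifier the end-of-string alternative of the lookahead is satisfied immediately, so only one pair is returned.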

You could leverage the presence of a / in all the keys:

([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)

Explanation

  • ( Capture group 1
    • [^\s=/]+ Match 1+ times any char except a whitespace char, = or /
    • /[^\s=]+ Then match / followed by the rest of the key
  • ) Close group 1
  • = Match = literally
  • (.*?) Capture group 2, match any char except a newline, as few as possible
  • (?=\h+[^\s=/]+/|$) Positive lookahead: assert that what follows is either horizontal whitespace and a key-like format containing a / (as used in group 1), or the end of the string

See a regex demo and a PHP demo.

Example code

$re = '`([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)`';
$str = 'ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999
';

preg_match_all($re, $str, $matches);
$result = array_combine($matches[1], $matches[2]);

print_r($result);

Output

Array
(
    [ALARM_ID/I4] => 1010001
    [ALARM_STATE/U4] => eventcode
    [ALARM_TEXT/A] => WMR_MAP_EXPORT
    [LOTS/A[1]] => [ STEFANO ]
    [ALARM_STATE/U1] => 1
    [WAFER/U4] => 1
    [VI_KLARF_MAP/A] => /test/klarf.map
    [KLARF_STEPID/A] => StepID
    [KLARF_DEVICEID/A] => DeviceID
    [KLARF_EQUIPMENTID/A] => EquipmentID
    [KLARF_SETUP_ID/A] => SetupID
    [RULE_ID/U4] => 1234
    [RULE_FORMULA_EXPRESSION/A] => a < b && c > d
    [RULE_FORMULA_TEXT/A] => 1 < 0 && 2 > 3
    [RULE_FORMULA_RESULT/A] => FAIL
    [TIMESTAMP/A] => 10-Nov-2020 09:10:11 99999999
)
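
Requiring a / in the lookahead matters when a value itself contains tokens that look like word=word. The sketch below is an illustration under that assumption: the string in $sample is made up, and the question's sample message contains no such value. It compares a lookahead that accepts any "word=" as the next key with the slash-based lookahead.

```php
<?php
// Hypothetical input: the value of NOTE/A contains "word=word" tokens.
$sample = 'NOTE/A=k=1 j=2 ID/U4=7';

// Lookahead that accepts any run of non-space, non-equals chars followed by "=".
preg_match_all('~([^ =]+)=(.+?)(?=$| [^ =]+=)~', $sample, $m);
$generic = array_combine($m[1], $m[2]);

// Lookahead that requires the next key to contain a slash.
preg_match_all('~([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)~', $sample, $m);
$slashed = array_combine($m[1], $m[2]);

var_export($generic); // 'j' is wrongly treated as a key
var_export($slashed); // 'k=1 j=2' stays inside the value of NOTE/A
```

The slash-based pattern only helps if values never contain a space followed by a token with a /, so it trades one assumption about the data for another.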

If every key starts with groups of word characters separated by underscores, you can begin the pattern with the repeating part [^\W_]+(?:_[^\W_]+)*

It matches one or more word characters other than _ , then repeatedly matches _ followed by one or more word characters other than _ , until it reaches a /

([^\W_]+(?:_[^\W_]+)*/[^\s=]*)=(.*?)(?=\h+[^\s=/]+/|$)

Regex demo
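
As a quick check of the stricter pattern, this sketch applies it to a shortened excerpt of the message (the excerpt in $str is chosen here for brevity; the ~ delimiter is used so the / inside the pattern needs no escaping).

```php
<?php
// Stricter key: underscore-separated word-character groups, then "/" and the type suffix.
$re = '~([^\W_]+(?:_[^\W_]+)*/[^\s=]*)=(.*?)(?=\h+[^\s=/]+/|$)~';

// Shortened excerpt of the message from the question.
$str = 'ALARM_ID/I4=1010001 LOTS/A[1]=[ STEFANO ] KLARF_SETUP_ID/A=SetupID';

preg_match_all($re, $str, $matches);
$result = array_combine($matches[1], $matches[2]);

print_r($result);
```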

I came up with this code, using only basic PHP functions. I think a regular expression makes the code more difficult to read. Most of the time, even at the expense of more verbose code, you are better off not using regular expressions. They might also have a performance impact.

$message = "ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999";

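$parameters = [];
$key = null;
$value = '';
foreach (explode(' ', $message) as $word) {
    $pos = strpos($word, '=');
    if ($pos !== false && $pos > 0) {
        // A word with '=' after at least one character starts a new pair.
        if ($key !== null) $parameters[$key] = $value;
        // Limit of 2 keeps any further '=' inside the value.
        list($key, $value) = explode('=', $word, 2);
    }
    else $value .= " $word";
}
// Store the last pair, which has no following key to trigger the save.
if ($key !== null) $parameters[$key] = $value;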

echo '<pre>';
print_r($parameters);
echo '</pre>';

I chose to split on the spaces, then I look for the = characters to find the words with the keys in them.

There are, of course, other ways of doing the same, but all will involve a bit of extra work because of the strange format of the message.

This routine currently does not tolerate errors in the message string, but it can easily be expanded to tolerate various types of input errors.
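
One possible direction for such an expansion is sketched below. The function name parseMessage and the specific tolerance rules (collapsing repeated whitespace, dropping words that appear before the first key, letting a repeated key overwrite the earlier one) are assumptions made for illustration, not part of the original routine.

```php
<?php
// Hypothetical helper: the name and the tolerance rules are assumptions.
function parseMessage(string $message): array
{
    $parameters = [];
    $key = null;

    // Split on any run of whitespace and skip empty pieces.
    $words = preg_split('/\s+/', trim($message), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        $pos = strpos($word, '=');
        if ($pos !== false && $pos > 0) {
            // New pair: key is the text before the first '=', the rest starts the value.
            list($key, $value) = explode('=', $word, 2);
            $parameters[$key] = $value;
        } elseif ($key !== null) {
            // Continuation of the current value.
            $parameters[$key] .= ($parameters[$key] === '' ? '' : ' ') . $word;
        }
        // Words seen before the first key are silently dropped.
    }

    return $parameters;
}

print_r(parseMessage("junk  A/I4=1   B/A=x  y C/U4="));
```

Collapsing whitespace means a value that legitimately contains two consecutive spaces comes back with one, so this variant suits messages where exact spacing inside values does not matter.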
