
PHP regex to extract multiple matches from string

I'm trying to write a PHP regex to extract multiple sections from one string. Here is an excerpt from the file contents (the real file contains hundreds of these groupings):

part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  shapeid    : "2_1206",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "508",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  descr      : "150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "3",
  %_Term_Seq : "" }

As you can see, the excerpt contains two groupings with the same structure. I need to search through the whole file and extract the following:

  • the string after the word "part", which would be "C28" or "C29"
  • the string after the "type" property, which would be "1AB010050093" or "1AB008140029"

So, essentially, I need to get all the part references and their associated types out of this file, and I'm not sure of the best way to go about it.

Please let me know if more info is needed to help. Thanks in advance!

Description

This expression will:

  • capture the part name into a named group called ref
  • capture the values of the type and descr fields
  • put the captured type value into a named group called partnumber
  • allow the fields to appear in any order in the body, because each field is matched by a lookahead that starts just after the opening brace
  • treat the descr field as optional and capture it only if it exists. The (?: ... )? group around the descr lookahead is what makes the field optional

Note that this is a single expression split across lines, so you'll need to use the x option so that the regex engine ignores the whitespace (including the line breaks) in the pattern.

^part\s"(?P<ref>[^"]*)"[^{]*{
(?:(?=[^}]*\sdescr\s*:\s+"(?P<descr>[^"]*)"))?
(?=[^}]*\stype\s*:\s+"(?P<partnumber>[^"]*)")


PHP Code Example:

Input Text

part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  shapeid    : "2_1206",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "508",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  descr      : "150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "3",
  %_Term_Seq : "" }
part "C30"
{ type       : "1AB0081400 30",
  shapeid    : "2_1206 30",
  insclass   : "CP6A,CP6B 30",
  gentype    : "RECT_032_016_006 30",
  machine    : "SMT 30",
  %package   : "080450E 30 ",
  %_item_number: "3 30 ",
  %_Term_Seq : "30" }

Code

<?php
$sourcestring="your source string";
preg_match_all('/^part\s"(?P<ref>[^"]*)"[^{]*{
(?:(?=[^}]*\sdescr\s*:\s+"(?P<descr>[^"]*)"))?
(?=[^}]*\stype\s*:\s+"(?P<partnumber>[^"]*)")/imsx',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>

Matches

$matches Array (named groups only; print_r also lists the same captures under numeric keys, omitted here):
(
[ref] => Array
    (
        [0] => C28
        [1] => C29
        [2] => C30
    )

 [descr] => Array
    (
        [0] => 4700.0000 pFarad 10.00 % 100.0 - VE5-VS3
        [1] => 150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR
        [2] => 
    )

[partnumber] => Array
    (
        [0] => 1AB010050093
        [1] => 1AB008140029
        [2] => 1AB0081400 30
    )

)
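If it is more convenient to handle one part at a time, the same expression can be run with the PREG_SET_ORDER flag, which groups the captures per match instead of per group. The following is a minimal sketch, not part of the original answer: the shortened sample input and the $parts array are assumptions made for illustration.

```php
<?php
// Shortened sample input (assumption): C30 has no descr field.
$sourcestring = <<<TXT
part "C28"
{ type       : "1AB010050093",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  machine    : "SMT" }
part "C30"
{ type       : "1AB0081400 30",
  machine    : "SMT 30" }
TXT;

// Same expression as above; the x option lets it span several lines.
$pattern = '/^part\s"(?P<ref>[^"]*)"[^{]*{
(?:(?=[^}]*\sdescr\s*:\s+"(?P<descr>[^"]*)"))?
(?=[^}]*\stype\s*:\s+"(?P<partnumber>[^"]*)")/imsx';

// PREG_SET_ORDER yields one array per part, keyed by group name.
preg_match_all($pattern, $sourcestring, $matches, PREG_SET_ORDER);

// $parts is a hypothetical result structure, keyed by part reference.
$parts = [];
foreach ($matches as $m) {
    $parts[$m['ref']] = [
        'partnumber' => $m['partnumber'],
        // An unmatched optional group comes back as an empty string.
        'descr'      => $m['descr'] !== '' ? $m['descr'] : null,
    ];
}

print_r($parts);
```

Keying the result by ref makes a later lookup such as $parts['C28']['partnumber'] straightforward, at the cost of silently overwriting earlier entries if the file repeats a part reference.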

Assuming each group has the same structure, you can use this pattern:

preg_match_all('~([^"]++)"[^{"]++[^"]++"([^"]++)~', $subject, $matches);
print_r($matches);
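As a rough sketch of how the two capture groups might be paired up (the shortened $subject and the $typesByPart name are assumptions for illustration), array_combine can turn them into a part => type map. Note that the pattern picks up the first quoted value after the opening brace, so it relies on type being the first field of every group.

```php
<?php
// Shortened sample input (assumption): type is the first field in each group.
$subject = <<<TXT
part "C28"
{ type       : "1AB010050093",
  shapeid    : "2_1206",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  %_Term_Seq : "" }
TXT;

// Group 1: the part name; group 2: the first quoted value after "{".
preg_match_all('~([^"]++)"[^{"]++[^"]++"([^"]++)~', $subject, $matches);

// Pair each part name with its type (hypothetical variable name).
$typesByPart = array_combine($matches[1], $matches[2]);

print_r($typesByPart);
```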

EDIT:

Note: if you have more information to extract, you can easily transform your data into JSON, for example:

$data = <<<LOD
part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  shapeid    : "2_1206",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "508",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  descr      : "150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "3",
  %_Term_Seq : "" }
LOD;
$trans = array( "}\n"   => '}, ' , 'part'  => ''    ,
                "\"\n{" => ':{"' , ':'     => '":'  ,
                "\",\n" => '","' );

$data = str_replace(array_keys($trans), $trans, $data);
$data = preg_replace('~\s*+"\s*+~', '"', $data);
$json_data =json_decode('{"'.substr($data,1).'}');

foreach ($json_data as $key=>$value) {
    echo '<br/><br/>part: ' . $key . '<br/>type: ' . $value->type;    
}
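The string replacements above depend on the exact layout of the input (for instance "\n" line endings), and json_decode returns null when the converted text is not valid JSON. A minimal sketch of a guard, assuming a hypothetical helper named decodeParts and a hand-written JSON string standing in for the converted data:

```php
<?php
// Hypothetical helper (not in the original answer): decode the converted
// text and fail loudly instead of iterating over null.
function decodeParts(string $json): array
{
    $decoded = json_decode($json, true); // true: return associative arrays
    if (!is_array($decoded)) {
        throw new RuntimeException(
            'Conversion did not produce a JSON object: ' . json_last_error_msg()
        );
    }
    return $decoded;
}

// Hand-written stand-in for the converted data (assumption).
$json = '{"C28":{"type":"1AB010050093","shapeid":"2_1206"},'
      . '"C29":{"type":"1AB008140029","shapeid":"2_1206"}}';

foreach (decodeParts($json) as $ref => $fields) {
    echo 'part: ' . $ref . ', type: ' . $fields['type'] . "\n";
}
```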
