
PHP parsing text to a structured JSON

I have text like this:

some text Xª 1234567-89.0123.45.6789 (YZ) 01/01/2011 Esbjörn Svensson 02/02/2022 Awesome Trio Wª 0987654-32.1098.76.5432 (KBoo) 07/09/2013 Some Full Name 09/07/2017 Observation 12/12/2018 some text that I don't want to keep Xª 4335678-98.7123.95.5689 09/10/2010 Name Here 08/09/2020 Observation and more text to delete

I need structured JSON like this:

     {
        "data":
            {
                "Team": "Xª",
                "ID": "1234567-89.0123.45.6789",
                "Type": "(YZ)",
                "Date 1": "01/01/2011",
                "Name": "Esbjörn Svensson",
                "Date 2: "02/02/2022",
                "Obs": "Awesome Trio",
                "Date 3": ""
            },
            {
                "Team": "Wª",
                "ID": "0987654-32.1098.76.5432",
                "Type": "(KBoo)",
                "Date 1": "07/09/2013",
                "Name": "Some Full Name",
                "Date 2: "09/07/2017",
                "Obs": "Observation",
                "Date 3": "12/12/2018"
            },
            {
                "Team": "Xª",
                "ID": "4335678-98.7123.95.5689",
                "Type": "",
                "Date 1": "09/10/2010",
                "Name": "Name Here",
                "Date 2: "08/09/2020",
                "Obs": "Observation",
                "Date 3": ""
            }
     }

I searched a lot of code here, but I can't get it to work the way I need it. I tried to split the text wherever there is a blank space and the "ª" character, but it didn't work.

foreach ($textsource as $lista) {
    $y = implode(' ', $lista);
    // preg_split() needs a delimited pattern; a bare ' ' is not a valid regex
    $x = preg_split('/ /', $y);
    $delimiter = '/ª/u';
    // Indexes of the tokens that contain the "ª" team marker
    $childIndex = array_keys(preg_grep($delimiter, $x));
    $chunks = [];
    $final = [];
    for ($i = 0; $i < count($childIndex); $i++) {
        $chunks[$i]['begin'] = $childIndex[$i];
        if (isset($childIndex[$i + 1])) {
            $chunks[$i]['len'] = $childIndex[$i + 1] - $childIndex[$i];
        }
    }
    // Slice the token list into one chunk per team marker
    foreach ($chunks as $chunk) {
        if (isset($chunk['len'])) {
            $final[] = array_slice($x, $chunk['begin'], $chunk['len']);
        } else {
            $final[] = array_slice($x, $chunk['begin']);
        }
    }
    echo "<pre>";
    print_r($final);
    echo "</pre>";
}

I appreciate any help.
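For reference, the split described above can be done without tokenizing on spaces first: a single `preg_split()` whose pattern captures the marker keeps each "Xª" next to its record. This is a sketch only; `splitByTeam()` is a hypothetical name, and it assumes every team code is exactly one non-space character followed by "ª".

```php
<?php
// Hypothetical helper (not from the question): split the raw text into
// one entry per team marker. Assumes a team code is a single non-space
// character immediately followed by "ª".
function splitByTeam(string $text): array
{
    // The capturing group keeps each marker in the result:
    // [text before first marker, marker1, body1, marker2, body2, ...]
    $parts = preg_split('/(\Sª)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $records = [];
    // Odd indexes hold the markers; the next index holds that record's body
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        $records[] = ['Team' => $parts[$i], 'Body' => trim($parts[$i + 1])];
    }
    return $records;
}

$text = "some text Xª 1234567-89.0123.45.6789 (YZ) 01/01/2011 Esbjörn Svensson 02/02/2022 Awesome Trio Wª 0987654-32.1098.76.5432 (KBoo) 07/09/2013 Some Full Name 09/07/2017 Observation 12/12/2018 some text that I don't want to keep Xª 4335678-98.7123.95.5689 09/10/2010 Name Here 08/09/2020 Observation and more text to delete";
print_r(splitByTeam($text));
```

Each `Body` still has to be broken into ID, type, dates, name and observation; the text before the first marker ("some text") is discarded.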

So I tried to solve this; here is a working solution. By the way, your JSON is not valid (check it with JSONLint): `data` holds several objects without an enclosing array, and the `"Date 2` keys are missing their closing quote.

$text = "some text Xª 1234567-89.0123.45.6789 (YZ) 01/01/2011 Esbjörn Svensson 02/02/2022 Awesome Trio Wª 0987654-32.1098.76.5432 (KBoo) 07/09/2013 Some Full Name 09/07/2017 Observation 12/12/2018 some text that I don't want to keep Xª 4335678-98.7123.95.5689 09/10/2010 Name Here 08/09/2020 Observation and more text to delete";

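// Split on the "ª" marker. Every piece except the last ends with the
// one-character team code of the NEXT record (e.g. "... Awesome Trio W").
$arr = explode("ª", $text);
$team_arr = array_map(function ($team) { return substr($team, -1) . "ª"; }, $arr);
array_pop($team_arr); // the last piece is not followed by a marker

// Drop that trailing team code so it does not leak into "Obs".
// $arr[0] (the text before the first marker) is kept, so that $arr[$i]
// lines up with $team_arr[$i - 1] in the loop below.
foreach ($arr as $k => $piece) {
    if ($k < count($arr) - 1) {
        $arr[$k] = substr($piece, 0, -1);
    }
}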



$res = [];

$end = count($arr);
for ($i = 1; $i < $end; $i++) {

    $obj = $arr[$i];

    $temp_obj_arr = explode(' ', trim($obj));

    preg_match('#\((.*?)\)#', $obj, $match);
    $type = (!empty($match[0]) ? $match[0] : "");

    preg_match_all('/(\d{2})\/(\d{2})\/(\d{4})/', $obj, $dates);
    $date1 = (!empty($dates[0][0]) ? $dates[0][0] : "");
    $date2 = (!empty($dates[0][1]) ? $dates[0][1] : "");
    $date3 = (!empty($dates[0][2]) ? $dates[0][2] : "");

    $tname = explode($date1." ", $obj);
    $char_arr = str_split($tname[1]);
    $name = '';
    foreach($char_arr as $ch){
        if (is_numeric($ch)) {
            break;
        } else {
            $name .=$ch;
        }
    }

    $tname = explode($date2." ", $obj);
    $char_arr = str_split($tname[1]);
    $obs = '';
    foreach($char_arr as $ch){
        if (is_numeric($ch)) {
            break;
        } else {
            $obs .=$ch;
        }
    }

    $record = [];
    $record['Team'] = $team_arr[$i - 1];
    $record['ID'] = $temp_obj_arr[0];
    $record['Type'] = $type;
    $record['Date 1'] = $date1;
    // trim() removes the space that precedes the next date
    $record['Name'] = trim($name);
    $record['Date 2'] = $date2;
    $record['Obs'] = trim($obs);
    $record['Date 3'] = $date3;

    $res[] = $record;

}

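// Wrap the records under "data" as in the question. The second argument of
// json_encode() is a flags bitmask, not a boolean.
$json_res = json_encode(['data' => $res], JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
echo $json_res;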
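An alternative sketch, not the answerer's method: after splitting on the markers, split each record on its `dd/mm/yyyy` dates with `PREG_SPLIT_DELIM_CAPTURE`, so the pieces come out in the order ID/type, Date 1, Name, Date 2, Obs, Date 3. `parseRecords()` is a hypothetical name, and the sketch assumes one-character team codes, dates always written as `dd/mm/yyyy`, and that anything after Date 3 can be dropped. It puts the records in an array under `"data"`, because a JSON object cannot hold several unnamed objects.

```php
<?php
// Hypothetical helper. Assumptions: team codes are one non-space character
// followed by "ª", dates are always dd/mm/yyyy, text after Date 3 is dropped.
function parseRecords(string $text): array
{
    // [text before first marker, marker1, body1, marker2, body2, ...]
    $parts = preg_split('/(\Sª)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $records = [];
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        // Keep each date as its own element:
        // [ID (Type), Date 1, Name, Date 2, Obs, Date 3, trailing text]
        $p = preg_split('/\s*(\d{2}\/\d{2}\/\d{4})\s*/u', trim($parts[$i + 1]), -1, PREG_SPLIT_DELIM_CAPTURE);
        // The head is the ID, optionally followed by "(Type)"
        preg_match('/^(\S+)(?:\s+(\([^)]*\)))?/u', $p[0], $m);
        $records[] = [
            'Team'   => $parts[$i],
            'ID'     => $m[1] ?? '',
            'Type'   => $m[2] ?? '',
            'Date 1' => $p[1] ?? '',
            'Name'   => $p[2] ?? '',
            'Date 2' => $p[3] ?? '',
            'Obs'    => $p[4] ?? '',
            'Date 3' => $p[5] ?? '',
        ];
    }
    return $records;
}

$text = "some text Xª 1234567-89.0123.45.6789 (YZ) 01/01/2011 Esbjörn Svensson 02/02/2022 Awesome Trio Wª 0987654-32.1098.76.5432 (KBoo) 07/09/2013 Some Full Name 09/07/2017 Observation 12/12/2018 some text that I don't want to keep Xª 4335678-98.7123.95.5689 09/10/2010 Name Here 08/09/2020 Observation and more text to delete";
echo json_encode(['data' => parseRecords($text)], JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
```

Trailing text that is not followed by a date or a marker cannot be told apart from an observation, so for the third record `Obs` comes out as "Observation and more text to delete"; dropping it would need an extra rule about what an observation looks like.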

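Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.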

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM