PHP: Convert a JavaScript object string to a PHP array

Question

I fetch a page's HTML source with phpQuery, then use a PHP regex to extract the following string from a script tag in the head:

var BASE_DATA = {
userInfo: {
  id: 0,
  userName: 'no-needed',
  avatarUrl: 'no-needed',
  isPgc: false,
  isOwner: false
},
headerInfo: {
  id: 0,
  isPgc: false,
  userName: 'no-needed',
  avatarUrl: 'no-needed',
  isHomePage: false,
  crumbTag: 'no-needed',
  hasBar: true
},
articleInfo: 
{
  title: 'needed',
  content: 'needed',
  groupId: 'needed',
  itemId: 'needed',
  type: 1,
  subInfo: {
    isOriginal: false,
    source: 'needed',
    time: 'needed'
  },
  tagInfo: {
    tags: [{"name":"no-needed 1"},{"name":"no-needed 2"},{"name":"no-needed 3"}],
    groupId: 'no-needed',
    itemId: 'no-needed',
    repin: 0,
  },
  has_extern_link: 0,
  coverImg: 'no-needed'
},
commentInfo:
{
  groupId: 'no-needed',
  itemId: 'no-needed',
  comments_count: 151,
  ban_comment: 0
},};

I want to convert this string to a PHP array, like:

$base_data = array(
'articleInfo' => array(
    'title' => 'needed',
    'content' => 'needed',
    'groupId' => 'needed',
    'itemId' => 'needed',
    'subInfo' => array(
        'source' => 'needed',
        'time' => 'needed',
    ),
));

or

$base_data = array(
'title' => 'needed',
'content' => 'needed',
'groupId' => 'needed',
'itemId' => 'needed',
'subInfo' => array(
    'source' => 'needed',
    'time' => 'needed',
),);

I have already tried several approaches, such as json_decode() and extracting the content between the braces with a PHP regex and preg_match_all(), but none of them works well.

I tried two ways:

The first way:

$json = str_ireplace(array('var BASE_DATA =', '};'), array('', '}'), $js);
json_decode($json, true);
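The first attempt fails because removing the `var BASE_DATA =` prefix still leaves a JavaScript object literal rather than JSON: the keys are unquoted, most strings use single quotes, and there are trailing commas (after `repin: 0` and after `commentInfo`). json_decode() rejects such input and returns null. A small sketch, using a hypothetical shortened input, shows how json_last_error_msg() reports the reason:

```php
<?php
// Hypothetical minimal input with the three JSON violations found in BASE_DATA:
// an unquoted key, a single-quoted string and a trailing comma.
$js = "{title: 'needed', type: 1,}";
$result = json_decode($js, true);
var_dump($result);                // NULL
echo json_last_error_msg(), "\n"; // Syntax error
```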

The second way:

preg_match_all('/\{([^}]+)\}/', $js, $matches);
print_r($matches[1]);

or

preg_match_all('/articleInfo:\s*\{([^}]+)\}/', $js, $matches);
print_r($matches[1][0]);
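The pattern `[^}]+` stops at the first closing brace, so it cannot capture all of `articleInfo`, whose `subInfo` and `tagInfo` contain nested braces. PCRE's recursive subpattern syntax (`(?1)` re-enters capture group 1) can match a balanced pair of braces instead. This is a minimal sketch with a hypothetical shortened input; it assumes no `{` or `}` characters appear inside quoted string values, and the captured text is still a JavaScript literal that needs converting afterwards:

```php
<?php
// Hypothetical shortened input with one nested object inside articleInfo.
$js = "var BASE_DATA = { articleInfo: { title: 'needed', subInfo: { source: 'needed' }, type: 1 }, commentInfo: { ban_comment: 0 } };";

// Group 1 matches "{", then any run of non-brace characters or a recursive
// match of group 1 itself (a nested {...}), then "}".
if (preg_match('/articleInfo:\s*(\{(?:[^{}]++|(?1))*\})/', $js, $m)) {
    echo $m[1], "\n"; // { title: 'needed', subInfo: { source: 'needed' }, type: 1 }
}
```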

It seems close to working, but the result still isn't right: I still have to parse the string in the articleInfo part myself, which is why I'm posting this question.

I even considered using the V8 JavaScript engine, but...

Does anyone know a better way to do this?

Answer 1

I had to reformat your JSON, which was not valid (checked on https://jsonlint.com/).

I deliberately used several separate str_replace() calls so that the process is easier to follow; you can optimize the code below by performing multiple replacements at once in a single str_replace() call.

This works:

<?php

$to_decode = "var BASE_DATA = {
userInfo: {
  id: 0,
  userName: 'no-needed',
  avatarUrl: 'no-needed',
  isPgc: false,
  isOwner: false
},
headerInfo: {
  id: 0,
  isPgc: false,
  userName: 'no-needed',
  avatarUrl: 'no-needed',
  isHomePage: false,
  crumbTag: 'no-needed',
  hasBar: true
},
articleInfo: 
{
  title: 'needed',
  content: 'needed',
  groupId: 'needed',
  itemId: 'needed',
  type: 1,
  subInfo: {
    isOriginal: false,
    source: 'needed',
    time: 'needed'
  },
  tagInfo: {
    tags: [{\"name\":\"no-needed 1\"},{\"name\":\"no-needed 2\"},{\"name\":\"no-needed 3\"}],
    groupId: 'no-needed',
    itemId: 'no-needed',
    repin: 0,
  },
  has_extern_link: 0,
  coverImg: 'no-needed'
},
commentInfo:
{
  groupId: 'no-needed',
  itemId: 'no-needed',
  comments_count: 151,
  ban_comment: 0
},};";

/* Clean JSON and encapsulate in brackets */
$to_decode = str_replace('var BASE_DATA = {', '', $to_decode);
$to_decode = '{'.substr($to_decode, 0, -3).'}';

/* Remove spaces, tabs, new lines, etc. */
$to_decode = str_replace(' ', '', $to_decode);
$to_decode = str_replace("\n", '', $to_decode);
$to_decode = str_replace("\t", '', $to_decode);
$to_decode = str_replace("\r", '', $to_decode);

/* Encapsulate keys with quotes */
$to_decode = preg_replace('/([a-z_]+)\:/ui', '"{$1}":', $to_decode);
$to_decode = str_replace('"{', '"', $to_decode);
$to_decode = str_replace('}"', '"', $to_decode);
$to_decode = str_replace('\'', '"', $to_decode);

/* Remove unnecessary trailing commas */
$to_decode = str_replace(',}', '}', $to_decode);

echo '<pre>';
var_dump(json_decode($to_decode));
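As the answer mentions, the four whitespace-removal calls can be merged: str_replace() accepts an array of search strings and replaces each one in turn. A minimal sketch on a hypothetical short input; like the original, it also deletes spaces inside string values, which is why "no-needed 1" comes out as "no-needed1" in the result:

```php
<?php
// Hypothetical input containing spaces, a tab, LF and CRLF line breaks.
$to_decode = "{\n  title: 'needed',\r\n\tid: 0\n}";

// Same effect as the four separate calls: strip spaces, LF, tabs and CR.
$to_decode = str_replace([' ', "\n", "\t", "\r"], '', $to_decode);
echo $to_decode, "\n"; // {title:'needed',id:0}
```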

Result using print_r():

(I wrote in true/false by hand for clarity: print_r() prints true as 1 and false as an empty string, whereas var_dump() shows them as bool(true) and bool(false).)

stdClass Object
(
    [userInfo] => stdClass Object
        (
            [id] => 0
            [userName] => no-needed
            [avatarUrl] => no-needed
            [isPgc] => false
            [isOwner] => false
        )

    [headerInfo] => stdClass Object
        (
            [id] => 0
            [isPgc] => false
            [userName] => no-needed
            [avatarUrl] => no-needed
            [isHomePage] => false
            [crumbTag] => no-needed
            [hasBar] => true
        )

    [articleInfo] => stdClass Object
        (
            [title] => needed
            [content] => needed
            [groupId] => needed
            [itemId] => needed
            [type] => 1
            [subInfo] => stdClass Object
                (
                    [isOriginal] => false
                    [source] => needed
                    [time] => needed
                )

            [tagInfo] => stdClass Object
                (
                    [tags] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [name] => no-needed1
                                )

                            [1] => stdClass Object
                                (
                                    [name] => no-needed2
                                )

                            [2] => stdClass Object
                                (
                                    [name] => no-needed3
                                )

                        )

                    [groupId] => no-needed
                    [itemId] => no-needed
                    [repin] => 0
                )

            [has_extern_link] => 0
            [coverImg] => no-needed
        )

    [commentInfo] => stdClass Object
        (
            [groupId] => no-needed
            [itemId] => no-needed
            [comments_count] => 151
            [ban_comment] => 0
        )

)
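The question asks for PHP arrays, while json_decode() called with one argument returns stdClass objects, as shown above. Passing true as the second argument returns associative arrays, and array_intersect_key() can then keep only the wanted keys. A sketch, assuming `$json` already holds valid JSON (here a hypothetical hard-coded excerpt), producing the second array shape from the question:

```php
<?php
// Assumes $json already holds valid JSON, e.g. the cleaned-up string from the answer.
$json = '{"articleInfo":{"title":"needed","content":"needed","groupId":"needed","itemId":"needed","type":1,'
      . '"subInfo":{"isOriginal":false,"source":"needed","time":"needed"}}}';

// true => associative arrays instead of stdClass objects.
$data = json_decode($json, true);
$article = $data['articleInfo'];

// Keep only the keys the question marks as "needed".
$base_data = array_intersect_key($article, array_flip(['title', 'content', 'groupId', 'itemId']));
$base_data['subInfo'] = array_intersect_key($article['subInfo'], array_flip(['source', 'time']));

print_r($base_data);
```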

Answer 2

Thanks @Bruno Leveque for your idea.

I edited your code as follows so that it runs well:

I changed $to_decode = str_replace(' ', '', $to_decode); to $to_decode = preg_replace('/[\\n| |\\s]{2,}/',' ',$to_decode);, which collapses every run of two or more whitespace characters into a single space and leaves single spaces alone, because spaces are sometimes needed inside values, like: content: '
I added $to_decode = str_replace("'", '"', $to_decode); before your /* Encapsulate keys with quotes */ comment.
I changed $to_decode = preg_replace('/([a-z_]+)\\:/ui', '"{$1}":', $to_decode); to $to_decode = preg_replace('/([a-z_]+)\\: /ui', '"$1":', $to_decode); (with one extra space after the colon), and commented out //$to_decode = str_replace('"{', '"', $to_decode); and //$to_decode = str_replace('}"', '"', $to_decode);
I added one more line: $to_decode = str_replace(", }", '}', $to_decode);

So my final code is:

@Bruno Leveque could not know the exact content of the "needed" and "no-needed" values, so thank you for the idea.

There seems to be no perfect way...
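The regex-based cleanups above still depend on the exact layout of the input; for example, a key followed directly by a line break (such as `commentInfo:`) or a value containing text like `'Note: x'` would likely be mishandled by the key-quoting pattern. A less fragile alternative, swapped in here and not taken from either answer, is to scan the literal character by character and track whether the scanner is inside a string, so that only keys, quotes and trailing commas outside strings are rewritten. The function name `jsObjectToJson` is hypothetical, and this is a sketch that assumes the input contains no comments, `\uXXXX` escapes, hex numbers, numeric keys or expressions such as function calls:

```php
<?php
/**
 * Hypothetical helper: convert a JavaScript object literal (unquoted keys,
 * single-quoted strings, trailing commas) into JSON by scanning it.
 * Not handled: comments, \uXXXX escapes, hex numbers, numeric keys, expressions.
 */
function jsObjectToJson(string $js): string
{
    // Keep only the outermost {...}, dropping "var BASE_DATA =" and the final ";".
    $start = strpos($js, '{');
    $end = strrpos($js, '}');
    if ($start === false || $end === false || $end < $start) {
        throw new InvalidArgumentException('No object literal found');
    }
    $src = substr($js, $start, $end - $start + 1);
    $len = strlen($src);
    $out = '';
    $escapes = ['n' => "\n", 't' => "\t", 'r' => "\r"];
    $flags = JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR;

    for ($i = 0; $i < $len; $i++) {
        $c = $src[$i];

        if ($c === '"' || $c === "'") {
            // String literal: read up to the matching unescaped quote, then re-encode as JSON.
            $buf = '';
            for ($i++; $i < $len && $src[$i] !== $c; $i++) {
                if ($src[$i] === '\\' && $i + 1 < $len) {
                    $i++;
                    $buf .= $escapes[$src[$i]] ?? $src[$i];
                } else {
                    $buf .= $src[$i];
                }
            }
            $out .= json_encode($buf, $flags);
        } elseif (ctype_digit($c) || $c === '-') {
            // Number: copy digits, sign, decimal point and exponent verbatim.
            $out .= $c;
            while ($i + 1 < $len && strpos('0123456789.eE+-', $src[$i + 1]) !== false) {
                $out .= $src[++$i];
            }
        } elseif (ctype_alpha($c) || $c === '_' || $c === '$') {
            // Identifier: true/false/null stay as-is, anything else is quoted as a key.
            $word = $c;
            while ($i + 1 < $len && (ctype_alnum($src[$i + 1]) || $src[$i + 1] === '_' || $src[$i + 1] === '$')) {
                $word .= $src[++$i];
            }
            $out .= in_array($word, ['true', 'false', 'null'], true) ? $word : json_encode($word, $flags);
        } elseif ($c === ',') {
            // Drop trailing commas: a comma whose next non-space character is } or ].
            $j = $i + 1;
            while ($j < $len && ctype_space($src[$j])) {
                $j++;
            }
            if ($j < $len && ($src[$j] === '}' || $src[$j] === ']')) {
                continue;
            }
            $out .= $c;
        } else {
            // Braces, brackets, colons and whitespace pass through unchanged.
            $out .= $c;
        }
    }
    return $out;
}

// Hypothetical shortened input: the value contains a colon, a comma and an escaped quote.
$js = "var BASE_DATA = { articleInfo: { title: 'needed', content: 'It\\'s: needed, really', "
    . "subInfo: { source: 'needed', time: 'needed' }, }, commentInfo: { comments_count: 151, ban_comment: 0 },};";
$data = json_decode(jsObjectToJson($js), true, 512, JSON_THROW_ON_ERROR);
print_r($data['articleInfo']);
```

Because string contents never go through the key-quoting step, colons, commas and spaces inside values survive intact, and the decoded arrays can then be filtered down to the "needed" keys.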

2 answers

Answer 1: votes 1, posted 2019-04-16 23:52:05

Answer 2: votes 0, posted 2019-04-17 15:37:08
