
How to parse a key:value pair in a JSON-like string with RegEx in JavaScript?

I'm struggling to parse a key:value pair in a JSON-like string. I know people will automatically say "Use JSON.parse() for this!" and I absolutely agree. The problem is that I'm not dealing with JSON strings, but with JSON-like strings.

At least, my attempts at parsing these strings with JSON.parse have failed (I've tried to sanitize the string so that JSON.parse doesn't complain about malformed input).

The problem I have is that the JSON-like string is sometimes truncated and sometimes not. What is guaranteed is that the key publicProfileUrl will always be in the text (or at least that has been consistent with my observations), and I need to parse its value.

For example, here is a sample of the string:

%%"fullName":"Eduardo Saverin",
"contactInfo":{
"publicProfileUrl":"https://sg.linkedin.com/in/saverin",
"twitterAccounts":["esaverin"],
"websites":[]},
"industry":"Internet",%%

All I'm interested in is parsing the value of publicProfileUrl.

This is my latest attempt at doing it:

\"publicProfileUrl\":\"(.*)\",

but it matches all the way to the last comma (I added line breaks above for formatting purposes only; the original string doesn't have any line breaks).

Here's the original string:

%%"fullName":"Eduardo Saverin","contactInfo":{"publicProfileUrl":"https://sg.linkedin.com/in/saverin","twitterAccounts":["esaverin"],"websites":[]},"industry":"Internet",%%

So, something like

\"publicProfileUrl\":\"(.*?)\",

should work.
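To see the difference between the greedy and the lazy quantifier, here is a minimal sketch run against the original string from the question. The variable names are only illustrative.

```javascript
// The original (unbroken) string from the question.
const text = '%%"fullName":"Eduardo Saverin","contactInfo":{"publicProfileUrl":"https://sg.linkedin.com/in/saverin","twitterAccounts":["esaverin"],"websites":[]},"industry":"Internet",%%';

// Greedy: (.*) grabs as much as it can, so the match ends at the LAST `",`.
const greedy = text.match(/"publicProfileUrl":"(.*)",/);

// Lazy: (.*?) grabs as little as it can, so the match ends at the FIRST `",`.
const lazy = text.match(/"publicProfileUrl":"(.*?)",/);

// match() returns null when nothing matches, so guard before reading group 1.
const greedyValue = greedy ? greedy[1] : null;
const lazyValue = lazy ? lazy[1] : null;

console.log(lazyValue); // https://sg.linkedin.com/in/saverin
```

The greedy capture runs on through "twitterAccounts", "websites" and "industry", which is the behaviour described in the question.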

If you want to be absolutely safe:

As others have pointed out, this is not always "watertight". In your current application (a URL) it is probably not an issue, but in the general case we might encounter an escaped " followed by a comma, as in "this is \"it\", no doubt!", where the escaped quote is supposed to be part of our target string. With the pattern so far, this would end the captured string prematurely. If we modify the regexp slightly by adding [^\\] at the end of the capture group, then even this nasty little pattern can no longer do us any harm:

\"publicProfileUrl\":\"(.*?[^\\])\",
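A small sketch of the escaped-quote case follows. The key "headline" and the sample text are made up for illustration and are not from the question. The last pattern is a commonly used alternative rather than something proposed in this thread: it consumes either a non-quote, non-backslash character or a backslash plus the character after it, so it should also cope with empty values and with values ending in an escaped backslash, which the [^\\] variant does not handle.

```javascript
// Hypothetical input: the value contains an escaped quote followed by a comma.
// String.raw keeps the backslashes exactly as typed.
const text = String.raw`"headline":"this is \"it\", no doubt!","next":1`;

// Lazy only: stops at the first `",`, even though that quote is escaped.
const lazy = text.match(/"headline":"(.*?)",/);

// Lazy plus [^\\]: the character before the closing quote must not be a backslash.
const guarded = text.match(/"headline":"(.*?[^\\])",/);

// Alternative (an assumption, not from the thread): walk over escape pairs explicitly.
const robust = text.match(/"headline":"((?:[^"\\]|\\.)*)"/);

const lazyValue = lazy ? lazy[1] : null;
const guardedValue = guarded ? guarded[1] : null;
const robustValue = robust ? robust[1] : null;

console.log(lazyValue);    // this is \"it\
console.log(guardedValue); // this is \"it\", no doubt!
```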

In the capture group, add ? after the quantifier to make it lazy, which means it matches as little as possible:

\"publicProfileUrl\":\"(.*?)\",

Try excluding the closing double quote from your capture:

\"publicProfileUrl\":\"([^"]*)\",
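As a rough sketch, the negated character class can be wrapped in a small helper. The function name extractProfileUrl and the truncated sample are assumptions made for illustration. Because [^"]* can never run past a double quote, there is no need for a lazy quantifier here.

```javascript
// Hypothetical helper: returns the publicProfileUrl value, or null if the key is absent.
function extractProfileUrl(text) {
  // [^"]* matches any run of characters that are not a double quote,
  // so the capture ends at the first closing quote after the key.
  const match = text.match(/"publicProfileUrl":"([^"]*)",/);
  return match ? match[1] : null;
}

// The original string from the question.
const full = '%%"fullName":"Eduardo Saverin","contactInfo":{"publicProfileUrl":"https://sg.linkedin.com/in/saverin","twitterAccounts":["esaverin"],"websites":[]},"industry":"Internet",%%';

// A made-up truncated variant, cut off shortly after the value we want.
const truncated = '%%"fullName":"Eduardo Saverin","contactInfo":{"publicProfileUrl":"https://sg.linkedin.com/in/saverin","twitterAcc';

console.log(extractProfileUrl(full));      // https://sg.linkedin.com/in/saverin
console.log(extractProfileUrl(truncated)); // https://sg.linkedin.com/in/saverin
```

If the string is cut off before the closing `",` of the value, this pattern finds nothing and the helper returns null.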

Normally, line breaks would work around the greedy matching, because . does not match line terminators by default.
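The remark about line breaks can be checked with the formatted version of the string from the question. This is only a sketch. The s (dotAll) flag used in the second pattern is a standard JavaScript regex flag that lets . match line terminators as well.

```javascript
// The line-broken version of the string, as formatted in the question.
const formatted = [
  '%%"fullName":"Eduardo Saverin",',
  '"contactInfo":{',
  '"publicProfileUrl":"https://sg.linkedin.com/in/saverin",',
  '"twitterAccounts":["esaverin"],',
  '"websites":[]},',
  '"industry":"Internet",%%',
].join('\n');

// Greedy, default flags: . cannot cross a newline, so the match stays on one line.
const oneLine = formatted.match(/"publicProfileUrl":"(.*)",/);

// Greedy with the s flag: . also matches newlines, so the over-matching returns.
const dotAll = formatted.match(/"publicProfileUrl":"(.*)",/s);

const oneLineValue = oneLine ? oneLine[1] : null;
const dotAllValue = dotAll ? dotAll[1] : null;

console.log(oneLineValue); // https://sg.linkedin.com/in/saverin
```

Since the real input has no line breaks, this does not help with the original problem. It only explains why the greedy pattern may appear to work on pretty-printed samples.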
