
JavaScript Regular Expression to Split an Ampersand Delimited String

I've been at this for hours, and I'm hitting a dead end. I've read up on Regular Expressions all over the place, but I'm still having trouble matching on anything more complex than basic patterns.

So, my problem is this:

I need to split an "&"-delimited string into a list of objects, but I need to account for values that contain an ampersand as well.

Please let me know if you can provide any help.

var subjectA = 'myTestKey=this is my test data & such&myOtherKey=this is the other value';

Update:

Alright, to begin with, thanks for the awesome, thoughtful responses. To give a little background on why I'm doing this, it's to create a cookie utility in JavaScript that is a little more intelligent and supports keys à la ASP.

With that being said, I'm finding that the following RegExp /([^&=\s]+)=(([^&]*)(&[^&=\s]*)*)(&|$)/g does 99% of what I need it to. I changed the RegExp suggested by the contributors below to also ignore whitespace. This allowed me to turn the string above into the following collection:

[
    [myTestKey, this is my test data & such],
    [myOtherKey, this is the other value]
]

It even works in some more extreme examples, allowing me to turn a string like:

var subjectB = 'thisstuff===myv=alue me==& other things=&thatstuff=my other value too';

Into:

[
    [thisstuff, ==myv=alue me==& other things=],
    [thatstuff, my other value too]
]

However, when you take a string like:

var subjectC = 'me===regexs are hard for &me&you=&you=nah, not really you\'re just a n00b';

Everything gets out of whack again. I understand why this happens with the regular expression above (kudos for a very awesome explanation), but I'm (obviously) not comfortable enough with regular expressions to figure out a workaround.

As far as importance goes, I need this cookie utility to be able to read and write cookies that can be understood by ASP and ASP.NET & vice versa. From playing with the example above, I'm thinking that we've taken it as far as we can, but if I'm wrong, any additional input would be greatly appreciated.

tl;dr - Almost there, but is it possible to account for outliers like subjectC?

var subjectC = 'me===regexs are hard for &me&you=&you=nah, not really you\'re just a n00b';

Actual output:

[
    [me, ==regexs are hard for &me],
    [you, ],
    [you, nah, not really you\'re just a n00b]
]

Versus expected output:

[
    [me, ==regexs are hard for &me&you=],
    [you, nah, not really you\'re just a n00b]
]

Thanks again for all of your help. Also, I'm actually getting better with RegExp... Crazy.

If your keys cannot contain ampersands, then it's possible:

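// subject is the "&"-delimited string to parse, e.g. subjectA from the question
var myregexp = /([^&=]+)=(.*?)(?=&[^&=]+=|$)/g;
var match = myregexp.exec(subject);
while (match !== null) {
    var key = match[1];   // text before the first "=" of this pair
    var value = match[2]; // everything up to the next "&key=" or the end of the string
    // Do something with key and value
    match = myregexp.exec(subject); // the g flag makes exec resume after the previous match
}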

Explanation:

(        # Match and capture in group number 1:
 [^&=]+  # One or more characters except ampersands or equals signs
)        # End of group 1
=        # Match an equals sign
(        # Match and capture in group number 2:
 .*?     # Any number of characters (as few as possible)
)        # End of group 2
(?=      # Assert that the following can be matched here:
 &       # Either an ampersand,
 [^&=]+  # followed by a key (as above),
 =       # followed by an equals sign
|        # or
 $       # the end of the string.
)        # End of lookahead.

This may not be the most efficient way to do this (because of the lookahead assertion that needs to be checked several times during each match), but it's rather straightforward.
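To make the loop above easier to try out, here is a minimal sketch that wraps it in a function and returns the pairs as an array. The name parsePairs is made up for this illustration; the pattern is the one from this answer, unchanged.

```javascript
// Hypothetical helper (not from the original answer): collects [key, value]
// pairs using the lookahead pattern explained above.
function parsePairs(subject) {
    // A fresh regex per call, so lastIndex always starts at 0.
    var re = /([^&=]+)=(.*?)(?=&[^&=]+=|$)/g;
    var pairs = [];
    var match;
    while ((match = re.exec(subject)) !== null) {
        pairs.push([match[1], match[2]]);
    }
    return pairs;
}

var subjectA = 'myTestKey=this is my test data & such&myOtherKey=this is the other value';
var pairsA = parsePairs(subjectA);
console.log(pairsA);
// → [ [ 'myTestKey', 'this is my test data & such' ],
//     [ 'myOtherKey', 'this is the other value' ] ]
```

Keep in mind that the lookahead treats any "&something=" as the start of a new key. For subjectC from the question this sketch still yields three pairs (the same as the "actual output" listed there), because "&you=" looks exactly like a key no matter which pair it was meant to belong to.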

I need to split an "&"-delimited string into a list of objects, but I need to account for values that contain an ampersand as well.

You can't.

Any data format that allows a character to appear both as a special character and as data needs a rule (usually a different way to express the character as data) to allow the two to be differentiated.

  • HTML has & and &amp;
  • URIs have & and %26
  • CSV has " and ""
  • Most programming languages have " and \"

Your string doesn't have any rules to determine if an & is a delimiter or an ampersand, so you can't write code that can tell the difference.
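If you control both the writing and the reading side, one such rule is percent-encoding, as in the URI example above. The following is only a sketch under that assumption: serializePairs and parseEncodedPairs are made-up names, and whether ASP or ASP.NET would decode such values the same way is not something this example verifies.

```javascript
// Hypothetical helpers: percent-encode keys and values so that a literal "&"
// or "=" in the data can never be mistaken for a delimiter.
function serializePairs(pairs) {
    var parts = [];
    for (var i = 0; i < pairs.length; i++) {
        parts.push(encodeURIComponent(pairs[i][0]) + '=' + encodeURIComponent(pairs[i][1]));
    }
    return parts.join('&');
}

function parseEncodedPairs(str) {
    var pairs = [];
    if (str === '') {
        return pairs;
    }
    var parts = str.split('&'); // safe: every "&" left in the string is a delimiter
    for (var i = 0; i < parts.length; i++) {
        var eq = parts[i].indexOf('=');
        var key = eq === -1 ? parts[i] : parts[i].slice(0, eq);
        var value = eq === -1 ? '' : parts[i].slice(eq + 1);
        pairs.push([decodeURIComponent(key), decodeURIComponent(value)]);
    }
    return pairs;
}

var original = [['me', '==regexs are hard for &me&you='], ['you', 'nah, not really']];
var encoded = serializePairs(original);
var decoded = parseEncodedPairs(encoded);
console.log(encoded);
console.log(decoded);
```

With the data encoded like this, the round trip returns the original pairs, including the value that contains both "&" and "=".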

True, rules to differentiate are recommended, and true, a RegExp pattern may fail if a key contains the ampersand (or the equals sign!), but it can be done with plain JavaScript. You just have to think in terms of key-value pairs, and live with the fact that there might not be a RegExp pattern to solve the problem: you will have to split the string into an array, loop through the elements and merge them if necessary:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
    <head>
        <style id="styleTag" type="text/css">
        </style>
        <script type="text/javascript">
        window.onload = function()
        {
            // test data
            var s = "myTestKey=this is my test data & such&myOtherKey=this is the other value&aThirdKey=Hello=Hi&How are you&FourthKey=that's it!";

            // the split is on the ampersand symbol!
            var a = s.split(/&/);

            // loop through &-separated values; we skip the 1st element
            // because we may need to address the previous (i-1) element
            // in our loop (you are REALLY out of luck if a[0] is not a
            // key=value pair!)
            for (var i = 1; i < a.length; i++)
            {
                // the absence of the equal symbol indicates that this element is
                // part of the value of the previous key=value pair, so merge them
                if (a[i].search(/=/) == -1)
                {
                    a.splice(i - 1, 2, a[i - 1] + '&' + a[i]);
                    // the array is now one element shorter; step back so the
                    // element that moved into position i is not skipped
                    i--;
                }
            }

            Data.innerHTML = s;
            Result.innerHTML = a.join('<br/>');
        }
        </script>
    </head>
    <body>
        <h1>Hello, world.</h1>
        <p>Test string:</p>
        <p id=Data></p>
        <p>Split/Splice Result:</p>
        <p id=Result></p>
    </body>
</html>

The output:

Hello, world.

Test string:

myTestKey=this is my test data & such&myOtherKey=this is the other value&aThirdKey=Hello=Hi&How are you&FourthKey=that's it!

Split/Splice Result:

myTestKey=this is my test data & such
myOtherKey=this is the other value
aThirdKey=Hello=Hi&How are you
FourthKey=that's it!
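For use outside a page, the same split-and-merge idea can be sketched without the DOM. This variant builds a new array instead of splicing in place, so several consecutive fragments without "=" are all merged into the previous pair. The names splitPairs and toKeyValue are made up for this illustration.

```javascript
// Hypothetical helper: split on "&", then glue any fragment that has no "="
// back onto the previous key=value pair.
function splitPairs(s) {
    var parts = s.split('&');
    var merged = [];
    for (var i = 0; i < parts.length; i++) {
        if (merged.length > 0 && parts[i].indexOf('=') === -1) {
            merged[merged.length - 1] += '&' + parts[i];
        } else {
            merged.push(parts[i]);
        }
    }
    return merged;
}

// Hypothetical helper: split one "key=value" string on its FIRST "=" only,
// so the value may itself contain "=".
function toKeyValue(pair) {
    var eq = pair.indexOf('=');
    return eq === -1 ? [pair, ''] : [pair.slice(0, eq), pair.slice(eq + 1)];
}

var s = "myTestKey=this is my test data & such&myOtherKey=this is the other value&aThirdKey=Hello=Hi&How are you&FourthKey=that's it!";
var merged = splitPairs(s);
var keyValues = merged.map(toKeyValue);
console.log(merged.join('\n'));
console.log(keyValues);
```

The same limitation applies as before: a value fragment such as "&you=" still contains "=" and is therefore treated as a new pair.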

"myTestKey=this is my test data & such&myOtherKey=this is the other value".split(/&?([a-z]+)=/gi)

This returns:

["", "myTestKey", "this is my test data & such", "myOtherKey", "this is the other value"]

But if "this is my test data & such" also contained an = sign, like "this is my test data &such= something else", you're out of luck.

I suggest using

.split(/(?:=|&(?=[^&]*=))/);

Check this demo.
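That split returns a flat array that alternates between keys and values. As a rough sketch (splitToPairs is a made-up name), the result can be paired up like this:

```javascript
// Hypothetical helper: run the suggested split, then group the flat
// [key, value, key, value, ...] result into [key, value] pairs.
function splitToPairs(subject) {
    var flat = subject.split(/(?:=|&(?=[^&]*=))/);
    var pairs = [];
    for (var i = 0; i + 1 < flat.length; i += 2) {
        pairs.push([flat[i], flat[i + 1]]);
    }
    return pairs;
}

var subjectA = 'myTestKey=this is my test data & such&myOtherKey=this is the other value';
var pairsFromSplit = splitToPairs(subjectA);
console.log(pairsFromSplit);
// → [ [ 'myTestKey', 'this is my test data & such' ],
//     [ 'myOtherKey', 'this is the other value' ] ]
```

Because the pattern splits on every "=", this pairing only holds while values contain no equals sign of their own; a value like "Hello=Hi" would shift the alternation.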
