
Extract substring from string

I need to analyze a text file. This file contains some config information. The data is kept like this (general example):

size=12age=2speed=33id=93539841277312987

But the file also might contain separators like , or anything else:

size = 12 , age = 2 , speed = 33 , id = P93AR9841277312987

There is only one rule the input follows: a config name is followed by =, which is followed by the value.

What I have: all the config names that can occur in the input file, saved as keys in a dictionary.

What I want: save each value from the input file under the appropriate key in the dictionary ([size,12][age,2]...). I'm having a hard time extracting the value that sits between one config name and the next.

What I did so far: find the end index of a config name and use it as the start index of the value I want. But it's hard to determine where the next config name begins. The last config name in the file also needs special handling, as there is no following config name to refer to. One idea is to search for all config names in the text and pick the one whose start index is the smallest while still being larger than the end index of the current config name. But I think there is an easier way.
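The index-based idea described above can be sketched roughly as follows. This is only an illustration in Python, since the question does not name a language. The function name `split_by_name_positions` and the set of characters stripped from each value are assumptions made for the example. Requiring `=` after each name is also an assumption, taken from the one rule the input is said to follow.

```python
import re

def split_by_name_positions(text, names):
    """Slice the text between consecutive "name =" markers (hypothetical helper)."""
    # Longer names first, so a name that is a prefix of another cannot shadow it.
    alternation = "|".join(
        re.escape(n) for n in sorted(names, key=len, reverse=True)
    )
    # Every "name =" occurrence, in the order it appears in the text.
    markers = list(re.finditer(rf"({alternation})\s*=", text))

    result = {}
    for current, following in zip(markers, markers[1:] + [None]):
        # The value ends where the next marker starts, or at the end of the text.
        end = following.start() if following else len(text)
        raw = text[current.end():end]
        # Assumed separators: whitespace and commas around the value.
        result[current.group(1)] = raw.strip(" \t\r\n,")
    return result

names = ["size", "age", "speed", "id"]
print(split_by_name_positions("size=12age=2speed=33id=93539841277312987", names))
print(split_by_name_positions("size = 12 , age = 2 , speed = 33 , id = P93AR9841277312987", names))
```

Because all markers are collected in one pass, the "next config name" is simply the next list entry, and the last entry falls back to the end of the text.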

I came up with this regex. As written, it still includes non-whitespace separators such as , in the values. Each match has two capturing groups, the key and the value:

(size|age|speed|id)\s*=\s*(.+?)(?=\s|size|age|speed|id|$)

You can extend the alternation to list all of your config names. You can also add your separators to the lookahead, in which case they will no longer be included in the values.
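As a rough sketch of how this regex might be used to fill the dictionary, here is a Python version that builds the pattern from the config names and adds separators to the lookahead. The function name `extract_config` and the `separators` parameter are hypothetical and exist only for this example.

```python
import re

def extract_config(text, names, separators=","):
    """Return {name: value} using the key/value regex above (illustrative only)."""
    # Longer names first, so a name that is a prefix of another cannot shadow it.
    alternation = "|".join(
        re.escape(n) for n in sorted(names, key=len, reverse=True)
    )
    # A value stops before whitespace, a separator, the next name, or end of text.
    stops = [r"\s"]
    if separators:
        stops.append("[" + re.escape(separators) + "]")
    stops += [alternation, "$"]
    pattern = re.compile(
        "(" + alternation + r")\s*=\s*(.+?)(?=" + "|".join(stops) + ")"
    )
    # Group 1 is the key, group 2 the value; a repeated key keeps its last value.
    return {m.group(1): m.group(2) for m in pattern.finditer(text)}

names = ["size", "age", "speed", "id"]
print(extract_config("size=12age=2speed=33id=93539841277312987", names))
print(extract_config("size = 12 , age = 2 , speed = 33 , id = P93AR9841277312987", names))
```

One limitation carries over from the regex itself: the lookahead stops at any config name, so a value that happens to contain one (for example a value containing `id`) is cut short.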

Here is a Perl solution that is probably about the best you can do given the text file specification:

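use strict;
use warnings;

# All config names that may occur in the input.
my @keys = ('foo','bar','2baz','bla');
my $data = "spoofoo=123  , bar= 12baz = blah";

foreach my $key (@keys)
{
    # Capture the run of word characters that follows "key =".
    if ($data =~ /\Q$key\E\s*=\s*(\w+)/)
    {
        my $val = $1;
        # If the captured value ends with another key, that key belongs
        # to the next pair, so cut it off.
        foreach my $key2 (@keys)
        {
           if ($val =~ /(.*)\Q$key2\E$/)
           {
               $val = $1;
               last;
           }
        }
        print "$key value is $val\n";
    }
    else
    {
        print "$key not found\n";
    }
}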

This finds the value following each key in your dictionary. Then it checks the value it found to see if the end of that value is actually the start of another key. It is possible, however, to have situations that are simply unresolvable, depending on your set of keys and potential values. For example, if both baz and 2baz were keys, bar=12baz=blah could mean either bar=12 followed by baz=blah, or bar=1 followed by 2baz=blah.
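For readers who do not use Perl, the same two-step approach (capture the word characters after the key, then trim a trailing key) might look like this in Python. This is a loose port for illustration, not the answer's own code; the function name `value_after_key` is hypothetical.

```python
import re

def value_after_key(data, key, keys):
    """Return the value following `key`, or None if the key is not found."""
    # Step 1: capture the run of word characters after "key =".
    match = re.search(re.escape(key) + r"\s*=\s*(\w+)", data)
    if match is None:
        return None
    value = match.group(1)
    # Step 2: if the value ends with a known key, treat that key as the
    # start of the next pair and cut it off (first hit wins, as in the Perl).
    for other in keys:
        if other and value.endswith(other):
            value = value[: -len(other)]
            break
    return value

keys = ["foo", "bar", "2baz", "bla"]
data = "spoofoo=123  , bar= 12baz = blah"
for key in keys:
    print(key, "->", value_after_key(data, key, keys))
```

With this input, `bar` comes out as `1` because the captured `12baz` ends with the key `2baz`. Whether that is the intended reading depends entirely on the key set, which is the kind of ambiguity the answer warns about.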

