How to check variable length strings to see if they start with anything in the array

I apologize if the question is confusing as I am not really sure how to word this concept.

Currently I do something along the following lines with MySQL statements issued from PHP. However, I am migrating this to C# and plan to work with the data directly and only then insert the records into the database, instead of inserting first and then running updates like these:

    $db->exec('UPDATE `' . date('Y-m',time() - self::DAYS_TO_MERGE) . '` SET `Cost`=0, `Location`=\'Flat Rate World\' WHERE `Cost` IS NULL AND `Caller` IN (' . $FlatRateWO. ') AND SUBSTR(`Dialed`,1,7) IN (\'0114021\',\'0117095\');');
    $db->exec('UPDATE `' . date('Y-m',time() - self::DAYS_TO_MERGE) . '` SET `Cost`=0, `Location`=\'Flat Rate World\' WHERE `Cost` IS NULL AND `Caller` IN (' . $FlatRateWO. ') AND SUBSTR(`Dialed`,1,6) IN (\'011420\',\'011420\',\'011852\',\'011353\',\'011353\',\'011972\',\'011972\',\'011379\',\'011379\',\'011351\',\'011351\',\'011886\');');
    $db->exec('UPDATE `' . date('Y-m',time() - self::DAYS_TO_MERGE) . '` SET `Cost`=0, `Location`=\'Flat Rate World\' WHERE `Cost` IS NULL AND `Caller` IN (' . $FlatRateWO. ') AND SUBSTR(`Dialed`,1,5) IN (\'01154\',\'01154\',\'01161\',\'01161\',\'01143\',\'01143\',\'01132\',\'01132\',\'01186\',\'01186\',\'01145\',\'01145\',\'01133\',\'01133\',\'01149\',\'01149\',\'01130\',\'01130\',\'01136\',\'01136\',\'01131\',\'01131\',\'01147\',\'01148\',\'01148\',\'01182\',\'01182\',\'01165\',\'01165\',\'01134\',\'01134\',\'01141\',\'01141\',\'01146\',\'01146\',\'01166\',\'01166\',\'01144\');');
    $db->exec('UPDATE `' . date('Y-m',time() - self::DAYS_TO_MERGE) . '` SET `Cost`=0, `Location`=\'Flat Rate World\' WHERE `Cost` IS NULL AND `Caller` IN (' . $FlatRateWO. ') AND SUBSTR(`Dialed`,1,4) IN (\'1787\');');

The above PHP code executes the queries in sequence, ordered by the length of the leading digit group, longest first. That is, 0114021, being 7 digits long, is processed before 011420, which is 6 digits long. This prevents problems in cases where 0111234 has a different price than 011123.

This process works correctly; however, it is very slow (around 0.63s per query on average over 100,000 records). The actual values come from a CSV file, which I must pre-process and then insert into the database, so if I can do the above processing and calculations on the records before inserting them, I imagine this would save a lot of time.

The following is the above array converted into C#:

    World = new List<string>() { "0114021", "0117095", "011420", "011852", "011353", "011972", "011972", "011379", "011351", "011886", "01154", "01161", "01143", "01132", "01186", "01145", "01133", "01149", "01130", "01136", "01131", "01147", "01148", "01182", "01165", "01134", "01141", "01146", "01166", "01144", "01135", "1787" };

What I would like to know is how to accomplish the same task as efficiently as possible: comparing, for example, the following numbers to see whether they start with anything in World, keeping in mind that I want the longest match returned first.

011353123456277 ... should match 011353  
011351334478399 ... should match 011351 (the longest prefix; 01135 also matches)  
011326717788726 ... should match 01132 (a number with no matching prefix should return nothing, i.e. not found)

I just tried the following code with no success:

    if ( World.All( s => "01197236718876321".Contains( s ) ) ) {
        MessageBox.Show( "found" );
    }

and

    if ( World.All( s => s.Contains("01197236718876321") ) ) {
        MessageBox.Show( "found" );
    }

Both attempts are based on the example found in the question "Using C# to check if string contains a string in string array".

The first example there uses nested foreach loops, which I would like to avoid. The LINQ example looks good, but I believe that question is the reverse of what I am trying to do.


The following code seems to work; however, I am not sure whether it respects the order of the items in the list. It seems to, but I would like confirmation, as I have no idea how to 'watch' what happens inside LINQ's magic:

    string foundas = "";
    string number = "01197236718876321";

    if(World.Any( 
        b => {
            if(number.StartsWith(b)) {
                foundas = b;
                return true;
            } else {
                return false;
            }
        }
    ) ) {
        MessageBox.Show( foundas );
    }
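
On the ordering question: `Any` enumerates a `List<string>` in index order and stops at the first element for which the predicate returns true, so the snippet above returns the longest match only if the list itself is ordered longest first. A minimal sketch that makes the ordering explicit instead of relying on how the list was typed in; the class and method names (`PrefixMatch`, `SortLongestFirst`, `FirstMatch`) are hypothetical and not from the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, assuming prefixes and numbers are plain digit strings.
public static class PrefixMatch
{
    // Sort once, up front: longest prefixes first, duplicates removed.
    public static List<string> SortLongestFirst(IEnumerable<string> prefixes)
    {
        return prefixes.Distinct()
                       .OrderByDescending(p => p.Length)
                       .ToList();
    }

    // Returns the first (and therefore longest) prefix that `number`
    // starts with, or null when nothing matches.
    public static string FirstMatch(List<string> sortedPrefixes, string number)
    {
        return sortedPrefixes.FirstOrDefault(
            p => number.StartsWith(p, StringComparison.Ordinal));
    }
}
```

Calling `SortLongestFirst(World)` once and then `FirstMatch(sorted, number)` for each record avoids the side effect inside the lambda, because `FirstOrDefault` returns the matching prefix directly.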

Aside

I will have a follow up for this question as the next part is a bit more complex where I grab groups of rates (about 10,000), and they are also ordered by length of the group, but they have a 'cost' field which I am currently calculating on.

I would check for all hits with StartsWith and then simply take the longest string in the result (via an aggregation). There might be something simpler than Aggregate.

    // `source` is the number being checked, e.g. "01197236718876321".
    var hit = World.Where(s => source.StartsWith(s))
                   .Aggregate(string.Empty, (max, cur) => max.Length > cur.Length ? max : cur);

    if (!string.IsNullOrEmpty(hit))
        MessageBox.Show("found " + hit);
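
Since the question asks for efficiency over roughly 100,000 records, one possible alternative is to avoid scanning the whole list for every number. The sketch below stores the prefixes in a `HashSet<string>` and probes it once per distinct prefix length, longest first. The class name `PrefixTable` and its members are hypothetical, and this is only one way to do it, not something taken from the original post:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical lookup table: a sketch under the assumption that all
// prefixes are compared as exact, case-sensitive strings.
public sealed class PrefixTable
{
    private readonly HashSet<string> _prefixes;
    private readonly int[] _lengths; // distinct prefix lengths, longest first

    public PrefixTable(IEnumerable<string> prefixes)
    {
        _prefixes = new HashSet<string>(prefixes, StringComparer.Ordinal);
        _lengths = _prefixes.Select(p => p.Length)
                            .Distinct()
                            .OrderByDescending(n => n)
                            .ToArray();
    }

    // Returns the longest stored prefix of `number`, or null when none matches.
    public string LongestMatch(string number)
    {
        foreach (int len in _lengths)
        {
            if (number.Length < len) continue;   // number too short for this length
            string candidate = number.Substring(0, len);
            if (_prefixes.Contains(candidate)) return candidate;
        }
        return null;
    }
}
```

With the `World` list from the question there are four distinct lengths (7, 6, 5 and 4), so each number needs at most four set lookups, however many prefixes the list holds. Build the table once with `new PrefixTable(World)` and call `LongestMatch` for each record.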

