PHP Multi-Dimensional Array: Finding a multi-element sub-array match ... is there an alternate option to looping?

We have some customer data which started in a separate data store. I have a consolidation script to standardize it and migrate it into our core DB. Somewhere around 60,000-70,000 records are being migrated.

Naturally, there was a little bug, and it failed around row 9k.
My next trick is to make the script able to pick up where it left off when it is run again.


FYI:
The source records are pretty icky, and split over 5 tables by which brand the customer purchased, e.g.:

create TABLE `brand1_custs` (`id` int(9), `company_name` varchar(112), etc...)
create TABLE `brand2_custs` (`id` int(9), `company_name` varchar(112), etc...)

Of course, a given company name can (and does) exist in multiple source tables.


Anyhow ... I used the ParseCSV lib for logging, and each row gets logged if it is successfully migrated (some rows get skipped if they are just too ugly to parse programmatically). When I open the log back up with ParseCSV, it comes in looking like:

array(
  0 => array( 'row_id'   =>  '1', 
          'company_name' =>  'Cust A', 
          'blah'         =>  'blah', 
          'source_tbl'   =>  'brand1_cust'
      ),
  1 => array( 'row_id'   =>  '2',
          'company_name' =>  'customer B',
          'blah'         =>  'blah',
          'source_tbl'   =>  'brand1_cust'
      ),
  2 => array( 'row_id'   =>  '1',
          'company_name' =>  'Cust A',
          'blah'         =>  'blah',
          'source_tbl'   =>  'brand2_cust'
      ),
  etc...
)


My current workflow is along the lines of:

foreach ($source_table as $src) {
    $results = fetch_rows($src); // hypothetical helper: get all rows from $src
    foreach ($results as $row) {
        // heavy lifting
    }
}


My plan is to check the $row->id and $src->tbl combination for a match in the $log[?x?]['row_id'] and $log[?x?]['source_tbl'] combination.

In order to achieve that, I would have to do a foreach($log AS $xyz) loop inside the foreach($results AS $row) loop, and skip any rows which are found to have already been migrated (otherwise, they would get duplicated).
That seems like a LOT of looping to me.
What about when we get up around record # 40 or 50 thousand?
That would be 50k x 50k loops!!
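For reference, the nested check described above might look roughly like the sketch below. The function name `already_migrated_scan` and the sample data are made up for illustration; the point is only that every call walks the whole log, so a 50k-row source checked against a 50k-entry log can reach about 2.5 billion comparisons in the worst case.

```php
<?php
// Sample log in the shape ParseCSV returns above (values are illustrative).
$log = [
    ['row_id' => '1', 'company_name' => 'Cust A',     'source_tbl' => 'brand1_cust'],
    ['row_id' => '2', 'company_name' => 'customer B', 'source_tbl' => 'brand1_cust'],
    ['row_id' => '1', 'company_name' => 'Cust A',     'source_tbl' => 'brand2_cust'],
];

// Hypothetical helper: linear scan of the whole log for one (id, table) pair.
function already_migrated_scan(array $log, string $rowId, string $sourceTbl): bool
{
    foreach ($log as $entry) {
        if ($entry['row_id'] === $rowId && $entry['source_tbl'] === $sourceTbl) {
            return true; // both fields match, so this row was logged before
        }
    }
    return false; // reached the end of the log without a match
}

$found   = already_migrated_scan($log, '1', 'brand2_cust');
$missing = already_migrated_scan($log, '3', 'brand1_cust');
```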

Question:
Is there a better way for me to check whether a sub-array has a "row_id" and "source_tbl" match, other than looping each time?


NOTE: as always, if there's a completely different way I should be thinking about this, I'm open to any and all suggestions :)

I think you should preprocess the log: compute a hash (or composite key) of row_id and source_tbl for each entry and store it in a hash map. Then, for each row, just construct the same key and check whether it is already defined in the hash map.
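A minimal sketch of that preprocessing step, assuming a plain PHP array serves as the hash map. The helper name `migration_key` and the `|` separator are assumptions for illustration, not part of the original script; any separator that cannot appear in a table name would do.

```php
<?php
// Sample log in the shape shown in the question (values are illustrative).
$log = [
    ['row_id' => '1', 'company_name' => 'Cust A',     'source_tbl' => 'brand1_cust'],
    ['row_id' => '2', 'company_name' => 'customer B', 'source_tbl' => 'brand1_cust'],
    ['row_id' => '1', 'company_name' => 'Cust A',     'source_tbl' => 'brand2_cust'],
];

// Hypothetical helper: build the composite key for one (table, id) pair.
// The "|" separator is an assumption; it must not occur in a table name.
function migration_key(string $sourceTbl, string $rowId): string
{
    return $sourceTbl . '|' . $rowId;
}

// One pass over the log: each composite key becomes an array key.
$migrated = [];
foreach ($log as $entry) {
    $migrated[migration_key($entry['source_tbl'], $entry['row_id'])] = true;
}

// Each check is now a single keyed lookup instead of a scan over $log.
$seen    = isset($migrated[migration_key('brand2_cust', '1')]);
$notSeen = isset($migrated[migration_key('brand1_cust', '3')]);
```

The index is built once, before the migration loop starts, so the log is only walked a single time per run.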

I suggest a hash set because a lookup takes O(k) time (k being the key length, not the number of log entries); without it, the check would cost the same as what you are proposing, just with cleaner code.
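Here is one way the resumable loop could look with such a lookup in place. This sketch uses a nested array (`$done[source_tbl][row_id]`) instead of a concatenated key, which is just a variation on the composite-key idea. The `$sourceTables` data stands in for the real database rows and is purely illustrative.

```php
<?php
// Sample log in the shape shown in the question (values are illustrative).
$log = [
    ['row_id' => '1', 'source_tbl' => 'brand1_cust'],
    ['row_id' => '2', 'source_tbl' => 'brand1_cust'],
    ['row_id' => '1', 'source_tbl' => 'brand2_cust'],
];

// Index the log once as $done[source_tbl][row_id] = true.
$done = [];
foreach ($log as $entry) {
    $done[$entry['source_tbl']][$entry['row_id']] = true;
}

// Stand-in for the source tables; in the real script these rows come from the DB.
$sourceTables = [
    'brand1_cust' => [(object) ['id' => 1], (object) ['id' => 2], (object) ['id' => 3]],
    'brand2_cust' => [(object) ['id' => 1], (object) ['id' => 2]],
];

$toMigrate = [];
foreach ($sourceTables as $tbl => $results) {
    foreach ($results as $row) {
        // PHP casts numeric-string keys such as '1' to integers,
        // so the integer id from the row matches the string id from the log.
        if (isset($done[$tbl][$row->id])) {
            continue; // already logged as migrated, skip it
        }
        // heavy lifting would go here; this sketch only records the row
        $toMigrate[] = $tbl . ':' . $row->id;
    }
}
```

With the sample data, only row 3 of `brand1_cust` and row 2 of `brand2_cust` are left to migrate; everything already in the log is skipped without rescanning it.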
