How can I compare two sets of 1000 numbers against each other?

I must check approximately 1000 numbers against 1000 other numbers.

I loaded both sets and compared them server-side:

foreach( $numbers1 as $n1 ) {
  foreach( $numbers2 as $n2 ) {
    if( $n1 == $n2 ) {
      doBla();
    }
  }
}

This took a long time, so I tried doing the same comparison client-side: I put the numbers in two hidden div elements and compared them using JavaScript. The page still takes 45 seconds to load (using the hidden div elements).

I do not need to load the numbers that are not the same.

Is there a faster algorithm? I am thinking of comparing them on the database side and loading only the error numbers, then making an Ajax call for the remaining non-error numbers. But is a MySQL database fast enough?

Sort the lists first. Then you can walk up both lists from the start, comparing as you go.

The loop would look something like this:

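// Sort numerically; the default sort() compares elements as strings.
function byValue(a, b) { return a - b; }
var A = getFirstArray().sort(byValue), B = getSecondArray().sort(byValue);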

var i = 0, j = 0;
while (i < A.length && j < B.length) {
    if (A[i] === B[j]) {
        doBla(A[i]);
        i++; j++;
    }
    else if (A[i] < B[j]) {
        i++;
    }
    else
        j++;
}

(That's JavaScript; you could do it server-side too, but I don't know PHP.)

Edit: just to be fair to all the hashtable fans (whom I respect, of course), it's pretty easy to do that in JavaScript:

var map = {};
for (var i = 0; i < B.length; ++i) map[B[i]] = true; // Assume integers.
for (var i = 0; i < A.length; ++i) if (map[A[i]]) doBla(A[i]);

Or, if the numbers are or might be floats:

var map = {};
for (var i = 0; i < B.length; ++i) map['' + B[i]] = true; // Convert to a string key so floats work too.
for (var i = 0; i < A.length; ++i) if (map['' + A[i]]) doBla(A[i]);

Since numbers are pretty cheap to hash (even in JavaScript, converting to a string before hashing is surprisingly cheap), this would be pretty fast.
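On a modern JavaScript engine, the same hash-table idea can also be sketched with the built-in Set, which removes the need for the string-key conversion. This is only a minimal sketch: the helper name `findCommon` and the sample arrays are illustrative assumptions, not part of the original answer.

```javascript
// Hypothetical helper: returns the values of `a` that also occur in `b`.
function findCommon(a, b) {
  var seen = new Set(b);          // one pass over b
  return a.filter(function (x) {  // one pass over a, O(1) average lookup
    return seen.has(x);
  });
}

var common = findCommon([3, 1.5, 7, 10], [10, 2, 1.5]);
console.log(common); // [1.5, 10]
```

In place of logging the result, doBla() could be called for each element of `common`.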

In database terms this can be a join of 1000 rows to another 1000 rows. Any modern database system can handle this.

select x from table1
inner join table2
on table1.x = table2.y

where table1 and table2 are the tables concerned; they could be the same table.

What you have shouldn't take that long. What does doBla() do? I suspect that is what is taking the time. Comparing two sets of 1000 numbers, about 1000000 comparisons with the same algorithm, takes almost no time at all.

This is hilarious: so many optimisation techniques offered as answers. The problem is not your algorithm. Whatever doBla() does is taking far more time than any of these optimisations could save you :) especially given that the sets are only 1000 long and you would have to sort them first.
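One way to check this claim is to time the comparison on its own, with a counter standing in for doBla(). The sketch below is only an assumption about how such a test might look; `countMatches`, `randomNumbers` and the random test data are hypothetical and not from the question.

```javascript
// Hypothetical benchmark: the same nested loop as in the question,
// but counting matches instead of calling doBla().
function countMatches(a, b) {
  var matches = 0;
  for (var i = 0; i < a.length; i++) {
    for (var j = 0; j < b.length; j++) {
      if (a[i] === b[j]) matches++;
    }
  }
  return matches;
}

// Hypothetical test data: n random integers in [0, max).
function randomNumbers(n, max) {
  var out = [];
  for (var i = 0; i < n; i++) out.push(Math.floor(Math.random() * max));
  return out;
}

var listA = randomNumbers(1000, 100000);
var listB = randomNumbers(1000, 100000);
var start = Date.now();
var found = countMatches(listA, listB);
console.log(found + " matches in " + (Date.now() - start) + " ms");
```

If this reports a few milliseconds while the real page still takes 45 seconds, the time is going into doBla() or into loading the data, not into the comparison.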

Maybe just intersect the array values to find numbers existing in both arrays?

$result = array_intersect($numbers1, $numbers2);
foreach ($result as $val)
  doBla();

If you sort list2 first and then do a binary search for each number in list1, you'll see a huge speed increase.

I'm not a PHP guy, but this should give you the idea:

sort($numbers2);

foreach ($numbers1 as $n1)
{
    // BinarySearch() is not a PHP built-in; it stands for a helper you supply.
    if (BinarySearch($numbers2, $n1) >= 0) {
        doBla();
    }
}

Obviously, not being a PHP guy, I don't know the library, but I'm sure sorting and binary searching should be easy enough to find.

Note: in case you're not familiar with binary search, you're sorting list2 because a binary search needs to operate on a sorted list.
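For readers who want to see the search itself, here is a minimal JavaScript sketch of the sort-then-binary-search approach. The names `binarySearch` and `matchesByBinarySearch` are hypothetical helpers written for this illustration; they return -1 for "not found" to match the `>= 0` check used above.

```javascript
// Hypothetical helper: returns the index of `target` in the sorted
// array `sorted`, or -1 if it is not present.
function binarySearch(sorted, target) {
  var lo = 0, hi = sorted.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1; // middle of the remaining range
    if (sorted[mid] === target) return mid;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

function matchesByBinarySearch(list1, list2) {
  // Copy before sorting; the numeric comparator matters because the
  // default sort compares values as strings.
  var sorted = list2.slice().sort(function (a, b) { return a - b; });
  return list1.filter(function (n) {
    return binarySearch(sorted, n) >= 0;
  });
}

console.log(matchesByBinarySearch([5, 12, 8], [8, 100, 5, 9])); // [5, 8]
```

Sorting costs O(n log n) once, and each lookup is O(log n), so the whole comparison is O(n log n) rather than O(n^2).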

Sort them first.

I'm not a PHP expert, so this may need some debugging, but you can do this easily in O(n) time:

// Load one array into a hashtable, keyed by the number: O(n).
$keys1 = [];
foreach($numbers1 as $n1) $keys1[$n1] = true;

// Find the intersections with the other array:
foreach($numbers2 as $n2) { // O(n)
  if (isset($keys1[$n2])) { // O(1)
     doBla();
  }
}

Regardless, the intersection isn't where your time is going. Even a bad O(n^2) implementation like the one you have now should be able to get through 1000 numbers within a second.

Stop. Why are you doing this?

If the numbers are already in a SQL database, then do a join and let the DB figure out the most efficient route.

If they aren't in a database, then I'm betting you've gone off track somewhere and really ought to reconsider how you got here.

$same_numbers = array_intersect($numbers1, $numbers2);

foreach($same_numbers as $n)
{
  doBla();
}

Sort both lists, then walk both lists at the same time using the old-master new-master sequential update pattern. As long as you can sort the data, this is the fastest way, since you're really only walking each list once, up to the length of the longer list.

Your code is simply more complicated than it needs to be.

Assuming what you're looking for is that the numbers in each position match (and not just that the arrays contain the same numbers), you can flatten your loop to a single for loop.

<?php
// Fill two arrays with random numbers as proof.
$first_array = array();  // array(1000) would create a one-element array containing 1000
$second_array = array();
for($i=0; $i<1000; $i++) $first_array[$i] = rand(0, 1000);
for($i=0; $i<1000; $i++) $second_array[$i] = rand(0, 1000);

// The loop you care about.
for($i=0; $i<1000; $i++) if ($first_array[$i] != $second_array[$i]) echo "Error at $i: first_array was {$first_array[$i]}, second was {$second_array[$i]}<br>";

?>

Using the code above, you will only loop 1000 times, as opposed to 1000000 times.

Now, if you just need to check whether a number appears or does not appear in the arrays, use array_diff and array_intersect as follows:

<?php
// Fill two arrays with random numbers as proof.
$first_array = array();  // array(1000) would create a one-element array containing 1000
$second_array = array();
for($i=0; $i<1000; $i++) $first_array[$i] = rand(0, 1000);
for($i=0; $i<1000; $i++) $second_array[$i] = rand(0, 1000);

$matches = array_intersect($first_array, $second_array);
$differences = array_diff($first_array, $second_array);

?>

Maybe I'm not seeing something here, but this looks like a classic case of set intersection. Here are a few lines of Perl that will do it.

foreach $e (@a, @b) { $union{$e}++ && $isect{$e}++ }

@union = keys %union;
@isect = keys %isect;

At the end of these lines of code, @isect will contain all numbers that are in both @a and @b. I'm sure this is translatable to PHP more or less directly. FWIW, this is my favorite piece of code from the Perl Cookbook.

You can do it in O(n) time if you use bucket sort, assuming you know the maximum value the numbers can take (although there are ways around that).

http://en.wikipedia.org/wiki/Bucket_sort
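As a rough illustration of the bucket idea, the sketch below uses the simplest possible variant: one bucket per possible value, which marks whether that value occurs in the second list. It assumes the numbers are non-negative integers no larger than a known `maxValue`; the helper name `commonByBuckets` and the sample data are hypothetical.

```javascript
// Hypothetical helper. Assumes every value is an integer in [0, maxValue].
function commonByBuckets(a, b, maxValue) {
  var present = new Uint8Array(maxValue + 1); // one bucket per possible value
  for (var i = 0; i < b.length; i++) present[b[i]] = 1;
  var result = [];
  for (var j = 0; j < a.length; j++) {
    if (present[a[j]] === 1) result.push(a[j]);
  }
  return result;
}

console.log(commonByBuckets([4, 0, 9, 2], [2, 4, 7], 10)); // [4, 2]
```

Both passes are linear, so the running time is O(n + maxValue), at the price of memory proportional to the largest value.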

I think it would be much easier to use the built-in array_intersect function. Using your example, you could do:

$results = array_intersect($numbers1, $numbers2);
foreach($results as $rk => $rv) {
    doSomething($rv);
}

A better way would be to do something like this:

// 1. Create a hash map from one of the lists.
var hm = { };
for (var i in list1) {
  if (!hm[list1[i]]) {
    hm[list1[i]] = 1;
  } else { hm[list1[i]] += 1; }
}

// 2. Lookup each element in the other list.
for (var i in list2) {
  if (hm[list2[i]] >= 1) {
    for (var j = 0; j < hm[list2[i]]; ++j) {
      doBla();
    }
  }
}

This is guaranteed O(n) [assuming insertion and lookup in a hash map are O(1) amortized].

Update: The worst case of this algorithm is O(n^2), and there is no way to reduce it unless you change the semantics of the program. This is because, in the worst case, the program calls doBla() n^2 times, which happens when all the numbers in both lists are the same. However, if the numbers are unique within each list, the runtime tends towards O(n).
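The worst case is easy to see by counting the calls rather than making them. The following sketch is a hypothetical helper, `countCalls`, that returns how many times doBla() would run for two given lists, using the same counting idea as the hash map above.

```javascript
// Hypothetical helper: counts how many times doBla() would be called,
// i.e. the number of (i, j) pairs with list1[i] === list2[j].
function countCalls(list1, list2) {
  var counts = new Map();
  list1.forEach(function (x) {
    counts.set(x, (counts.get(x) || 0) + 1);
  });
  var calls = 0;
  list2.forEach(function (y) {
    calls += counts.get(y) || 0;
  });
  return calls;
}

console.log(countCalls([7, 7, 7], [7, 7, 7])); // 9, i.e. n^2 for n = 3
console.log(countCalls([1, 2, 3], [3, 4, 1])); // 2
```

Counting is O(n) even in the first case; it is only the n^2 calls to doBla() themselves that cannot be avoided without changing what the program does.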

I'll create a GUI interface in Visual Basic and see if I can track the numbers.

Mergesort both lists, start at the beginning of both, and then walk through the two lists at the same time looking for equal numbers.

So, in pseudocode, it would go something like...

Mergesort(List A);
Mergesort(List B);

$Apos = 0;
$Bpos = 0;

while ($Apos != A.Length && $Bpos != B.Length) // while you have not reached the end of either list
{
    if (A[$Apos] == B[$Bpos]) { // found a match, so advance both lists
        doSomething();
        $Apos++;
        $Bpos++;
    }
    else if (A[$Apos] > B[$Bpos]) // B is lower than A, so have B try to catch up to A
        $Bpos++;
    else // the value at A is less than the value at B, so increment A
        $Apos++;
}

If I'm right, the speed of this algorithm is O(n log n).

I'm not sure why Mrk Mnl was downvoted, but the function call is the overhead here.

Push the matched numbers into another array and call doBla() on them after the comparisons. As a test, comment out doBla() and see whether you still experience the same performance issue.
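A minimal sketch of that separation might look like the following. The function names and the `handler` parameter, which stands in for doBla(), are assumptions made for this illustration.

```javascript
// Hypothetical sketch: separate the comparison from the per-match work.
function collectMatches(a, b) {
  var lookup = new Set(b);
  var matched = [];
  for (var i = 0; i < a.length; i++) {
    if (lookup.has(a[i])) matched.push(a[i]);
  }
  return matched;
}

// `handler` stands in for doBla(); pass a no-op to measure the comparison alone.
function processMatches(matched, handler) {
  for (var i = 0; i < matched.length; i++) handler(matched[i]);
  return matched.length;
}

var matched = collectMatches([1, 2, 3, 4], [2, 4, 6]);
processMatches(matched, function () {});                             // comparison cost only
processMatches(matched, function (n) { console.log("match:", n); }); // with real work
```

Timing the two `processMatches` calls separately shows how much of the total is spent inside the handler.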

Would it be possible to put these numbers into two database tables and then do an INNER JOIN? This will be very efficient and will return only the numbers that are contained in both tables. This is a perfect task for a database.

  1. Create two duplicate collections, preferably ones with fast lookup times, like HashSet or perhaps TreeSet. Avoid Lists, as they have very poor lookup times.

  2. As you find elements, remove them from both sets. This can reduce lookup times by leaving fewer elements to sift through in later searches.
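The two steps above can be sketched in JavaScript with the built-in Set, used here as a stand-in for the HashSet the answer mentions. The helper name `matchAndRemove` and the shape of its return value are assumptions for illustration only.

```javascript
// Hypothetical sketch of the two steps above using Set.
function matchAndRemove(list1, list2) {
  var set1 = new Set(list1); // copies, so the inputs are left untouched
  var set2 = new Set(list2);
  var found = [];
  list1.forEach(function (n) {
    if (set1.has(n) && set2.has(n)) {
      found.push(n);
      set1.delete(n); // remove from both so later lookups see fewer elements
      set2.delete(n);
    }
  });
  return { found: found, onlyIn1: Array.from(set1), onlyIn2: Array.from(set2) };
}

var r = matchAndRemove([1, 2, 3], [2, 3, 4]);
console.log(r.found, r.onlyIn1, r.onlyIn2); // [2, 3] [1] [4]
```

With a hash-based set the lookups are already O(1) on average, so the removal mostly serves to leave the unmatched values behind in each set.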

If you're trying to get a list of numbers without any duplicates, you can use a hash:

$unique = array();
foreach ($list1 as $num) {
  $unique[$num] = $num;
}
foreach ($list2 as $num) {
  $unique[$num] = $num;
}
$unique = array_keys($unique);

It's going to be slightly (very slightly) slower than the array walk method, but it's cleaner in my opinion.

Merge, sort and then count:

<?php
    $first = array('1001', '1002', '1003', '1004', '1005');
    $second = array('1002', '1003', '1004', '1005', '1006');
    $merged = array_merge($first, $first, $second);
    sort($merged);
    print_r(array_count_values($merged));
?>

Output (the values with a count of three are the ones you want):

Array
(
    [1001] => 2
    [1002] => 3
    [1003] => 3
    [1004] => 3
    [1005] => 3
    [1006] => 1
)

This code will call doBla() once for each time a value in $numbers1 is found in $numbers2:

// get [val => occurrences, ...] for $numbers2
$counts = array_count_values($numbers2);
foreach ($numbers1 as $n1) {
    // if $n1 occurs in $numbers2...
    if (isset($counts[$n1])) {
        // call doBla() once for each occurrence
        for ($i=0; $i < $counts[$n1]; $i++) {
            doBla();
        }
    }
}

If you only need to call doBla() once when a match is found:

foreach ($numbers1 as $n1) {
    if (in_array($n1, $numbers2))
        doBla();
}

If $numbers1 and $numbers2 will only contain unique values, or if the number of times any specific value occurs in both arrays is not important, array_intersect() will do the job:

$dups = array_intersect($numbers1, $numbers2);
foreach ($dups as $n)
    doBla();

I agree with several earlier posts that the calls to doBla() are probably taking more time than iterating over the arrays.

Use WebAssembly instead of JavaScript.

This problem can be broken into two tasks. The first task is finding all combinations, of which there are (n^2-n)/2. For n=1000 that gives x=499500. The second task is to loop through all x combinations and compare them with the function doBla().

function getWayStr(curr) {
  var nextAbove = -1;
  for (var i = curr + 1; i < waypoints.length; ++i) {
    if (nextAbove == -1) {
      nextAbove = i;
    } else {
      wayStr.push(waypoints[i]);
      wayStr.push(waypoints[curr]);
    }
  }
  if (nextAbove != -1) {
    wayStr.push(waypoints[nextAbove]);
    getWayStr(nextAbove);
    wayStr.push(waypoints[curr]);
  }
}
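The combination count quoted in this answer can be checked with a short sketch. The helper `countPairs` is hypothetical; it simply enumerates every unordered pair of n items and counts them.

```javascript
// Hypothetical check: enumerate all unordered pairs of n items and
// compare the count with the closed form (n^2 - n) / 2.
function countPairs(n) {
  var pairs = 0;
  for (var i = 0; i < n; i++) {
    for (var j = i + 1; j < n; j++) pairs++;
  }
  return pairs;
}

console.log(countPairs(1000));         // 499500
console.log((1000 * 1000 - 1000) / 2); // 499500
```

Note that this counts pairs within a single list of 1000 items; comparing every element of one list against every element of a second list, as in the question, takes up to 1000 * 1000 comparisons.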
