Removing Duplicates From String

Basically, I have a script that checks a page for proxies, adds them to a string, then checks the string for duplicates and outputs the result.

The only issue is that the page being checked easily has 20k+ proxies on it, so the script takes about 3-4 minutes to run and most of the time gives me either a "bytes exhausted" error or a max_execution error.

Is there a quicker or easier way to check for duplicates and just output the result?

<?php

ini_set('memory_limit', '-1');

set_time_limit(1000);

//Curl Setup;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'x');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);        
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

//Execute Curl;
$page = curl_exec($ch);

//Regex For Matching Proxies;
preg_match_all('/(\d){1,3}\.(\d){1,3}\.(\d){1,3}\.(\d){1,3}:(\d){1,5}/ism',$page,$output);

//Foreach Proxy Found, Output it;
foreach($output[0] as $op){ $proxies .= $op."\n"; }

//This doesnt work
implode('\n',array_unique(explode('\n', $proxies)));

//Output each proxy
echo $proxies;

?>

Oh, also: when it does get the proxies without errors and puts them into a textarea using AJAX, it causes serious lag. There is so much lag that you can't click on anything. I'm not sure whether this whole issue has more to do with AJAX.

Store them in an array and then use array_unique.

$proxies = array();
for ($i = 0, $max = count($output[0]); $i < $max; $i++) {
  $proxies[] = $output[0][$i];
  // unset to reduce memory usage. Unsure if it'll actually help, though.
  unset($output[0][$i]);
}
echo implode("\n", array_unique($proxies));
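
As a side note, the line marked "This doesnt work" in the question appears to fail for two reasons: `'\n'` in single quotes is a literal backslash followed by `n` rather than a newline, and the return value of `implode()` is never assigned to anything. A minimal sketch of a corrected string-based version follows; the sample list is made up for illustration and the addresses are placeholders, not real proxies.

```php
<?php
// Hypothetical sample input: newline-separated proxies with one duplicate.
$proxies = "1.2.3.4:80\n5.6.7.8:8080\n1.2.3.4:80\n";

// Double quotes make "\n" a real newline; trim() avoids an empty trailing element.
$lines = explode("\n", trim($proxies));

// array_unique() returns a new array, so the result has to be assigned.
$proxies = implode("\n", array_unique($lines));

echo $proxies, "\n";
```

This still builds the large string first, so for 20k+ entries the array-based version above is probably the better starting point.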

Or use the values as keys of an associative array. Not sure if this would be any faster, though.

$proxies = array();
foreach ($output[0] as $op) {
  $proxies[$op] = null;
}
echo implode("\n", array_keys($proxies));
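
If it helps, the key-based idea can be wrapped in a function so the intermediate string is skipped entirely. This is only a sketch assuming PHP 7 or later: the function name `uniqueProxies` and the simplified pattern are my own assumptions rather than anything from the question, and the pattern does not validate octet or port ranges.

```php
<?php
// Hypothetical helper: returns each ip:port match found in $page once,
// in order of first appearance.
function uniqueProxies(string $page): array
{
    // Simplified pattern (assumption): four dot-separated groups of 1-3 digits, then :port.
    preg_match_all('/\d{1,3}(?:\.\d{1,3}){3}:\d{1,5}/', $page, $matches);

    $seen = [];
    foreach ($matches[0] as $proxy) {
        // Writing to a key that already exists does not add a new entry,
        // so duplicates collapse automatically.
        $seen[$proxy] = true;
    }

    return array_keys($seen);
}

// Placeholder page content, not real proxies.
$page = "1.2.3.4:80 junk 5.6.7.8:8080\n1.2.3.4:80";
echo implode("\n", uniqueProxies($page)), "\n";
```

Array keys are looked up by hash, which is why this approach avoids comparing every value against every other value.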

According to this comment in the PHP manual (http://php.net/manual/en/function.array-unique.php#70786; I haven't tested it), one possible way to do what you want would be something like the following code:

// Populate the array (this may be different for your needs)
$proxies = array();
foreach($output[0] as $op) {
    $proxies[] = $op;
}
$unique_proxies = array_keys(array_flip($proxies)); 

***Request:***

It would be nice if you could run a benchmark using your data and tell us the results.
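
A rough way to run such a benchmark, as a sketch only and assuming PHP 7.1 or later: the data below is synthetic (20,000 entries drawn from a small pool so that duplicates occur), so timings on a real proxy list may well differ. The helper name `timeIt` is made up for illustration.

```php
<?php
// Synthetic input (assumption): 20,000 entries with many duplicates.
mt_srand(42);
$proxies = [];
for ($i = 0; $i < 20000; $i++) {
    $proxies[] = '10.0.' . mt_rand(0, 20) . '.' . mt_rand(0, 255) . ':8080';
}

// Hypothetical helper: runs $fn once and returns [result, elapsed seconds].
function timeIt(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $start];
}

// Approach 1: array_unique on a plain list.
[$viaUnique, $tUnique] = timeIt(function () use ($proxies) {
    return array_values(array_unique($proxies));
});

// Approach 2: array_flip, then read the keys back.
[$viaFlip, $tFlip] = timeIt(function () use ($proxies) {
    return array_keys(array_flip($proxies));
});

// Approach 3: build an associative array by hand.
[$viaKeys, $tKeys] = timeIt(function () use ($proxies) {
    $seen = [];
    foreach ($proxies as $p) {
        $seen[$p] = null;
    }
    return array_keys($seen);
});

printf(
    "array_unique: %.5fs\narray_flip:   %.5fs\nforeach keys: %.5fs\n",
    $tUnique,
    $tFlip,
    $tKeys
);
```

A single run like this is noisy, so repeating each measurement several times and comparing the averages would give a fairer picture.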
