Find highest number in file names on web server

On my webserver, I have a folder with numbered image files:

...
296.jpg
297.png
298.gif
...

The numbers are consecutive (1, 2, 3, ...). The file name contains only the number ("12.jpg", not "photo_12.jpg"). The files may not be created and stored in the order of their file name numbering (i.e. 2000.jpg might be older than 2.jpg).

I want to find the highest number in the file names.

I do this:

$glob = glob("path/to/dir/*");
$highest = max(preg_replace("|[^0-9]|", "", $glob));
// $highest is now something like 381554

Is there a less resource-heavy method?

First of all, you have to decide what kind of resource you want to save, because the approach differs depending on whether it is memory, I/O operations, or something else.

So far your solution is the most optimised in terms of speed, but it is very memory consuming: it loads every file name into an array, so with a lot of files in the folder you may hit the memory limit.

I suggest you cache the max somewhere, in Redis for example, and then update it every time you upload a new image. To cache it you have to fetch it first. You can get the initial max value either with a simple script:

$max = 0;
foreach (new DirectoryIterator('.') as $fileInfo) {
    if ($fileInfo->isDot()) continue;

    $current = pathinfo($fileInfo->getFilename())['filename'];
    if (!is_numeric($current)) continue;
    if ($current > $max) $max = $current;
}

Or with a call to an external sort command, as vladyslav-savchenko suggested.

Then you just have to keep the max value up to date. Update it on every upload, by cron, or both.
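As a rough sketch of the "update on every upload" idea, the following keeps the max in a small lock-protected file instead of Redis, so it runs without extra services. The function names and the cache file location are assumptions made for illustration; they do not come from the answer above.

```php
<?php
// Hypothetical helpers: the names and the cache file path are assumptions.

// Read the cached max; returns 0 when the cache does not exist yet.
function readCachedMax(string $cacheFile): int
{
    $raw = @file_get_contents($cacheFile);
    return ($raw === false) ? 0 : (int) $raw;
}

// Call this after each upload with the number of the new file.
function updateCachedMax(string $cacheFile, int $newNumber): int
{
    // 'c+' opens for reading and writing and creates the file if needed,
    // without truncating existing content.
    $handle = fopen($cacheFile, 'c+');
    if ($handle === false) {
        throw new RuntimeException("Cannot open cache file: $cacheFile");
    }
    flock($handle, LOCK_EX);            // serialize concurrent uploads
    $current = (int) stream_get_contents($handle);
    $max = max($current, $newNumber);
    if ($max !== $current) {
        ftruncate($handle, 0);
        rewind($handle);
        fwrite($handle, (string) $max);
    }
    flock($handle, LOCK_UN);
    fclose($handle);
    return $max;
}
```

A cron job could rebuild the file from a full directory scan now and then, in case an upload bypassed the update.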

This may work:

$numeric_files=glob("[0-9]*.*");
$slike = array_map(function($e){return pathinfo($e, PATHINFO_FILENAME);}, $numeric_files);
echo max($slike);

Starting with

$path = "path/to/dir/";

Let's get an array of the file names

$myFile = [];

if ($handle = opendir($path)) {
    while (false !== ($entry = readdir($handle))) {
        if ($entry != "." && $entry != "..") {
            // is_dir() needs the full path, not just the entry name
            if (!is_dir($path . $entry)) {
                // strip the extension, keeping only the number
                $myFile[] = substr($entry, 0, strrpos($entry, "."));
            }
        }
    }
    closedir($handle);
}

Then we can sort the array

rsort($myFile,SORT_NUMERIC);

The first element will be the one we were searching for

print $myFile[0];

This is an example and is untested.

I don't think this will result in a good solution, especially with a large number of files, which I'm assuming because of your comment that the highest number is about 381k. It will cause high I/O and maybe a real performance problem when you have too many visitors and/or a slow or heavily loaded server, maybe with an (older) HDD, which is common for storing images.

I would recommend storing the file names in a database. Even if you're not using a database yet, this is the best solution, because you can get the highest number with a clean SQL query, which will cause much less I/O load than scanning huge directories on the file system. Furthermore, you can profit from indexes, which will further optimize the speed of your database queries.

It's not necessary to store the full path, and it is even a bad idea when all your files are in one folder. In that case you'd produce unnecessary redundancy, which wastes storage and creates extra work if you want to change the path later. It's better to store only the file names and create a constant in your config or script for the path, like

define('IMAGE_PATH', '/var/www/images');

When you want to proceed with the selected image, you can do something like this:

// IMAGE_PATH has no trailing slash, so add the separator explicitly
$fullImagePath = IMAGE_PATH . '/' . $databaseQueryResult['fileName'];

I don't know what you want to do, but if you're not using a database yet, it may be a good idea to rethink your design. Something in the image-hosting area looks to me like a case where a database is a good idea, also for other features you may want to implement.
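To make the database suggestion concrete, here is a minimal sketch using PDO with an in-memory SQLite database so it is self-contained. The table name `images` and the columns `number` and `file_name` are assumptions for illustration; a real setup would use whatever database and schema the site already has.

```php
<?php
// Assumed schema: the table and column names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec(
    'CREATE TABLE images (number INTEGER PRIMARY KEY, file_name TEXT NOT NULL)'
);

// Register each uploaded file once; only the file name is stored, not the path.
$insert = $pdo->prepare('INSERT INTO images (number, file_name) VALUES (?, ?)');
foreach (['296.jpg', '297.png', '298.gif'] as $fileName) {
    $number = (int) pathinfo($fileName, PATHINFO_FILENAME);
    $insert->execute([$number, $fileName]);
}

// The primary key is indexed, so the database can usually answer MAX()
// from the index rather than reading every row.
$highest = (int) $pdo->query('SELECT MAX(number) FROM images')->fetchColumn();

echo $highest, PHP_EOL; // prints 298
```

Keeping the number in its own integer column avoids parsing file names inside the query.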

You can use something like this:

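$path = 'path_to_directory';

// List the directory, sort numerically in descending order, keep the first entry
$command = 'ls ' . escapeshellarg($path) . ' | sort -rn | head -1';

// exec() captures the output; system() would echo it and print it twice here
$output = exec($command, $lines, $status);

if ($status !== 0 || $output === false || $output === '') {
    print 'Error during execution of: "' . $command . '"';
} else {
    // $output is the file name with the highest number, e.g. "381554.jpg"
    print $output;
}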

Here is what I was getting at with my comment about a binary search.

It needs almost no memory and takes just 0.003 seconds and 35 file checks with 100,000 files.

I guess you could code it in PHP, or shell out to it.

#!/bin/bash
checkfile(){
   if [ -f "$1.jpg" ]; then
      echo DEBUG: Testing ${i}.jpg, exists - so move min marker to $1
      min=$1
      return 0
   else
      echo DEBUG: Testing ${i}.jpg, nope - so move max marker to $1
      max=$1
      return 1
   fi
}
i=1
min=0
max=-1
while : ; do
   if checkfile $i && [[ $max -eq -1 ]]; then
     ((i*=2))
   else
     ((i=(max+min)/2))
   fi
   diff=$((max-min))
   [[ $diff -eq 1 ]] && break
done
echo Result:$min

Output:

DEBUG: Testing 1.jpg, exists - so move min marker to 1
DEBUG: Testing 2.jpg, exists - so move min marker to 2
DEBUG: Testing 4.jpg, exists - so move min marker to 4
DEBUG: Testing 8.jpg, exists - so move min marker to 8
DEBUG: Testing 16.jpg, exists - so move min marker to 16
DEBUG: Testing 32.jpg, exists - so move min marker to 32
DEBUG: Testing 64.jpg, exists - so move min marker to 64
DEBUG: Testing 128.jpg, exists - so move min marker to 128
DEBUG: Testing 256.jpg, exists - so move min marker to 256
DEBUG: Testing 512.jpg, exists - so move min marker to 512
DEBUG: Testing 1024.jpg, exists - so move min marker to 1024
DEBUG: Testing 2048.jpg, exists - so move min marker to 2048
DEBUG: Testing 4096.jpg, exists - so move min marker to 4096
DEBUG: Testing 8192.jpg, exists - so move min marker to 8192
DEBUG: Testing 16384.jpg, exists - so move min marker to 16384
DEBUG: Testing 32768.jpg, exists - so move min marker to 32768
DEBUG: Testing 65536.jpg, exists - so move min marker to 65536
DEBUG: Testing 131072.jpg, nope - so move max marker to 131072
DEBUG: Testing 98304.jpg, exists - so move min marker to 98304
DEBUG: Testing 114688.jpg, nope - so move max marker to 114688
DEBUG: Testing 106496.jpg, nope - so move max marker to 106496
DEBUG: Testing 102400.jpg, nope - so move max marker to 102400
DEBUG: Testing 100352.jpg, nope - so move max marker to 100352
DEBUG: Testing 99328.jpg, exists - so move min marker to 99328
DEBUG: Testing 99840.jpg, exists - so move min marker to 99840
DEBUG: Testing 100096.jpg, nope - so move max marker to 100096
DEBUG: Testing 99968.jpg, exists - so move min marker to 99968
DEBUG: Testing 100032.jpg, nope - so move max marker to 100032
DEBUG: Testing 100000.jpg, exists - so move min marker to 100000
DEBUG: Testing 100016.jpg, nope - so move max marker to 100016
DEBUG: Testing 100008.jpg, nope - so move max marker to 100008
DEBUG: Testing 100004.jpg, nope - so move max marker to 100004
DEBUG: Testing 100002.jpg, nope - so move max marker to 100002
DEBUG: Testing 100001.jpg, nope - so move max marker to 100001
Result:100000
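A PHP version of the same idea might look like the sketch below. It relies on the numbers being consecutive with no gaps, as stated in the question. The function name `findHighest`, the `$exists` callback, and the list of extensions are assumptions for illustration, not part of the script above.

```php
<?php
// Sketch: $exists is a hypothetical callback that reports whether file
// number $n is present. Assumes numbering starts at 1 without gaps.
function findHighest(callable $exists): int
{
    if (!$exists(1)) {
        return 0; // no numbered files at all
    }

    // Phase 1: double the probe until a missing number is hit.
    $min = 1;
    $max = 2;
    while ($exists($max)) {
        $min = $max;
        $max *= 2;
    }

    // Phase 2: binary search. Invariant: $min exists, $max does not.
    while ($max - $min > 1) {
        $mid = intdiv($min + $max, 2);
        if ($exists($mid)) {
            $min = $mid;
        } else {
            $max = $mid;
        }
    }
    return $min;
}

// File check covering the extensions shown in the question (an assumption
// about which extensions actually occur).
$dir = 'path/to/dir';
$existsOnDisk = function (int $n) use ($dir): bool {
    foreach (['jpg', 'png', 'gif'] as $ext) {
        if (is_file("$dir/$n.$ext")) {
            return true;
        }
    }
    return false;
};

echo findHighest($existsOnDisk), PHP_EOL;
```

The number of file checks grows with the logarithm of the highest number, so the directory is never listed in full.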