
Finding optimal elements from an array in Python

I need to devise an algorithm that finds the optimal elements from a list in Python. I have a CD that holds 700 MB, and an array of 300 randomly generated file sizes ranging from 30 to 90 MB. It needs to fill the CD in the most optimal way, so that the minimum amount of room is wasted (looking through all possible ways). I guess it's similar to the knapsack problem, except that there is only one array and a limit. Since I'm totally new to the algorithms and data structures scene, I have no idea how to implement this in Python.

Thanks in advance

As @payne points out in his comment, this is really the same as the knapsack problem. The solution is therefore a simple dynamic programming algorithm.

Say the files are arranged one after another in some order in a list. At first, you have the choice of either including the first file or skipping it. If you choose to include it, the space you have available will decrease by the size of that file. If you choose to skip it, the space available remains unchanged. Now, you can arrive at the second file in two states. In one, you have chosen the first file and thus have less space, while in the other you have skipped the first file and have more space. For each of these scenarios, you can again choose to include or skip over the second file.

Notice that you can define your state simply by the file you are considering at the moment and the available space that you have. Once you have moved past the last file or the space has run out, you have come to the end of that line of choices.

This yields a simple recurrence:

min_waste(index, space) = {
   0  if space = 0     # no more space available, so 0 wastage
   space  if index >= size(files)     # no more files left, whatever is left is wasted
   min_waste(index+1, space)  if size(files[index]) > space     # current file is too large, skip ahead
   min( min_waste(index+1, space), min_waste(index+1, space - size(files[index])) )  otherwise
   # minimum of skipping this file and choosing it
}

You can choose to implement this by filling up a table (i.e. a 2D array) bottom up, or just write it as a recursive function and memoize.
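The memoized variant could look roughly like the sketch below. It follows the recurrence case by case; the assumption that sizes and capacity are whole numbers (e.g. megabytes) is mine, and it matters because the memo is keyed on the exact remaining space.

```python
from functools import lru_cache


def min_waste(files, capacity):
    """Return the least unused space when a subset of `files` is packed into `capacity`.

    Assumption: sizes and capacity are non-negative integers (e.g. whole MB).
    """
    sizes = tuple(files)

    @lru_cache(maxsize=None)
    def solve(index, space):
        if space == 0:
            return 0              # no more space available, so nothing is wasted
        if index >= len(sizes):
            return space          # no more files left, whatever is left is wasted
        skip = solve(index + 1, space)
        if sizes[index] > space:
            return skip           # current file is too large, skip ahead
        take = solve(index + 1, space - sizes[index])
        return min(skip, take)    # better of skipping and including this file

    return solve(0, capacity)
```

With 300 files and a 700 MB limit there are at most about 300 x 701 distinct states, so the cache stays small. The recursion goes one level deeper per file, so for much longer lists a bottom-up table would avoid Python's recursion limit.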

This gives you the minimum wastage, but not which files were selected to achieve it. But you can easily modify it to save information about the choice it makes in each state and use that to build up the series of choices from the starting state.
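One possible way to record the choices is sketched here. Note that it is a bottom-up variant over reachable totals rather than the recursion itself: for every total that can be reached, it stores which file was added last and which total it came from, then walks those links backwards. The function name `best_subset` and the integer-size assumption are mine.

```python
def best_subset(files, capacity):
    """Return (wasted_space, indices_of_chosen_files).

    Assumption: sizes and capacity are non-negative integers (e.g. whole MB).
    """
    # parent[total] = (index of the file added last, total before adding it)
    parent = {0: None}
    for index, size in enumerate(files):
        # Iterate over a snapshot so each file is used at most once.
        for total in list(parent):
            new_total = total + size
            if new_total <= capacity and new_total not in parent:
                parent[new_total] = (index, total)

    best = max(parent)            # largest reachable total that still fits
    chosen = []
    total = best
    while parent[total] is not None:
        index, total = parent[total]
        chosen.append(index)      # follow the saved choices back to the start
    chosen.reverse()
    return capacity - best, chosen
```

For example, `best_subset([60, 50, 45], 100)` picks the files at indices 1 and 2 (50 + 45), leaving 5 units unused.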

It is probably not efficient to find the absolute optimal way. But you can use some rules of thumb, like taking the largest files first, and then filling the remaining space with the first file that fits, until the space is too small for any more to fit. See the Bin Packing Problem. A good simple heuristic is First Fit Decreasing: sort all the files by size from largest to smallest, then place each file on the first CD that has enough room for it, until all the files are used up.
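A minimal sketch of First Fit Decreasing as described above. The function name and the choice to raise an error for a file larger than a whole CD are my assumptions; the result is a heuristic packing and is not guaranteed to use the fewest possible CDs.

```python
def first_fit_decreasing(files, capacity):
    """Pack file sizes onto discs of `capacity`; return a list of discs (lists of sizes)."""
    discs = []   # contents of each disc
    free = []    # remaining space on each disc, same order as `discs`
    for size in sorted(files, reverse=True):      # largest files first
        if size > capacity:
            raise ValueError(f"file of size {size} does not fit on any disc")
        for i, room in enumerate(free):
            if size <= room:                      # first disc with enough room
                discs[i].append(size)
                free[i] -= size
                break
        else:
            discs.append([size])                  # no disc had room: start a new one
            free.append(capacity - size)
    return discs
```

For the sizes `[30, 90, 60, 40, 80]` and a capacity of 100, this yields `[[90], [80], [60, 40], [30]]`.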

Edit

It is likely that all of the files put together do not add up to exactly some whole number of CDs. For instance, if the total of the files is 1.6 GB, that fills two CDs with a little left over, so at least three CDs are needed even if the files packed perfectly. So if you already know that 3 CDs are the minimum required, and you try a few combinations until you get everything to fit on 3 CDs, why does it need to be optimized any more than that? You can't use fewer CDs than the theoretical minimum.
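The theoretical minimum mentioned here is just the total size divided by the capacity, rounded up. A small sketch (the function name is hypothetical, and sizes are assumed to be integers in the same unit as the capacity):

```python
def min_discs_lower_bound(files, capacity):
    """Fewest discs that could possibly hold all files, ignoring how they pack."""
    total = sum(files)
    return -(-total // capacity)   # integer ceiling division


# 1600 MB of files on 700 MB discs: two full discs plus 200 MB left over.
print(min_discs_lower_bound([800, 800], 700))
```

If a heuristic packing already uses this many CDs, no other arrangement can use fewer.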


 