Algorithm for finding optimal multiple partitionings based on some costs

I have a situation in which I need to find optimal split positions in an array based on some costs. The problem goes like this:

As input I have an array of events ordered by an integer timestamp, and as output I want an array of indexes that splits the input array into parts. The output array needs to be optimal (more on this below).

struct e {
    int Time;
    // other values
}

Example Input:  [e0, e1, e2, e3, e4, e5, ..., e10]
Example output: [0, 2, 6, 8] (the 0 at the start is always there)

Using the above example, I can use the split indices to partition the original array into 5 subarrays like so:

[ [], [e0, e1], [e2, e3, e4, e5], [e6, e7], [e8, e9, e10] ]
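The mapping from split indices to subarrays can be sketched in a few lines. This is only an illustration of the example above: the name `partition` is hypothetical, and it assumes each split index marks where a new subarray begins, with the leading 0 producing the empty first subarray.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(events: Sequence[T], splits: Sequence[int]) -> List[List[T]]:
    """Cut `events` at every index in `splits` (hypothetical helper).

    Assumes `splits` is sorted and starts with 0, as in the example.
    """
    # Boundaries: start of the array, every split index, end of the array.
    bounds = [0, *splits, len(events)]
    # Each consecutive pair of boundaries delimits one subarray.
    return [list(events[a:b]) for a, b in zip(bounds, bounds[1:])]


events = [f"e{i}" for i in range(11)]  # stand-ins for e0 .. e10
parts = partition(events, [0, 2, 6, 8])
```

With the example input, `parts` holds five subarrays, the first of which is empty.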

The cost of this example solution is the sum of the "distances" between consecutive subarrays:

double distance(e[] arr1, e[] arr2) {
    // return distance from arr1 to arr2, order matters so non-euclidean
}

total cost = distance([], [e0, e1]) + distance([e0, e1], [e2, e3, e4, e5]) + ...
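As a minimal sketch of that sum, the function below adds up the distance between each subarray and the one that follows it. The real `distance` is not given in the question, so `toy_distance` is a made-up, order-dependent stand-in used purely for illustration.

```python
from typing import Callable, List, Sequence


def total_cost(
    parts: Sequence[Sequence[int]],
    distance: Callable[[Sequence[int], Sequence[int]], float],
) -> float:
    """Sum distance(prev, next) over consecutive subarrays."""
    return sum(distance(a, b) for a, b in zip(parts, parts[1:]))


def toy_distance(arr1: Sequence[int], arr2: Sequence[int]) -> float:
    # Hypothetical cost, NOT the real fingering difficulty:
    # growing the subarray costs twice as much as shrinking it,
    # so distance(a, b) != distance(b, a) in general.
    grow = max(0, len(arr2) - len(arr1))
    shrink = max(0, len(arr1) - len(arr2))
    return 1.0 + 2.0 * grow + shrink


parts: List[List[int]] = [[], [0, 1], [2, 3, 4, 5], [6, 7], [8, 9, 10]]
cost = total_cost(parts, toy_distance)  # 5 + 5 + 3 + 3
```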

At this point it is helpful to understand the actual problem.

The input array represents musical notes at some time (i.e. a MIDI file), and I want to split the MIDI file into optimal guitar fingerings. Hence each subarray of notes represents a chord (or a melody grouped together in a single fingering). The distance between two subarrays represents the difficulty of moving from one fingering pattern to another. The goal is to find the easiest (optimal) way to play a song on a guitar.

I have not yet proved it, but to me this looks like an NP-complete or NP-hard problem. Therefore it could be helpful if I could reduce this to another known problem and use a known divide-and-conquer algorithm. Alternatively, one could solve this with a more traditional search algorithm (A*?). That could be efficient because we can filter out bad solutions much faster than in a regular graph (the input is technically a complete graph, since each fingering can be reached from any other fingering).

I'm not able to decide what the best approach would be, so I am currently stuck. Any tips or ideas would be appreciated.

It's probably not NP-hard.

Form a graph whose nodes correspond one-to-one to (contiguous) subarrays. For each pair of nodes u, v where u's right boundary is v's left boundary, add an arc from u to v whose length is determined by distance(). Create an artificial source with an outgoing arc to each node whose left boundary is the beginning of the array. Create an artificial sink with an incoming arc from each node whose right boundary is the end of the array.

Now we can find a shortest path from the source to the sink via the linear-time algorithm for directed acyclic graphs. It is linear in the size of the graph: with n events there are O(n^2) subarray nodes and O(n^3) arcs, so the running time is cubic in the parameter of interest.
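A rough sketch of that shortest-path computation follows. It is not the answerer's code: the name `optimal_splits` is hypothetical, the source arcs are assumed to cost `distance([], first_subarray)` (matching the question's cost formula), the sink arcs are assumed to cost nothing, and `toy_distance` is a made-up cost. Node (i, j) stands for the subarray `events[i:j]`; visiting nodes by increasing left boundary is a valid topological order because every arc goes from (h, i) to (i, j) with h < i.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Distance = Callable[[Sequence[int], Sequence[int]], float]


def optimal_splits(events: Sequence[int], distance: Distance) -> Tuple[float, List[int]]:
    """Return (minimum total cost, split indices) via DAG shortest path."""
    n = len(events)
    if n == 0:
        return 0.0, [0]
    inf = float("inf")
    # best[i][j]: cheapest path from the source to node (i, j), 0 <= i < j <= n.
    best = [[inf] * (n + 1) for _ in range(n + 1)]
    # prev[i][j]: left boundary h of the predecessor node (h, i), if any.
    prev: List[List[Optional[int]]] = [[None] * (n + 1) for _ in range(n + 1)]

    # Assumed source arcs: the song starts from an empty fingering.
    for j in range(1, n + 1):
        best[0][j] = distance([], events[0:j])

    # Relax arcs (h, i) -> (i, j); all best[h][i] with h < i are already final.
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            for h in range(i):
                if best[h][i] == inf:
                    continue
                cand = best[h][i] + distance(events[h:i], events[i:j])
                if cand < best[i][j]:
                    best[i][j] = cand
                    prev[i][j] = h

    # Assumed zero-cost sink arcs: pick the cheapest node ending at n.
    last = min(range(n), key=lambda i: best[i][n])
    cost = best[last][n]

    # Walk the predecessor links back to the source.
    splits: List[int] = []
    i: Optional[int] = last
    j = n
    while i is not None:
        splits.append(i)
        i, j = prev[i][j], i
    splits.reverse()
    return cost, splits


def toy_distance(arr1: Sequence[int], arr2: Sequence[int]) -> float:
    # Hypothetical cost: every move costs 1, plus a penalty when the
    # target subarray does not hold exactly two notes.
    return 1.0 + (len(arr2) - 2) ** 2


cost, splits = optimal_splits(list(range(6)), toy_distance)
```

With this toy cost the cheapest plan for six events is three pairs, so `splits` starts each subarray at an even index. The three nested loops make the cubic bound visible, not counting whatever `distance` itself costs.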

This is a bit late, but I did solve this problem. I ended up using a slightly modified version of Dijkstra for this, but any pathfinding algorithm could work. I tried A* as well, but finding a good heuristic proved to be extremely difficult because of the non-euclidean nature of the problem.

The main change to Dijkstra is that at some point I can already tell that some unvisited nodes cannot provide an optimal result. This speeds up the algorithm by a lot, which is also one of the reasons I didn't opt for A*.

The algorithm essentially works like this:

search()
  visited = set()
  costs = map<node, double>()

  // add initial node to costs

  while costs is not empty:
    node = node with minimum cost in costs

    if node.Index == songEnd:
      // backtrack from node to get fingering for the song
      return solution

    visited.add(node)

    foreach neighbour of node:
      if neighbour in visited:
        continue

      newCost = costs[node] + distance(node, neighbour)
      // keep only the cheapest known cost for each neighbour
      if neighbour not in costs or newCost < costs[neighbour]:
        costs[neighbour] = newCost

    // we can remove nodes that have a higher cost but
    // which have a lower index than our current node
    // this is because every fingering position is reachable
    // from any other fingering position
    // therefore a higher cost node which is not as far as our current node
    // cannot provide an optimal solution
    remove suboptimal nodes from costs

    remove node from costs

  // if costs ends up empty then it is impossible to play this song
  // on guitar (e.g. more than 6 notes played at the same time)
  return no solution
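For readers who want something runnable, here is one possible Python rendering of the pseudocode above. It is a sketch under assumptions, not the author's implementation: the callbacks `neighbours` and `index_of` and the flag `prune` are hypothetical names, and the pruning is done lazily (a queued node is discarded when it is popped and its index lies behind the furthest node settled so far) rather than by deleting entries from the cost map. Whether that pruning preserves optimality depends on the author's claim that every fingering is reachable from any other, so pass `prune=False` to get plain Dijkstra.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Node = Hashable


def search(
    start: Node,
    song_end: int,
    neighbours: Callable[[Node], Iterable[Tuple[Node, float]]],
    index_of: Callable[[Node], int],
    prune: bool = True,
) -> Optional[Tuple[float, List[Node]]]:
    """Dijkstra with optional index-based pruning; None if unplayable."""
    costs: Dict[Node, float] = {start: 0.0}
    parent: Dict[Node, Optional[Node]] = {start: None}
    visited = set()
    tie = count()  # tie-breaker so the heap never has to compare nodes
    heap = [(0.0, next(tie), start)]
    furthest = index_of(start)

    while heap:
        cost, _, node = heapq.heappop(heap)
        if node in visited or cost > costs[node]:
            continue  # stale heap entry
        if prune and index_of(node) < furthest:
            continue  # costlier and further back than a settled node
        if index_of(node) == song_end:
            # Backtrack through parents to recover the fingering sequence.
            path: List[Node] = []
            cur: Optional[Node] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return cost, path[::-1]
        visited.add(node)
        furthest = max(furthest, index_of(node))
        for nxt, step in neighbours(node):
            if nxt in visited:
                continue
            new_cost = cost + step
            if new_cost < costs.get(nxt, float("inf")):
                costs[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, next(tie), nxt))
    return None


# Tiny made-up graph: nodes are (index, label) pairs.
graph = {
    (0, "s"): [((1, "a"), 1.0), ((1, "b"), 4.0)],
    (1, "a"): [((2, "end"), 5.0)],
    (1, "b"): [((2, "end"), 1.0)],
    (2, "end"): [],
}
result = search((0, "s"), 2, lambda n: graph[n], lambda n: n[0])
```

In this toy graph the cheaper first step (to "a") leads to the more expensive total, so the search has to revise the cost of the end node before returning.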

The magic of this algorithm happens in fetching the neighbours and calculating the distance between two nodes, but those are irrelevant for this question.
