How to find the "odd one out" in a list of numbers

I have an array of numbers [x1, x2, x3, ...] with more than 20 elements, and I'm trying to put together an algorithm that sorts the elements by the "oddness" they have relative to the rest of the list.

I'm defining "oddness" as the distance from the barycenters, given some threshold T1. The barycenters are where the values tend to concentrate, possibly given some second threshold T2.

Example: [20, 20, 21, 31, 24, 20, 70, 21, 31, 24, 20, 20, 21, 31, 24, 20, 20, 21, 31, 24] and T1=10. The barycenter is about 24 and the only odd one out is 70.

This case is trivial, as the familiar "distance from the mean or median" metric will do, e.g. d(70)=|24-70|=46>10=T1 and d(31)=|24-31|=7<10=T1.
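For reference, the single-barycenter check can be sketched in a few lines of Python. The function name `odd_ones_single_center` is made up for illustration, and the median is used as the barycenter, which is only one possible choice (for this list it is 21 rather than the "about 24" above, but the result is the same).

```python
from statistics import median

def odd_ones_single_center(values, t1):
    """Return the values whose distance from the median exceeds t1."""
    center = median(values)  # assumption: the median stands in for the barycenter
    return [v for v in values if abs(v - center) > t1]

data = [20, 20, 21, 31, 24, 20, 70, 21, 31, 24,
        20, 20, 21, 31, 24, 20, 20, 21, 31, 24]
print(odd_ones_single_center(data, 10))  # [70]
```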

I can't quite figure out how to deal with the more general case of having 2 or more barycenters.

Example 2: [20, 20, 21, 31, 24, 20, 70, 21, 31, 24, 120, 120, 121, 131, 124, 120, 120, 121, 131, 124]. Now there are two barycenters, d1=24 and d2=124, and the only odd one out is still 70.

But the previous metric breaks down. Maybe the hard part is figuring out which are the barycenters.
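To make the failure concrete, here is a small sketch that applies the same distance test to the two-cluster list, once with the mean and once with the median as the single center. The helper name `flagged` is hypothetical.

```python
from statistics import mean, median

def flagged(values, center, t1):
    """Values farther than t1 from a single given center."""
    return [v for v in values if abs(v - center) > t1]

data2 = [20, 20, 21, 31, 24, 20, 70, 21, 31, 24,
         120, 120, 121, 131, 124, 120, 120, 121, 131, 124]

for name, center in (("mean", mean(data2)), ("median", median(data2))):
    odd = flagged(data2, center, 10)
    print(name, center, len(odd), 70 in odd)
```

The mean lands at 75.7, between the two clusters, so 70 is the one value that is not flagged; the median lands at 95, so every value is flagged. Either way a single center gives the wrong answer here.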

Note: I'm looking for a fast algorithm rather than an accurate one.

It sounds like the general problem you're trying to solve is this: draw as few radius-R circles as possible such that all inputs are covered by at least one circle; then, find the circles containing fewer than k inputs.

In your first case, you draw two radius-10 circles: the first contains all inputs except 70, the second contains just 70. Your criterion for detecting abnormal circles should catch the one containing 70, which should be simple. In your second case, you draw three radius-10 circles. Again, the criterion that catches the one containing only 70 should be easy to state.
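In 1D a radius-R circle is just an interval of width 2R, so the covering step can be sketched with the standard left-to-right greedy: sort, open an interval at the smallest uncovered value, and keep absorbing values until one falls outside. The names `cover_with_intervals` and `sparse_points`, and the choice k=2, are assumptions for illustration only.

```python
def cover_with_intervals(values, radius):
    """Greedily cover sorted 1D points with intervals of width 2*radius.

    Returns one list of points per interval; each interval starts at the
    smallest point not covered by the previous one.
    """
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][0] <= 2 * radius:
            groups[-1].append(v)   # still inside the current interval
        else:
            groups.append([v])     # open a new interval at v
    return groups

def sparse_points(values, radius, k):
    """Points that sit in an interval holding fewer than k inputs."""
    return [v for g in cover_with_intervals(values, radius)
            if len(g) < k for v in g]

data1 = [20, 20, 21, 31, 24, 20, 70, 21, 31, 24,
         20, 20, 21, 31, 24, 20, 20, 21, 31, 24]
data2 = [20, 20, 21, 31, 24, 20, 70, 21, 31, 24,
         120, 120, 121, 131, 124, 120, 120, 121, 131, 124]

print(len(cover_with_intervals(data1, 10)), sparse_points(data1, 10, 2))
print(len(cover_with_intervals(data2, 10)), sparse_points(data2, 10, 2))
```

With R=10 this produces two intervals for the first list and three for the second, and in both cases 70 is the only point in an under-populated interval.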

If I were going to do this from scratch without looking up what the problem is called (it's probably a well-known problem with good, well-known solutions), I'd start by sorting the inputs, which will probably be very helpful since this is a 1D problem. Next, I'd run a sliding window of size 2R over the inputs and compute the moving frequency at each potential barycenter (skipping duplicates and jumping gaps), saving this frequency series separately. Then, I'd greedily place windows at the locations with the highest frequencies first, in as non-overlapping a fashion as possible, until all inputs are covered. Finally, I'd flag any inputs covered by windows whose moving frequency is below some cutoff related to the average moving frequency of the chosen windows; for instance, treat as anomalous all inputs covered by windows that cover half as many inputs, or fewer, compared to the average covered by all windows.
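One possible reading of those steps, as a rough Python sketch rather than a tuned implementation. It assumes candidate windows only need to start at a data value (any window can be slid right until its left edge hits a point without losing points), and it re-counts only the still-uncovered points at each round. The function name `odd_ones_out` and the `ratio` parameter are made up for illustration.

```python
from bisect import bisect_right

def odd_ones_out(values, radius, ratio=0.5):
    """Flag inputs that fall in sparsely populated windows of width 2*radius."""
    remaining = sorted(values)
    windows = []  # points captured by each chosen window, densest first
    while remaining:
        best_lo, best_hi = 0, 0
        for lo in range(len(remaining)):
            if lo > 0 and remaining[lo] == remaining[lo - 1]:
                continue  # skip duplicates: same window as the previous start
            # index just past the last point inside [x, x + 2R]
            hi = bisect_right(remaining, remaining[lo] + 2 * radius)
            if hi - lo > best_hi - best_lo:
                best_lo, best_hi = lo, hi
        windows.append(remaining[best_lo:best_hi])
        del remaining[best_lo:best_hi]  # these points are now covered
    # cutoff: a fraction of the average number of points per chosen window
    cutoff = ratio * sum(len(w) for w in windows) / len(windows)
    return [v for w in windows if len(w) <= cutoff for v in w]

data1 = [20, 20, 21, 31, 24, 20, 70, 21, 31, 24,
         20, 20, 21, 31, 24, 20, 20, 21, 31, 24]
data2 = [20, 20, 21, 31, 24, 20, 70, 21, 31, 24,
         120, 120, 121, 131, 124, 120, 120, 121, 131, 124]

print(odd_ones_out(data1, 10))  # [70]
print(odd_ones_out(data2, 10))  # [70]
```

The rescan on every round keeps the sketch short but is quadratic in the worst case; for lists of a few dozen elements that hardly matters, and a two-pointer pass over the sorted list would make each round linear if speed becomes an issue.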

Example:

INPUT:  20, 20, 21, 31, 24, 20, 70, 21, 31, 24, 20, 20, 21, 31, 24, 20, 20, 21, 31, 24

SORTED: 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 24, 24, 24, 24, 31, 31, 31, 31, 70

WINDOW MOVING FREQUENCY:
20: 15
21: 19
(detects gap, jumps)
60: 1
(detects gap, jumps, ends)

WINDOW #1: [11,31]: 19
WINDOW #2: [50, 70]: 1

AVERAGE: 10
50% AVERAGE: 5
WINDOW #1 ABOVE CUTOFF
WINDOW #2 BELOW CUTOFF

Example:

INPUT:  20, 20, 21, 31, 24, 20, 70, 21, 31, 24, 120, 120, 121, 131, 124, 120, 120, 121, 131, 124

SORTED: 20, 20, 20, 21, 21, 24, 24, 31, 31, 70, 120, 120, 120, 120, 121, 121, 124, 124, 131, 131

WINDOW MOVING FREQUENCY:
20: 7
21: 9
(detects gap, jumps)
60: 1
(detects gap, jumps)
110: 4
111: 6
(detects gap, jumps)
114: 8
(detects gap, jumps)
121: 10

WINDOW #1: [111, 131]: 10
WINDOW #2: [11, 31]: 9
WINDOW #3: [50, 70]: 1

AVERAGE: 6.67
50% AVERAGE: 3.33

WINDOW #1 ABOVE CUTOFF
WINDOW #2 ABOVE CUTOFF
WINDOW #3 BELOW CUTOFF
