
Indexing the (unordered) pairs of a set

This is a self-answered question, originating from this more specific question where the OP seems to have lost interest after selecting a wrong (IMHO) answer.

I did check previous questions on the subject, but none seemed to tackle the problem.

What use is that?

Imagine you have 4 people: Abdul, Beatrix, Charlie and Daria.
You want to store information about the way these people feel toward each other:

Abdul and Beatrix are in love
Beatrix and Charlie hate each other
Abdul and Charlie are good friends
Daria and Beatrix don't know each other
etc.

In the terse, poetry-free world of computers, that could translate to:

relation (Abdul  , Beatrix) = love;
relation (Beatrix, Charlie) = hate;
relation (Abdul  , Charlie) = friendship;
etc.

In other words, if you want to map the relations between each pair of people, you will need a data structure that allows you to maintain a unique value for each pair of people.

Although there are dozens of ways of implementing a suitable data structure, in some cases you might want this table to be a fixed-size array directly indexed by the pairs representing a given relation.

Some definitions:

Given I_N, the set of the first N natural integers, let's call P_N the sequence of all unordered pairs (a,b) of elements of I_N such that a ≠ b, sorted in lexicographic order.

In (hopefully) less cryptic English, P_N enumerates all the possible relations between two elements of I_N.

example (for N = 4):

I_4 = (0,1,2,3)
P_4 = ((0,1),(0,2),(0,3),(1,2),(1,3),(2,3))

Note that the cardinality of P_N is N(N-1)/2, so
the most compact zero-based index of P_N will be in the [0..N(N-1)/2-1] range.
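
The sequence of pairs and its size can be checked with a short Python sketch. The helper name `all_pairs` is an illustrative assumption, not something from the question; it relies on `itertools.combinations` emitting tuples in lexicographic order when its input is sorted.

```python
from itertools import combinations

def all_pairs(n):
    """Return the unordered pairs of range(n) in lexicographic order."""
    # combinations() yields tuples (a, b) with a < b, already sorted
    return list(combinations(range(n), 2))

pairs_4 = all_pairs(4)
print(pairs_4)       # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(pairs_4))  # 6, which is 4 * 3 / 2
```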

Question:

how can we index P_N in a compact and efficient way?

In other words,

  • define a function p_N(a,b) that, given a pair (a,b) of elements of I_N, produces a unique index of P_N in the range [0..N(N-1)/2-1]
  • define the reverse-indexing function p_N^-1 that, given an index of P_N, will produce the corresponding (a,b) pair

The way P_N is arranged is of lesser importance, but a lexicographic order would probably be the most convenient.

example:

P_4 = ((0,1),(0,2),(0,3),(1,2),(1,3),(2,3))
p_4(1,3) = 4
p_4^-1(4) = (1,3)

This seems to be more of a math question.

If my calculations are correct then,

For a pair P (a,b), the number of pairs of type (a,x) [x < b] before P shall be b-a-1.

The number of pairs of type (x,y) [x < a] = (n-1)+(n-2)+(n-3)...+(n-a) = a*n - a(a+1)/2

Hence total number of pairs before P = (b-a-1) + a*n - a(a+1)/2.

Hence index of P = (b-a-1) + a*n - a(a+1)/2.
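
As a sanity check, the formula can be compared against a brute-force enumeration. This is a minimal sketch; the function name `pair_index` and the choice of n = 6 for the check are assumptions made for illustration.

```python
from itertools import combinations

def pair_index(a, b, n):
    """Index of the unordered pair {a, b} among the pairs of range(n)."""
    if a > b:
        a, b = b, a
    # a*n - a*(a+1)/2 pairs have a smaller first element;
    # b - a - 1 pairs share the first element a but have a smaller second one
    return (b - a - 1) + a * n - a * (a + 1) // 2

n = 6
# combinations() enumerates the pairs in lexicographic order
for expected, (a, b) in enumerate(combinations(range(n), 2)):
    assert pair_index(a, b, n) == expected

print(pair_index(1, 3, 4))  # 4
```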

For reverse indexing, first find a: we know that a = 0 for the first n-1 indices, a = 1 for the next n-2 indices, etc.

This can be done in O(N) time by accumulating these row lengths until the running total exceeds the index.

Once we find a, then b can be found from the equation above.
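
The linear scan described above could look like the following sketch. The name `pair_from_index_linear` and the range check are assumptions added for illustration; the scan subtracts row lengths from the index until what remains fits in the current row.

```python
def pair_from_index_linear(p, n):
    """Recover (a, b) from index p by scanning rows; O(n) time."""
    if not 0 <= p < n * (n - 1) // 2:
        raise ValueError("index out of range")
    a = 0
    row_len = n - 1  # row a holds the pairs (a, a+1) .. (a, n-1)
    while p >= row_len:
        p -= row_len  # skip the whole row
        a += 1
        row_len -= 1
    # p is now the offset inside row a, and the first pair there is (a, a+1)
    return a, a + 1 + p

print(pair_from_index_linear(4, 4))  # (1, 3)
```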

Both answers I see here so far do the first calculation fine, but the backward calculation requires looping, which is not necessary.

Consider the following example with n=5, showing how the elements are numbered.

    0   1   2   3   4
  +---+---+---+---+---+
0 |   |   |   |   |   |
  +---+---+---+---+---+
1 | 0 |   |   |   |   |
  +---+---+---+---+---+
2 | 1 | 4 |   |   |   |
  +---+---+---+---+---+
3 | 2 | 5 | 7 |   |   |
  +---+---+---+---+---+
4 | 3 | 6 | 8 | 9 |   |
  +---+---+---+---+---+

Given a tuple (x, y) (assuming x < y), the first index in column x is given by

n-1 + n-2 + ... + n-x = (n-1 + n-x) * x / 2 = (2n - x - 1) * x / 2

The offset in that column is simply y - x - 1. This yields the total expression

p_n(x, y) = (2n - x - 1) * x / 2 + y-x-1 = (2n - x - 3) * x / 2 + y-1
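
To see that this expression reproduces the numbering of the n=5 grid, here is a small sketch that rebuilds it. The names `pair_index` and `table` are illustrative assumptions.

```python
def pair_index(x, y, n):
    """Index of the pair (x, y) with x < y, using the simplified formula."""
    # (2n - x - 3) * x is always even, so integer division is exact
    return (2 * n - x - 3) * x // 2 + y - 1

n = 5
# Only the cells below the diagonal (x < y) carry an index
table = {(x, y): pair_index(x, y, n) for x in range(n) for y in range(x + 1, n)}

# Rows are y and columns are x, as in the diagram
for y in range(n):
    cells = [f"{table[(x, y)]:2d}" if (x, y) in table else " ." for x in range(n)]
    print(" ".join(cells))
```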

Now, going the other way around is tricky. We have some values p and n and need to find x and y. We can make our life simpler though by assuming we're looking for the first cell in the column, i.e. y = x+1. If we plug this into the formula above, we obtain

p = (2n - x - 1) * x / 2

Rewriting this formula yields

x^2 - (2n-1) * x + 2p = 0

which is a simple quadratic equation and can be solved for x:

x = [(2n-1) - Sqrt((2n-1)^2 - 8p)] / 2

Of course, we likely overestimated x, because we assumed the lowest possible value for y. However, we are not that far off (still in the right column), so rounding down the value is enough to get the real x.

Plugging the x value we found into the original formula yields a very easy equation for y:

x = Floor( [(2n-1) - Sqrt((2n-1)^2 - 8p)] / 2 )
y = p - (2n - x - 3) * x / 2 + 1

It can be argued that taking the square root of a number is a slow operation (which is true), but this approach will outperform a loop for bigger values of n.
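
A direct transcription of the two formulas into Python might look like the sketch below. The names `index_of` and `pair_of` are assumptions, and the floating-point `math.sqrt` is used as-is, which should be fine for moderate n.

```python
import math

def index_of(x, y, n):
    """Forward formula: index of the pair (x, y) with x < y."""
    return (2 * n - x - 3) * x // 2 + y - 1

def pair_of(p, n):
    """Closed-form inverse: recover (x, y) from the index p."""
    # Solve the quadratic for the first cell of the column, then round down
    x = math.floor(((2 * n - 1) - math.sqrt((2 * n - 1) ** 2 - 8 * p)) / 2)
    y = p - (2 * n - x - 3) * x // 2 + 1
    return x, y

n = 5
# Lists every pair of the n=5 grid in index order
print([pair_of(p, n) for p in range(n * (n - 1) // 2)])
```

For n = 5 this walks through the ten cells of the grid column by column, from (0, 1) to (3, 4).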

index computation

let a and b be two elements of I N with a < b

If we represent P as a half-filled matrix, the idea is to add an offset to the start of each row to get a contiguous numbering at the start of the next row.

Row 0 starts with an offset of 0 and contains N-1 values
Row 1 starts at N-1 and contains N-2 values
etc..

The offset value of Row a will be the sum of (N-i) for i in (1..a).

The final pair index will be offset(a) + (b-a-1), since b-a-1 pairs precede (a,b) within Row a.

Computation of offset(a) is done by using the formula that gives the sum of the first n natural integers (0 to n-1): s(n) = n(n-1)/2.

Here offset(a) will be = s(N) - s(N-a).

After a bit of maths, the resulting formula can be written as:

p_N(a,b) = a(2N-a-3)/2 + b - 1

The pseudo-code for p is then:

function p (a,b)
{
    if (a>b) swap(a,b)
    return a * (2 * N - a - 3) / 2 + b - 1
}
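
Going back to the motivating example, the index function can back a fixed-size relation table. The Python sketch below is only an illustration: `PEOPLE`, `relations`, `set_relation` and `get_relation` are hypothetical names, and the guard against a == b is an added assumption (the formula is only meant for distinct elements).

```python
PEOPLE = ["Abdul", "Beatrix", "Charlie", "Daria"]
N = len(PEOPLE)

def p(a, b):
    """Index of the unordered pair {a, b}, a != b, in [0, N(N-1)/2 - 1]."""
    if a == b:
        raise ValueError("a pair needs two distinct elements")
    if a > b:
        a, b = b, a
    return a * (2 * N - a - 3) // 2 + b - 1

# One slot per unordered pair; None means "no relation recorded yet"
relations = [None] * (N * (N - 1) // 2)

def set_relation(x, y, value):
    relations[p(PEOPLE.index(x), PEOPLE.index(y))] = value

def get_relation(x, y):
    return relations[p(PEOPLE.index(x), PEOPLE.index(y))]

set_relation("Abdul", "Beatrix", "love")
set_relation("Beatrix", "Charlie", "hate")
set_relation("Abdul", "Charlie", "friendship")

print(get_relation("Charlie", "Beatrix"))  # hate (argument order does not matter)
print(get_relation("Daria", "Beatrix"))    # None
```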

reverse indexing

All credit goes to Heuster for finding an elegant solution to the problem.
See the selected answer for details.

Here is the pseudo-code:

const N1 = 2 * N - 1
const N2 = N1 * N1
function reverse_p (p)
{
    a = floor( (N1 - sqrt(N2 - 8 * p)) / 2)
    b = p - (2 * N - a - 3) * a / 2 + 1
    return (a, b)
}
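
For very large N, the floating-point square root may lose precision. One possible workaround, sketched here in Python as an assumption rather than as part of the original answers, is to stay in integer arithmetic with `math.isqrt` (Python 3.8+). Since floor((N1 - sqrt(d)) / 2) equals (N1 - ceil(sqrt(d))) // 2 for integers, the ceiling of the root is computed as `isqrt(d - 1) + 1`.

```python
import math

def p_index(a, b, n):
    """Forward index of the pair (a, b) with a < b."""
    return a * (2 * n - a - 3) // 2 + b - 1

def reverse_p(p, n):
    """Recover (a, b), a < b, from index p using integer arithmetic only."""
    if not 0 <= p < n * (n - 1) // 2:
        raise ValueError("index out of range")
    n1 = 2 * n - 1
    d = n1 * n1 - 8 * p              # discriminant, at least 9 for a valid p
    root_up = math.isqrt(d - 1) + 1  # ceil(sqrt(d)) without floating point
    a = (n1 - root_up) // 2          # same value as floor((n1 - sqrt(d)) / 2)
    b = p - (2 * n - a - 3) * a // 2 + 1
    return a, b

n = 100_000
last = n * (n - 1) // 2 - 1
print(reverse_p(4, 4))     # (1, 3)
print(reverse_p(last, n))  # (99998, 99999)
```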
