简体   繁体   中英

Array of set-of-integers in MATLAB

I have n different unique unsigned 32-bit integers. For each integer in for-loop I will generate a random index (address) and I want to put that number is a set specified by that index (address). For example in for-loop I reach to number 7 and my code produces 13 as address, then I need to add the number 7 to the 13th set. Having index (address) range of 1 to k, I will need k different sets. Currently I am using "cell" data structure in MATLAB.

array_of_sets=cell(n,1);

and when I want to add new member to ith set, I will index by array_of_sets{i} and then I will concatenate my new number.

My problem is that this approach is not memory efficient nor time efficient. Can anyone please guide me to a more efficient way to do this.

This is a simplified version my code so far:

array_of_sets=cell(k,1);
for i=1:n 
   address=something_genrated_randomly;
   array_of_sets{address}=[array_of_sets{address},uint32(i)]; %Add to the corresponding set(One specific Cell)
end

Output: Given the index ind, outputs the the ind^th set of integers.

Basically what I am looking for is similar to ArrayList<Set<Integer>> from Java but in MATLAB.

Cells requires some extra memory to be able to store the information you need. This seems to be 112 bytes for a full cell independently matrix size and data type. For an empty cell 8 bytes is allocated, which is the same as a 64-bit pointer (one can assume this makes the header of each cell 112-8=104 bytes). This means that you will experience a huge waste of memory in case your arrays are short. In case you can guarantee that all the cells are smaller than 28 elements (112/4), you will get away cheaper with a 3-dimensional vector for every situation (zero-padded). However in case some cells are overly represented you will probably save memory by using cells. Also, as others have stated, cells are slow as well. So that may be something to take into account if execution time is a problem. Further in case the RAM is getting full you will start swapping and this will slow execution tremendously. If this is the problem, getting the memory down should be the main problem.

a = cell(1,2);
a{1} = zeros(100);
b=zeros(100);
c = cell(1,2);
d = cell(10,10);
e = cell(10,10);
d(:,:) = {zeros(10)};
e(:,:) = {zeros(10,10,10)};
f = zeros(100,1000);
whos
Name        Size               Bytes  Class     Attributes

a           1x2                80120  cell                
b         100x100              80000  double              
c           1x2                   16  cell                
d          10x10               91200  cell                
e          10x10              811200  cell                
f         100x1000            800000  double   

EDIT

One way to create a non repeating set of integers is this way when using a zero padded array. Assume that you want 100 sets of about 1000 values each. Also in case you assume the distribution of random value is uniform they will probably be fairly evenly distributed. Assume a variation of an average of 10 values per set. This means that 10*number_of_sets values will be zero.

maxVal = 100000;
nSets = 100;
nonAssigned = 1000;
sets = randperm(maxVal);
sets(sets > (maxVal-nonAssigned)) = 0;
sets = reshape(vec,100,1000);

This will set all the "empty" values to zero and in this case we have about 1% memory overhead. This is still better than using an HashSet or TreeSet taking up 32*SIZE bits. In case this 1% is too much you will probably have to iterate and generate a few of the sets iteratively. In case you need this amount of data I do not think you have many options. In case this causes memory trouble you may want to look into if writing the code in c and mex the function will solve your problems.

I think your question boils down to this:

Question: How can I randomly assign a set of n integers into k subsets efficiently?

If that's indeed what you're asking, here's a simple solution that only requires storage of 2*n values in memory, avoiding cell and zero-padded arrays, and uses logical indexing for speed.

Solution: Assign a vector of random "addresses", then use logical indexing whenever you need to refer to a specific set:

%// for example:
n = 10;
k = 4;

%// 1-by-n vector of integers, for example:
ints = 1:n;

%// 1-by-n vector of random "addresses" between 1 and k
address = randi([1,k],[1,n]);

Now you can access any subset using logical indexing:

set1 = ints(address==1);
set2 = ints(address==2);
...
setk = ints(address==k);

Memory-wise, you'll need to store 2*n values: the integers and their addresses. If the subsets are all approximately the same size, then using a zero-padded array may be cheaper.

Example:

>> [ints;address]

ans =

 1     2     3     4     5     6     7     8     9    10
 4     4     1     4     3     1     2     3     4     4

>> set1=ints(address==1)
>> set2=ints(address==2)
>> set3=ints(address==3)
>> set4=ints(address==4)

set1 =

 3     6


set2 =

 7


set3 =

 5     8


set4 =

 1     2     4     9    10

You could also use sortrows to re-order the integers and addresses:

>> sortrows([ints;address]',2)'

ans =

 3     6     7     5     8     1     2     4     9    10
 1     1     2     3     3     4     4     4     4     4

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM