Suppose there are several arrays:
A. [1,2,3,4,5,6,7,8,9,10]
B. [2,4,6,8,10]
C. [1,4,7,10]
D. [1,3,5,7,9]
...
I need to find all possible sets of elements (1,2,3,4,5 ...), each of which is common to at least 2 arrays (A,B,C,...), and show them in the following manner:
(2,4,6,8,10) -> (A,B)
(1,4,7,10) -> (A,C)
(1,3,5,7,9) -> (A,D)
(4,10) -> (A,B,C)
(1,7) -> (A,C,D)
The actual inputs are files containing strings. There could be thousands of files, and each file could contain more than a hundred key strings.
I have tried the following approach. First, I generated sets of elements by comparing all possible pairs of arrays. Then I tried to generate further sets using the rule that the intersection of two element sets is common to the union of their array sets, like this:
(2,4,6,8,10) -> (A,B)
(1,4,7,10) -> (A,C)
From the above we can get:
intersect((2,4,6,8,10),(1,4,7,10)) -> union((A,B),(A,C))
or, (4,10) -> (A,B,C)
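A minimal Java sketch of the approach described above, assuming the arrays are already loaded into memory as sets of integers keyed by name and Java 9 or later for Set.of (the class PairwiseClosure and its methods are hypothetical names). Step 1 intersects every pair of arrays; step 2 repeatedly applies intersect(elements) -> union(arrays) to the sets found so far until a full pass changes nothing, merging the array labels when the same element set is reached along different paths.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PairwiseClosure {

    // Returns a new set holding the elements common to a and b.
    static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
        Set<Integer> common = new TreeSet<>(a);
        common.retainAll(b);
        return common;
    }

    // Adds `label` to the array set recorded for `elements`; true if anything changed.
    static boolean merge(Map<Set<Integer>, Set<String>> found,
                         Set<Integer> elements, Set<String> label) {
        Set<String> existing = found.get(elements);
        if (existing == null) {
            found.put(elements, new TreeSet<>(label));
            return true;
        }
        return existing.addAll(label);
    }

    public static Map<Set<Integer>, Set<String>> commonSets(Map<String, Set<Integer>> arrays) {
        Map<Set<Integer>, Set<String>> found = new HashMap<>();
        List<String> names = new ArrayList<>(arrays.keySet());

        // Step 1: intersect every pair of arrays.
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                Set<Integer> common = intersect(arrays.get(names.get(i)), arrays.get(names.get(j)));
                if (!common.isEmpty()) {
                    merge(found, common, Set.of(names.get(i), names.get(j)));
                }
            }
        }

        // Step 2: intersect(elements) -> union(arrays) over pairs of found sets,
        // repeated until a pass adds no new set and grows no label.
        boolean changed = true;
        while (changed) {
            changed = false;
            List<Set<Integer>> keys = new ArrayList<>(found.keySet());
            for (int i = 0; i < keys.size(); i++) {
                for (int j = i + 1; j < keys.size(); j++) {
                    Set<Integer> common = intersect(keys.get(i), keys.get(j));
                    if (common.isEmpty()) {
                        continue;
                    }
                    Set<String> union = new TreeSet<>(found.get(keys.get(i)));
                    union.addAll(found.get(keys.get(j)));
                    changed |= merge(found, common, union);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> arrays = new LinkedHashMap<>();
        arrays.put("A", Set.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
        arrays.put("B", Set.of(2, 4, 6, 8, 10));
        arrays.put("C", Set.of(1, 4, 7, 10));
        arrays.put("D", Set.of(1, 3, 5, 7, 9));
        commonSets(arrays).forEach((elems, labels) -> System.out.println(elems + " -> " + labels));
    }
}
```

On the four example arrays this yields the five lines listed in the question (in unspecified order). The number of distinct intersections can grow very quickly with the number of arrays, so a sketch like this is mainly useful for small inputs or as a reference to check a faster method against.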
Is there any other approach I could try to improve time and memory complexity, considering thousands of input files containing hundreds of elements each?
I would use the following approach.
Use a hash map (or a tree-based map, if you need to worry about hash collisions). Pseudocode:
for file in file_list:
    for word in file:
        hash_map[word].append(file)
for wordkey in hash_map:
    print pick_uniques(hash_map[wordkey])
This approach has complexity O(total number of words), ignoring the length of each word.
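The pseudocode above could be written in Java roughly as follows. This is a sketch under stated assumptions: the input files are taken to be already read into memory as a map from file name to its words, Java 9 or later is assumed for List.of, and WordIndex and build are hypothetical names. Instead of a separate pick_uniques pass, each word's value is a LinkedHashSet, so a file is recorded at most once per word while first-seen order is kept.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WordIndex {

    // Builds word -> files containing it. The LinkedHashSet value plays the
    // role of pick_uniques: a file name is stored once per word, even if the
    // word occurs in that file several times.
    public static Map<String, Set<String>> build(Map<String, List<String>> files) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            for (String word : file.getValue()) {
                index.computeIfAbsent(word, w -> new LinkedHashSet<>()).add(file.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory stand-in for the input files (name -> words).
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("A", List.of("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"));
        files.put("B", List.of("2", "4", "6", "8", "10"));
        files.put("C", List.of("1", "4", "7", "10"));
        files.put("D", List.of("1", "3", "5", "7", "9"));
        build(files).forEach((word, names) -> System.out.println(word + " -> " + names));
    }
}
```

Deduplicating at insertion costs one expected O(1) set operation per word occurrence, so the total stays linear in the number of words.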
EDIT: Since you also want to combine wordkeys with the same pick_uniques(hash_map[wordkey]), you can apply the same hash-map method again, this time inverting the keys.
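A sketch of the second, inverted hash map in Java, under the same assumption that each file is available in memory as a list of words (GroupByFiles and group are hypothetical names; Java 9 or later for List.of). The first map goes from word to the set of files containing it; the second uses those file sets as keys and collects the words that share them. Sets work as hash keys in Java because Set defines equals and hashCode by content, as long as a key set is not modified after insertion. Only file sets of size two or more are kept, matching the "at least 2 arrays" requirement.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class GroupByFiles {

    // Returns file set -> words found in exactly those files, keeping only
    // file sets of size >= 2 (words shared by at least two files).
    public static Map<Set<String>, Set<String>> group(Map<String, List<String>> files) {
        // First hash map: word -> set of files containing it.
        Map<String, Set<String>> wordToFiles = new HashMap<>();
        files.forEach((name, words) -> {
            for (String word : words) {
                wordToFiles.computeIfAbsent(word, w -> new TreeSet<>()).add(name);
            }
        });
        // Second hash map with the keys inverted: set of files -> words.
        // The file sets are not modified after this point, so they are safe keys.
        Map<Set<String>, Set<String>> filesToWords = new HashMap<>();
        wordToFiles.forEach((word, names) -> {
            if (names.size() >= 2) {
                filesToWords.computeIfAbsent(names, k -> new TreeSet<>()).add(word);
            }
        });
        return filesToWords;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("A", List.of("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"));
        files.put("B", List.of("2", "4", "6", "8", "10"));
        files.put("C", List.of("1", "4", "7", "10"));
        files.put("D", List.of("1", "3", "5", "7", "9"));
        group(files).forEach((names, words) -> System.out.println(words + " -> " + names));
    }
}
```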
This Java class:
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Store {
    // Element -> set of keys (A, B, ...) of the arrays that contain it.
    Map<Integer,Set<String>> int2keyset = new HashMap<>();
    // Pool of key sets, so that equal key sets are stored as one shared object.
    // (The original remove/add on a Set never actually shared anything.)
    Map<Set<String>,Set<String>> keysetPool = new HashMap<>();

    public void enter( String key, Integer[] integers ){
        for( Integer val: integers ){
            Set<String> keySet = int2keyset.get( val );
            // Copy instead of mutating: the old set may be shared by other elements.
            Set<String> newKeySet = ( keySet == null ) ? new HashSet<String>() : new HashSet<String>( keySet );
            newKeySet.add( key );
            // Reuse an equal set from the pool if one is already stored.
            Set<String> shared = keysetPool.putIfAbsent( newKeySet, newKeySet );
            int2keyset.put( val, shared != null ? shared : newKeySet );
        }
    }

    public void dump(){
        // Invert the map: group elements by the exact set of keys containing them.
        Map<Set<String>,Set<Integer>> keySet2intSet = new HashMap<>();
        for( Map.Entry<Integer,Set<String>> entry: int2keyset.entrySet() ){
            keySet2intSet.computeIfAbsent( entry.getValue(), k -> new HashSet<>() ).add( entry.getKey() );
        }
        for( Map.Entry<Set<String>,Set<Integer>> entry: keySet2intSet.entrySet() ){
            // Report only groups shared by at least two arrays.
            if( entry.getKey().size() >= 2 ){
                System.out.println( entry.getValue() + " => " + entry.getKey() );
            }
        }
    }
}
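When fed with the arrays given in the question, it produces the following (the order of the lines may vary, because HashMap iteration order is unspecified):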
[2, 6, 8] => [A, B]
[3, 5, 9] => [A, D]
[4, 10] => [A, B, C]
[1, 7] => [A, C, D]
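Although it is not identical to the expected output, it contains all the information needed to produce it, and it is much more compact: each element is listed only once, under the exact set of arrays containing it. An expected line is the union of all groups whose key set includes the given arrays; for example, (2,4,6,8,10) -> (A,B) is [2, 6, 8] plus [4, 10], and (1,4,7,10) -> (A,C) is [4, 10] plus [1, 7], where [A, C] is the intersection of [A, B, C] and [A, C, D]. If a large number of input lines is expected, it may be worth keeping the stored information as compact as possible, and I've tried to follow this guideline.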
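A minimal sketch of that expansion step, assuming the input is the key set -> elements map that dump() builds (Expand and expand are hypothetical names; Java 9 or later for Set.of). It first closes the key sets under pairwise intersection to find every candidate label, then, for each label naming at least two arrays, unions the element groups whose key set contains that label.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class Expand {

    // compact: exact key set -> elements (one group per element, as in dump()).
    // Returns key set -> all elements common to at least those keys.
    public static Map<Set<String>, Set<Integer>> expand(Map<Set<String>, Set<Integer>> compact) {
        // Close the key sets under pairwise intersection; e.g. [A, B, C] and
        // [A, C, D] give [A, C], which is not itself a key of `compact`.
        Set<Set<String>> labels = new HashSet<>(compact.keySet());
        boolean changed = true;
        while (changed) {
            changed = false;
            List<Set<String>> current = new ArrayList<>(labels);
            for (int i = 0; i < current.size(); i++) {
                for (int j = i + 1; j < current.size(); j++) {
                    Set<String> common = new TreeSet<>(current.get(i));
                    common.retainAll(current.get(j));
                    changed |= labels.add(common);
                }
            }
        }
        // Each label with two or more arrays collects the elements of every
        // group whose key set contains it.
        Map<Set<String>, Set<Integer>> result = new HashMap<>();
        for (Set<String> label : labels) {
            if (label.size() < 2) {
                continue;
            }
            Set<Integer> elements = new TreeSet<>();
            compact.forEach((keys, ints) -> {
                if (keys.containsAll(label)) {
                    elements.addAll(ints);
                }
            });
            result.put(label, elements);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Set<String>, Set<Integer>> compact = new HashMap<>();
        compact.put(Set.of("A", "B"), Set.of(2, 6, 8));
        compact.put(Set.of("A", "D"), Set.of(3, 5, 9));
        compact.put(Set.of("A", "B", "C"), Set.of(4, 10));
        compact.put(Set.of("A", "C", "D"), Set.of(1, 7));
        expand(compact).forEach((keys, ints) -> System.out.println(ints + " -> " + keys));
    }
}
```

On the four groups printed above it returns the five lines from the question. The closure loop compares every pair of labels on each pass, so with many distinct key sets this step can itself become expensive; the sketch only shows that the information is recoverable and is not tuned for speed.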