
Scala: HashSet Intersection

I have a text file with 2000+ lines, each holding one pair (ID1 ID2) separated by a single space. Both ID1 and ID2 take 100 distinct values.

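    // Load (ID1, ID2) pairs from lines of the form "ID1 ID2"
    def loadFile(file: Iterator[String]): Set[(Int, Int)] =
      file
        .map(_.trim)
        .filter(_.nonEmpty) // skip blank lines
        .map { line =>
          // split on any run of whitespace; a line without exactly two fields throws a MatchError
          line.split("\\s+") match {
            case Array(id1, id2) => (id1.toInt, id2.toInt)
          }
        }
        .toSet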
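For reference, a self-contained sketch of loading the pairs, assuming Scala 2.13 (for scala.util.Using) and a hypothetical file name pairs.txt. It uses a whitespace-tolerant variant of loadFile (trimming lines, skipping blank ones, splitting on "\\s+") and wraps the file in Using, so the file is closed only after toSet has consumed the lazy getLines() iterator.

```scala
import scala.io.Source
import scala.util.{Try, Using}

object LoadPairsDemo {
  // Whitespace-tolerant variant of loadFile: one "ID1 ID2" pair per line
  def loadFile(file: Iterator[String]): Set[(Int, Int)] =
    file
      .map(_.trim)
      .filter(_.nonEmpty) // skip blank lines
      .map { line =>
        line.split("\\s+") match {
          case Array(id1, id2) => (id1.toInt, id2.toInt)
        }
      }
      .toSet

  // toSet consumes the lazy getLines() iterator before Using closes the file
  def loadFromPath(path: String): Try[Set[(Int, Int)]] =
    Using(Source.fromFile(path))(src => loadFile(src.getLines()))

  def main(args: Array[String]): Unit = {
    // Duplicate lines collapse into a single tuple in the resulting Set
    println(loadFile(Iterator("15 88", "56  66", "", "15 88")))
    // "pairs.txt" is a hypothetical file name; a missing file yields a Failure
    println(loadFromPath("pairs.txt"))
  }
}
```

Returning a Try keeps a missing or unreadable file from crashing the caller; a malformed line still surfaces as a MatchError or NumberFormatException.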

I load them as a Set[(Int, Int)] of (ID1, ID2) tuples. Because the set has more than 4 elements, Scala's immutable Set is backed by a HashSet, which is why it prints as HashSet here.
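The HashSet in the printout is only an implementation detail. A quick check, assuming Scala 2.13 (2.12 prints both sets as "Set(...)"): sets of up to four elements use small specialised classes, and adding a fifth element switches the representation to HashSet without changing equality.

```scala
import scala.collection.immutable.HashSet

object SetRepresentationDemo {
  // Scala 2.13 keeps sets of up to four elements in small specialised classes
  val four: Set[(Int, Int)] = Set((15, 88), (56, 66), (92, 68), (27, 4))
  // A fifth element switches the representation to HashSet
  val five: Set[(Int, Int)] = four + ((84, 14))

  def main(args: Array[String]): Unit = {
    println(four.isInstanceOf[HashSet[_]]) // false
    println(five.isInstanceOf[HashSet[_]]) // true
    println(five) // toString uses the class name, hence "HashSet(...)"
    // The representation does not affect equality
    println(five == Set((84, 14), (27, 4), (92, 68), (56, 66), (15, 88)))
  }
}
```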

myData = HashSet((15,88), (56,66), (92,68), (27,4), (84,14), (88,17), (6,47), (97,45), (96,41), (21,66), (65,10), (44,66), (2,9), (86,61),...)

My goal is to find every ID2 value that appears with EVERY ID1, i.e. a value ID2_EVERYONE such that all of (ID1_1,ID2_EVERYONE), (ID1_2,ID2_EVERYONE), (ID1_3,ID2_EVERYONE), (ID1_4,ID2_EVERYONE), ..., (ID1_100,ID2_EVERYONE) are in the set, and then print those ID2_EVERYONE values.
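One way to express "appears with every ID1" directly, sketched assuming Scala 2.13+ (for groupMap) and a hypothetical helper name id2KnownByAll: group the pairs by ID2 instead of by ID1, and keep an ID2 whose set of partners equals the set of all ID1 values. This is an alternative to intersecting per-ID1 sets, and it returns an empty set instead of throwing when the input is empty.

```scala
object EveryId1Demo {
  // Hypothetical helper: an ID2 qualifies when the set of ID1s it is paired
  // with equals the set of all ID1s in the data (groupMap needs Scala 2.13+)
  def id2KnownByAll(pairs: Set[(Int, Int)]): Set[Int] = {
    val allId1: Set[Int] = pairs.map(_._1)
    pairs
      .groupMap(_._2)(_._1) // ID2 -> set of ID1s it appears with
      .filter { case (_, id1s) => id1s == allId1 }
      .keySet
  }

  def main(args: Array[String]): Unit = {
    val pairs = Set((1, 9), (2, 9), (3, 9), (1, 7), (3, 7))
    println(id2KnownByAll(pairs)) // 7 never appears with ID1 = 2, so only 9 qualifies
  }
}
```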

UPDATED: here is a smaller example with size = 5:

(1,5), (2,5), (3,5), (4,5), (5,5),
(1,4), (2,4), (3,4), (4,4), (5,4),
(5,4), (4,5)

Here every ID1 (1 to 5) knows both 4 and 5, and IDs 4 and 5, as ID1 members, know only each other (and themselves). So the output should be 4 & 5.
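A small check of the two conditions on the size = 5 example, assuming Scala 2.13+ for groupMap. The value target is hardcoded to the expected answer purely to verify the conditions, not to compute it; target and the Boolean names are hypothetical.

```scala
object SmallExampleCheck {
  // The size = 5 example; the repeated (5,4) and (4,5) collapse in a Set
  val pairs: Set[(Int, Int)] = Set(
    (1, 5), (2, 5), (3, 5), (4, 5), (5, 5),
    (1, 4), (2, 4), (3, 4), (4, 4), (5, 4),
    (5, 4), (4, 5)
  )
  // ID1 -> IDs it knows (groupMap needs Scala 2.13+)
  val knows: Map[Int, Set[Int]] = pairs.groupMap(_._1)(_._2)
  // Expected answer, hardcoded only to check the two conditions
  val target: Set[Int] = Set(4, 5)

  // Condition 1: every ID1 knows all of the target IDs
  val everyoneKnowsTarget: Boolean = knows.values.forall(s => target.subsetOf(s))
  // Condition 2: each target ID, as an ID1, knows nobody outside the target
  val targetKnowsOnlyTarget: Boolean =
    target.forall(id => knows.getOrElse(id, Set.empty[Int]).subsetOf(target))

  def main(args: Array[String]): Unit =
    println(s"condition 1: $everyoneKnowsTarget, condition 2: $targetKnowsOnlyTarget")
}
```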

Updated V2

(37,52), (37,37), (37,45), (37,14)
(52,37), (52,52), (52,45), (52,14)

(14,20), (14,14), (14,12), (14,4), (14,49), (14,91), (14,45), (14,54), (14,52), (14,37)

(45,45), (45,52), (45,14), (45,37)

Using @jwvh's code:

    myData.groupMap(_._1)(_._2).values.reduce(_ intersect _)

This gets close on the data above: it returns 37, 52, 45 and 14. However, ID 14 also knows IDs other than 37, 52, 45 and itself, so ID 14 should be filtered out as well.
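To see where ID 14 goes wrong, a sketch on the Updated V2 pairs typed in from the question, assuming Scala 2.13+ for groupMap (an equivalent groupBy form is included for 2.12). It computes the candidates known by every ID1 and, for each candidate, the IDs it knows outside the candidate set; knowsViaGroupBy and extras are hypothetical names.

```scala
object V2Diagnosis {
  // The "Updated V2" pairs from the question
  val myData: Set[(Int, Int)] = Set(
    (37, 52), (37, 37), (37, 45), (37, 14),
    (52, 37), (52, 52), (52, 45), (52, 14),
    (14, 20), (14, 14), (14, 12), (14, 4), (14, 49),
    (14, 91), (14, 45), (14, 54), (14, 52), (14, 37),
    (45, 45), (45, 52), (45, 14), (45, 37)
  )

  // jwvh's grouping (Scala 2.13+): ID1 -> set of IDs it knows
  val knows: Map[Int, Set[Int]] = myData.groupMap(_._1)(_._2)

  // Same grouping with groupBy, which also compiles on Scala 2.12
  val knowsViaGroupBy: Map[Int, Set[Int]] =
    myData.groupBy(_._1).map { case (id1, ps) => id1 -> ps.map(_._2) }

  // IDs known by every ID1 (reduce would throw on an empty map)
  val candidates: Set[Int] = knows.values.reduce(_ intersect _)

  // For each candidate, the IDs it knows that fall outside the candidate set
  val extras: Map[Int, Set[Int]] =
    candidates.iterator
      .map(id => id -> (knows.getOrElse(id, Set.empty[Int]) -- candidates))
      .filter { case (_, outside) => outside.nonEmpty }
      .toMap

  def main(args: Array[String]): Unit = {
    println(candidates.toSeq.sorted.mkString(", "))
    println(extras)
  }
}
```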

If I understand correctly, you want the subset of ID2 values that are "known" by every ID1 member and that, as ID1 members themselves, do not know anyone outside that subset. Can you try something like this:

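    // For each ID1, the set of ID2s it knows (groupMap needs Scala 2.13+)
    val id1Pointers: Map[Int, Set[Int]] = myData.groupMap(_._1)(_._2)
    // Candidates: ID2s known by every ID1
    val id2Optionals: Set[Int] = id1Pointers.values.reduce(_ intersect _)
    // Keep a candidate only if everything it knows (as an ID1) is itself a candidate
    val finalId2 = id2Optionals.filter(id2 => id1Pointers.getOrElse(id2, Set.empty[Int]).subsetOf(id2Optionals))
    println(finalId2.mkString("&"))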
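To try the answer on both examples from the question, here is a sketch that wraps the same three steps in a hypothetical function everyoneKnows, assuming Scala 2.13+ for groupMap. It uses reduceOption instead of reduce, so empty input yields an empty set rather than an exception.

```scala
object AnswerAsFunction {
  // Hypothetical wrapper around the answer's three steps
  def everyoneKnows(pairs: Set[(Int, Int)]): Set[Int] = {
    val id1Pointers: Map[Int, Set[Int]] = pairs.groupMap(_._1)(_._2)
    // reduceOption: no exception when there are no pairs at all
    val id2Optionals: Set[Int] =
      id1Pointers.values.reduceOption(_ intersect _).getOrElse(Set.empty[Int])
    id2Optionals.filter(id2 =>
      id1Pointers.getOrElse(id2, Set.empty[Int]).subsetOf(id2Optionals))
  }

  // Data from the question's two updates
  val small: Set[(Int, Int)] = Set(
    (1, 5), (2, 5), (3, 5), (4, 5), (5, 5),
    (1, 4), (2, 4), (3, 4), (4, 4), (5, 4)
  )
  val v2: Set[(Int, Int)] = Set(
    (37, 52), (37, 37), (37, 45), (37, 14),
    (52, 37), (52, 52), (52, 45), (52, 14),
    (14, 20), (14, 14), (14, 12), (14, 4), (14, 49),
    (14, 91), (14, 45), (14, 54), (14, 52), (14, 37),
    (45, 45), (45, 52), (45, 14), (45, 37)
  )

  def main(args: Array[String]): Unit = {
    println(everyoneKnows(small).toSeq.sorted.mkString("&")) // 4&5
    println(everyoneKnows(v2).toSeq.sorted.mkString("&"))    // 37&45&52
  }
}
```

Note that an ID2 which never appears as an ID1 passes the final filter: getOrElse then yields an empty set, and the empty set is a subset of every set.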
