
What is the best way to filter a list of a custom data type using another list (iterating over it)?

I have a problem filtering a list of my own type using another list.

I come from Python, so Haskell is quite hard for me to understand.

type PersonId = Integer
type PersonName = String
type InfectionId = Integer
type InfectionName = String

data TerminationType = Recovered | Dead
                       deriving (Eq, Show)

data Person = Person PersonId PersonName
              deriving Show

data Infection = Infection InfectionId PersonId InfectionName
                 deriving Show

data Termination = Termination InfectionId TerminationType
                   deriving Show

type Persons = [Person]
type Infections = [Infection]
type Terminations = [Termination]

pers :: Persons
pers = [Person 1 "Augustin"
       ,Person 2 "Baltazar"
       ,Person 42 "Ctirad"
       ,Person 128 "Zdenek"
       ,Person 5 "Drahoslav"
       ]

infs :: Infections
infs = [Infection 2020 1 "COVID"
       ,Infection 2019 42 "COVID"
       ,Infection 1 5 "COVID"
       ,Infection 5 128 "rymicka"
       ,Infection 3 5 "astma"
       ,Infection 2 1 "astma"
       ,Infection 128 5 "zapal plic"
       ]

ters :: Terminations
ters = [Termination 2020 Dead
       ,Termination 2 Recovered
       ,Termination 2019 Recovered
       ,Termination 128 Dead
       ]

I am trying to write a function that finds all active cases. It should take the list of recorded infections and remove every infection that has a termination.

My idea was to get all IDs from Terminations using map and then filter the items with those IDs out of Infections using something like "in" from Python, but that solution seems to be way too "OOP".

Something along those lines.

activeCases :: Infections -> Terminations -> [InfectionId]
activeCases infec term = filter (infec.id in (map getId term)) infec
    where getId (Termination y _) = y

With the "pers", "infs" and "ters" values I have predefined, the final result would be:

activeCases infs ters ~>* [Infection 1 5 "COVID",Infection 5 128 "rymicka",Infection 3 5 "astma"]

Could someone tell me what is the right way to solve a problem like this?

My idea was to get all IDs from Terminations using map and then filter the items with those IDs out of Infections using something like "in" from Python, but that solution seems to be way too "OOP".

Not at all, you have the right idea, in fact. It looks like you just need some help expressing it.

You can change your code to use the elem function, which is akin to Python's in operator, but works on any foldable container with equatable elements:

elem :: (Foldable t, Eq a) => a -> t a -> Bool

Here you'll be using it with t ~ [] and a ~ InfectionId, so it will have the specialised type elem :: InfectionId -> [InfectionId] -> Bool.
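As a minimal sketch of how elem behaves, here it is on a plain list and on a Maybe, which is another Foldable container. The names terminatedIds, isTerminated and inMaybe are made up for this illustration; the IDs are the termination IDs from the question's sample data.

```haskell
-- IDs of the terminated infections, copied from the question's `ters`.
-- (The name `terminatedIds` is only for this illustration.)
terminatedIds :: [Integer]
terminatedIds = [2020, 2, 2019, 128]

-- elem on a list: the counterpart of Python's `i in terminated_ids`.
isTerminated :: Integer -> Bool
isTerminated i = i `elem` terminatedIds

-- The same function works on any Foldable, for example Maybe,
-- which behaves like a container of zero or one element.
inMaybe :: Bool
inMaybe = 3 `elem` Just (3 :: Integer)
```

In GHCi, isTerminated 2020 is True and isTerminated 1 is False.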

If you want to return just the IDs, not the whole Infection value, you can extract the IDs first with map.

activeCases :: Infections -> Terminations -> [InfectionId]
activeCases infections terminations

  -- Select the IDs that are *not* found in the list of termination IDs.
  = filter (\ i -> not (i `elem` terminationIds))

    -- Extract the ID from each Infection.
    (map infectionId infections)
  where
    terminationIds = map terminationId terminations
    terminationId (Termination y _) = y
    infectionId (Infection i _ _) = i

Recall that the filter condition indicates which elements to keep, not which ones to filter out, so for the active cases you want to add a not. My mnemonic for this is “a coffee filter lets coffee through” (so filter even lets even values through).
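A small sketch of that keep-versus-drop behaviour, using throwaway names (evens, odds, activeIds) and the ID lists from the question's sample data:

```haskell
-- filter keeps the elements for which the predicate is True.
evens :: [Int]
evens = filter even [1 .. 10]

-- To drop the matching elements instead, negate the predicate.
odds :: [Int]
odds = filter (not . even) [1 .. 10]

-- The same idea applied to the IDs: keep the infection IDs
-- that do not appear among the termination IDs.
activeIds :: [Integer]
activeIds =
  filter (`notElem` [2020, 2, 2019, 128]) [2020, 2019, 1, 5, 3, 2, 128]
```

Here evens is [2,4,6,8,10], odds is [1,3,5,7,9], and activeIds is [1,5,3].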

You can simplify this in a few ways, though. First, you can make the data types into records and give each field a name so you can extract it (or update it) more conveniently:

data Person = Person
  { personId :: PersonId
  , personName :: PersonName
  }
  deriving (Show)

data Infection = Infection
  { infectionId :: InfectionId
  , infectionPerson :: PersonId
  , infectionName :: InfectionName
  }
  deriving (Show)

data Termination = Termination
  { terminationInfection :: InfectionId
  , terminationType :: TerminationType
  }
  deriving (Show)
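To illustrate what the record syntax gives you, here is a short sketch using only the Infection record. The values covid, covidId and renamed are invented for the example, and the Eq instance is an addition of mine so the values can be compared.

```haskell
type PersonId = Integer
type InfectionId = Integer
type InfectionName = String

data Infection = Infection
  { infectionId :: InfectionId
  , infectionPerson :: PersonId
  , infectionName :: InfectionName
  }
  deriving (Eq, Show)

-- Positional construction still works with a record type.
covid :: Infection
covid = Infection 2020 1 "COVID"

-- Each field name doubles as an accessor function.
covidId :: InfectionId
covidId = infectionId covid

-- Record update syntax builds a copy with some fields replaced;
-- the original value is left unchanged.
renamed :: Infection
renamed = covid { infectionName = "COVID-19" }
```

Here covidId is 2020, and renamed keeps the ID and person of covid but carries the new name.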

The mapping & filtering can be expressed with a list comprehension, and the not and elem functions can be combined into notElem for convenience. Putting these together:

activeCases :: Infections -> Terminations -> [InfectionId]
activeCases infections terminations =
  [ infectionId infection
  | infection <- infections
  , infectionId infection `notElem` terminationIds
  ]
  where
    terminationIds =
      [ terminationInfection termination
      | termination <- terminations
      ]

You can still use pattern-matching in the comprehension if you prefer:

-- Positional matching:

[ i
| Infection i _ _ <- infections
, i `notElem` terminationIds
]

-- Named matching:

[ i
| Infection { infectionId = i } <- infections
, i `notElem` terminationIds
]
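The expected output in the question lists whole Infection values rather than IDs. If that is what you want, one possible variant is to keep the record itself in the comprehension. This is only a sketch: the name activeInfections is mine, the lists are the sample data from the question, and I derive Eq in addition to Show so results can be compared.

```haskell
type PersonId = Integer
type InfectionId = Integer
type InfectionName = String

data TerminationType = Recovered | Dead
  deriving (Eq, Show)

data Infection = Infection
  { infectionId :: InfectionId
  , infectionPerson :: PersonId
  , infectionName :: InfectionName
  }
  deriving (Eq, Show)

data Termination = Termination
  { terminationInfection :: InfectionId
  , terminationType :: TerminationType
  }
  deriving (Eq, Show)

-- Sample data from the question.
infs :: [Infection]
infs =
  [ Infection 2020 1 "COVID"
  , Infection 2019 42 "COVID"
  , Infection 1 5 "COVID"
  , Infection 5 128 "rymicka"
  , Infection 3 5 "astma"
  , Infection 2 1 "astma"
  , Infection 128 5 "zapal plic"
  ]

ters :: [Termination]
ters =
  [ Termination 2020 Dead
  , Termination 2 Recovered
  , Termination 2019 Recovered
  , Termination 128 Dead
  ]

-- Keep every infection whose ID has no matching termination.
-- (Hypothetical name; returns records instead of bare IDs.)
activeInfections :: [Infection] -> [Termination] -> [Infection]
activeInfections infections terminations =
  [ infection
  | infection <- infections
  , infectionId infection `notElem` terminatedIds
  ]
  where
    terminatedIds = map terminationInfection terminations
```

With this data, activeInfections infs ters yields the infections with IDs 1, 5 and 3, in the order they appear in infs.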

There are higher-level approaches as well. Scanning linearly over the list of Termination IDs for each Infection ID is O(n²), but what you really want here is a set difference, which can be computed more efficiently using map and set data structures such as Data.Map and Data.Set (or Data.IntMap and Data.IntSet) from containers. For example, if you store your data as a map from the ID to the record for that ID:

import Data.Map (Map)
import qualified Data.Map as Map  -- needed for Map.fromList below
import Data.Set (Set)

type Persons = Map PersonId Person
type Infections = Map InfectionId Infection
type Terminations = Map InfectionId Termination

infs :: Infections
infs = Map.fromList
  [ (i, infection)
  | infection <-
    [ Infection 2020 1 "COVID"
    , Infection 2019 42 "COVID"
    , Infection 1 5 "COVID"
    , Infection 5 128 "rymicka"
    , Infection 3 5 "astma"
    , Infection 2 1 "astma"
    , Infection 128 5 "zapal plic"
    ]
  , let i = infectionId infection
  ]

Then the set of active cases is the key-set of the infections minus the set of terminated cases, which can be computed in O(n log n) time:

import qualified Data.Map as Map
import qualified Data.Set as Set

activeCases :: Infections -> Terminations -> Set InfectionId
activeCases infections terminations
  = infectionIds `Set.difference` terminatedInfections
  where
    infectionIds = Map.keysSet infections
    terminatedInfections = Set.fromList
      $ map terminationInfection
      $ Map.elems terminations

Or since a Termination is identified by an InfectionId as well, just the difference of the key sets:

Map.keysSet infections `Set.difference` Map.keysSet terminations
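Putting the map-based pieces together, a self-contained sketch might look like the following. It assumes the containers package is available and indexes the question's sample data by ID. The names activeIds and activeRecords are my own, and the use of Map.difference to recover the full records is an extra step that goes beyond the ID sets discussed above.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

type PersonId = Integer
type InfectionId = Integer
type InfectionName = String

data TerminationType = Recovered | Dead
  deriving (Eq, Show)

data Infection = Infection
  { infectionId :: InfectionId
  , infectionPerson :: PersonId
  , infectionName :: InfectionName
  }
  deriving (Eq, Show)

data Termination = Termination
  { terminationInfection :: InfectionId
  , terminationType :: TerminationType
  }
  deriving (Eq, Show)

-- Sample data from the question, indexed by infection ID.
infs :: Map InfectionId Infection
infs = Map.fromList
  [ (infectionId i, i)
  | i <-
    [ Infection 2020 1 "COVID"
    , Infection 2019 42 "COVID"
    , Infection 1 5 "COVID"
    , Infection 5 128 "rymicka"
    , Infection 3 5 "astma"
    , Infection 2 1 "astma"
    , Infection 128 5 "zapal plic"
    ]
  ]

ters :: Map InfectionId Termination
ters = Map.fromList
  [ (terminationInfection t, t)
  | t <-
    [ Termination 2020 Dead
    , Termination 2 Recovered
    , Termination 2019 Recovered
    , Termination 128 Dead
    ]
  ]

-- IDs of infections that have no termination.
activeIds :: Set InfectionId
activeIds = Map.keysSet infs `Set.difference` Map.keysSet ters

-- Full records of the active infections: Map.difference drops every
-- key of the first map that also occurs in the second.
activeRecords :: [Infection]
activeRecords = Map.elems (infs `Map.difference` ters)
```

Set.toList activeIds gives [1,3,5]. Note that results come out in ascending key order, not in the order the infections were originally listed.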
