
Select most common domain from list of email addresses

I have a function that generates a random email address:

def emails
    names = ["alfred", "daniel", "elisa", "ana", "ramzes"]
    surnames = ["oak", "leaf", "grass", "fruit"]
    providers = ["gmail", "yahoo", "outlook", "icloud"]
    address = "#{names.sample}.#{surnames.sample}#{rand(100..5300)}@#{providers.sample}.com"
end

Given a list of randomly generated email addresses:

email_list = 100.times.map { emails }

that looks like this:

daniel.oak3985@icloud.com
ramzes.grass1166@icloud.com
daniel.fruit992@yahoo.com
...

how can I select the most common provider ("gmail", "yahoo", etc.)?

Your question is similar to this one. There's a twist, though: you want to analyze the frequency of the providers, not of the email addresses themselves.

def random_email
  names = ["alfred", "daniel", "elisa", "ana", "ramzes"]
  surnames = ["oak", "leaf", "grass", "fruit"]
  providers = ["gmail", "yahoo", "outlook", "icloud"]
  address = "#{names.sample}.#{surnames.sample}#{rand(100..5300)}@#{providers.sample}.com"
end

emails = Array.new(100){ random_email }

freq = emails.each_with_object(Hash.new(0)) do |email, counts|
  provider = email.split('@').last
  counts[provider] += 1
end

p freq
#=> {"outlook.com"=>24, "yahoo.com"=>28, "gmail.com"=>32, "icloud.com"=>16}

p freq.max_by{|provider, count| count}.first
#=> "gmail.com"
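On Ruby 2.7 or later, the counting hash can also be built with Enumerable#tally. The following is a minimal sketch of that variant, not part of the original answer; the method names provider_counts and most_common_provider are hypothetical and chosen only for illustration.

```ruby
# Hypothetical helpers; Enumerable#tally requires Ruby 2.7 or later.

# Returns a hash mapping each domain (the text after "@") to its count.
def provider_counts(emails)
  emails.map { |email| email.split('@').last }.tally
end

# Returns the most frequent domain, or nil for an empty list.
# On a tie, max_by returns the first maximum it encounters.
def most_common_provider(emails)
  winner = provider_counts(emails).max_by { |_provider, count| count }
  winner&.first
end

sample = ["daniel.oak3985@icloud.com",
          "ramzes.grass1166@icloud.com",
          "daniel.fruit992@yahoo.com"]

p provider_counts(sample)
p most_common_provider(sample)
```

The safe navigation operator (&.) is there only so that an empty list yields nil instead of raising NoMethodError.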

Alternatively, with emails being the method defined in the question (not the array above):

email_list = 10.times.map { emails }
  #=> ["alfred.grass426@gmail.com", "elisa.oak239@icloud.com",
  #    "daniel.fruit1600@outlook.com", "ana.fruit3761@icloud.com",
  #    "daniel.grass742@yahoo.com", "elisa.oak3891@outlook.com",
  #    "alfred.leaf1321@gmail.com", "alfred.grass5295@outlook.com",
  #    "ramzes.fruit435@gmail.com", "ana.fruit4233@yahoo.com"] 

email_list.group_by { |s| s[/@\K.+/] }.max_by { |_,v| v.size }.first
  #=> "gmail.com"

\K in the regex means to discard everything matched so far from the returned match. Alternatively, @\K could be replaced by the positive lookbehind (?<=@).
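To make the equivalence concrete, here is a small sketch comparing the two patterns on one address from the question. The method names domain_via_keep and domain_via_lookbehind are hypothetical, introduced only for this comparison.

```ruby
# Hypothetical helpers illustrating the two equivalent patterns.

# \K keeps "@" out of the reported match even though it must be present.
def domain_via_keep(email)
  email[/@\K.+/]
end

# The lookbehind asserts that "@" precedes the match without consuming it.
def domain_via_lookbehind(email)
  email[/(?<=@).+/]
end

email = "daniel.oak3985@icloud.com"

p domain_via_keep(email)        #=> "icloud.com"
p domain_via_lookbehind(email)  #=> "icloud.com"
p email[/@.+/]                  #=> "@icloud.com" (plain pattern keeps the "@")
```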

The steps are as follows.

h = email_list.group_by { |s| s[/@\K.+/] }
  #=> {"gmail.com"  =>["alfred.grass426@gmail.com", "alfred.leaf1321@gmail.com",
  #                    "ramzes.fruit435@gmail.com"],
  #    "icloud.com" =>["elisa.oak239@icloud.com", "ana.fruit3761@icloud.com"],
  #    "outlook.com"=>["daniel.fruit1600@outlook.com",  "elisa.oak3891@outlook.com",
  #                    "alfred.grass5295@outlook.com"],
  #    "yahoo.com"  =>["daniel.grass742@yahoo.com", "ana.fruit4233@yahoo.com"]}
a = h.max_by { |_,v| v.size }
  #=> ["gmail.com", ["alfred.grass426@gmail.com", "alfred.leaf1321@gmail.com",
  #                  "ramzes.fruit435@gmail.com"]] 
a.first
  #=> "gmail.com" 

If, as here, there is a tie for most frequent, modify the code as follows to get all winners.

h = email_list.group_by { |s| s[/@\K.+/] }
  # (same as above)
mx_size = h.map { |_,v| v.size }.max
  #=> 3 
h.select { |_,v| v.size == mx_size }.keys
  #=> ["gmail.com", "outlook.com"] 
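The tie-handling steps above could be wrapped in a single method, roughly as sketched below. The name most_common_providers is hypothetical, and the sample is the ten-address list shown earlier in this answer.

```ruby
# Hypothetical helper: returns every domain that shares the highest count.
def most_common_providers(email_list)
  groups = email_list.group_by { |s| s[/@\K.+/] }
  max_size = groups.values.map(&:size).max
  # For an empty list, groups is empty, so this returns [].
  groups.select { |_, v| v.size == max_size }.keys
end

email_list = ["alfred.grass426@gmail.com", "elisa.oak239@icloud.com",
              "daniel.fruit1600@outlook.com", "ana.fruit3761@icloud.com",
              "daniel.grass742@yahoo.com", "elisa.oak3891@outlook.com",
              "alfred.leaf1321@gmail.com", "alfred.grass5295@outlook.com",
              "ramzes.fruit435@gmail.com", "ana.fruit4233@yahoo.com"]

p most_common_providers(email_list)
  #=> ["gmail.com", "outlook.com"]
```

Because Ruby hashes preserve insertion order, the winners come back in the order their domains first appear in the list.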


 