Ruby/Rails Parsing Emails

I'm currently using the following to parse emails:

  def parse_emails(emails)
    valid_emails, invalid_emails = [], []
    unless emails.nil?
      emails.split(/, ?/).each do |full_email|
        unless full_email.blank?
          if full_email.index(/\<.+\>/)
            email = full_email.match(/\<.*\>/)[0].gsub(/[\<\>]/, "").strip
          else
            email = full_email.strip
          end
          email = email.delete("<").delete(">")
          email_address = EmailVeracity::Address.new(email)
          if email_address.valid?
            valid_emails << email 
          else
            invalid_emails << email
          end
        end
      end                    
    end
    return valid_emails, invalid_emails
  end

The problem I'm having is with an email like:

Bob Smith <bob@smith.com>

The code above discards Bob Smith and returns only bob@smith.com.

But what I want is a hash of FNAME, LNAME, EMAIL, where fname and lname are optional but email is not.

What type of Ruby object would I use for that, and how would I create such a record in the code above?

Thanks

I've coded it so that it works even if you have an entry like: John Bob Smith Doe <bob@smith.com>

It would retrieve:

{:email => "bob@smith.com", :fname => "John", :lname => "Bob Smith Doe" }

def parse_emails(emails)
  valid_emails, invalid_emails = [], []
  unless emails.nil?
    emails.split(/, ?/).each do |full_email|
      next if full_email.blank?

      fname = lname = nil
      if (index = full_email.index(/<.+>/))
        email = full_email.match(/<.*>/)[0].gsub(/[<>]/, "").strip
        # Everything before the "<" is the name; the exclusive range
        # yields "" (not the whole string) when "<" is at index 0.
        name  = full_email[0...index].split(" ")
        fname = name.first
        lname = name.drop(1).join(" ")
      else
        email = full_email.strip
        # your choice, what the string could be... only mail, only name?
      end
      email = email.delete("<>")
      email_address = EmailVeracity::Address.new(email)

      record = { :email => email, :lname => lname, :fname => fname }
      if email_address.valid?
        valid_emails << record
      else
        invalid_emails << record
      end
    end
  end
  return valid_emails, invalid_emails
end
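
On the "what type of Ruby object" part of the question: a plain hash works, but a Struct is another option if you want named accessors. Below is a minimal sketch of that idea, not a drop-in replacement. The names Contact, parse_entry, valid_email? and parse_contacts are made up for illustration, and the validity check uses the standard library's URI::MailTo::EMAIL_REGEXP as a stand-in for EmailVeracity so that the snippet runs without extra gems.

```ruby
require "uri"

# Hypothetical record type: fname and lname may be nil, email is always set.
Contact = Struct.new(:fname, :lname, :email)

# Split one "Name <addr>" or bare "addr" entry into a Contact.
def parse_entry(entry)
  entry = entry.strip
  if (m = entry.match(/\A(.*)<([^<>]+)>\z/))
    name_parts = m[1].strip.split(" ")
    email = m[2].strip
  else
    name_parts = []
    email = entry
  end
  fname = name_parts.first
  lname = name_parts.drop(1).join(" ")
  Contact.new(fname, lname.empty? ? nil : lname, email)
end

# Stand-in validity check (assumption): stdlib regexp instead of EmailVeracity.
def valid_email?(email)
  URI::MailTo::EMAIL_REGEXP.match?(email)
end

# Returns [valid_contacts, invalid_contacts].
def parse_contacts(emails)
  entries = emails.to_s.split(/, ?/).map(&:strip).reject(&:empty?)
  entries.map { |e| parse_entry(e) }.partition { |c| valid_email?(c.email) }
end

valid, invalid = parse_contacts(
  "John Bob Smith Doe <bob@smith.com>, alice@example.com, not-an-email"
)
valid.each { |c| puts [c.fname, c.lname, c.email].inspect }
```

Each element of valid and invalid responds to fname, lname and email, and to_h turns it back into a hash if that is what the rest of your code expects.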

Here's a slightly different approach that works better for me. It grabs the name whether it comes before or after the email address, and whether or not the email address is in angle brackets.

I don't try to parse the first name out from the last name, since that is too problematic (e.g. "Mary Ann Smith" or "Dr. Mary Smith"), but I do eliminate duplicate email addresses.

def parse_list(list)
  ## case-insensitive; {2,} also accepts TLDs longer than four letters
  r = /[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,}/i
  valid_items, invalid_items = {}, []

  ## split the list on commas and/or newlines
  list_items = list.split(/[,\n]+/)

  list_items.each do |item|
    if m = r.match(item)
      ## get the email address
      email = m[0]
      ## get everything before the email address
      before_str = item[0, m.begin(0)]
      ## get everything after the email address
      after_str = item[m.end(0), item.length]
      ## enter the email as a valid_items hash key (eliminating dups)
      ## make the value of that key anything before the email if it contains
      ## any alphanumerics, stripping out any angle brackets, double quotes,
      ## and leading/trailing space
      if /\w/ =~ before_str
        valid_items[email] = before_str.gsub(/[\<\>\"]+/, '').strip
      ## if nothing before the email, make the value of that key anything after
      ## the email, stripping out any angle brackets and leading/trailing space
      elsif /\w/ =~ after_str
        valid_items[email] = after_str.gsub(/[\<\>\"]+/, '').strip
      ## if nothing after the email either,
      ## make the value of that key an empty string
      else
        valid_items[email] = ''
      end
    else
      invalid_items << item.strip if item.strip.length > 0
    end
  end

  [valid_items, invalid_items]
end

It returns a hash with valid email addresses as keys and the associated names as values. Any invalid items are returned in the invalid_items array.
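
One thing to keep in mind is that hash keys only deduplicate exact string matches, so Bob@Smith.com and bob@smith.com would end up as two entries. If you want to treat addresses case-insensitively (an assumption, not something the method does), a small post-processing step could look like the sketch below. The sample data is made up and only mimics the shape of the returned hash.

```ruby
# Sample data shaped like the first return value described above (made up).
valid_items = {
  "Bob@Smith.com"     => "Bob Smith",
  "bob@smith.com"     => "",
  "alice@example.com" => "Alice"
}

# Merge entries whose addresses differ only by case,
# keeping the first non-empty name seen for each address.
merged = valid_items.each_with_object({}) do |(email, name), acc|
  key = email.downcase
  acc[key] = name if acc[key].nil? || acc[key].empty?
end

# Turn the hash into an array of records for further processing.
records = merged.map { |email, name| { email: email, name: name } }
records.each { |r| puts r.inspect }
```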

See http://www.regular-expressions.info/email.html for an interesting discussion of email regexes.

I made a little gem out of this in case it might be useful to someone: https://github.com/victorgrey/email_addresses_parser

You can use the rfc822 gem. It contains a regular expression for finding emails that conform to the RFC. You can easily extend it with parts for finding the first and last name.

Along the lines of mspanc's answer, you can use the mail gem to do the basic email address parsing work for you, as described here: https://stackoverflow.com/a/12187502/1019504

