Extract email addresses and names from a text file

Question

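I'll try to explain the problem as well as I can. I have a text file that contains email addresses and names. It looks like this:

Barb Beney "de.mariof@vienna.aa", "Beny Beney" bet@catering.at

and so on, all on a single line. This is just an example; the real file is large and holds thousands of such entries. I want to extract the emails and names so that I end up with output like this:

Beny Beney bet@catering.at

That is, one name and address per line, without quotes. Finally, all duplicate addresses should be removed from the file.

I wrote code that extracts the email addresses, and it works, but I don't know how to do the rest: how to extract the names, put each one on the same line as its address, and eliminate the duplicates. I hope I've described this clearly enough that you know what I'm trying to do. Here is my code: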

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using System.IO;

namespace Email
{
    class Program
    {
        static void Main(string[] args)
        {
            ExtractEmails(@"C:\Users\drake\Desktop\New.txt", @"C:\Users\drake\Desktop\Email.txt");
        }

        // Reads the whole input file, finds every email address with a regex,
        // and writes the matches to the output file, one per line.
        public static void ExtractEmails(string inFilePath, string outFilePath)
        {
            string data = File.ReadAllText(inFilePath);

            Regex emailRegex = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                RegexOptions.IgnoreCase);

            MatchCollection emailMatches = emailRegex.Matches(data);

            StringBuilder sb = new StringBuilder();
            foreach (Match emailMatch in emailMatches)
            {
                sb.AppendLine(emailMatch.Value);
            }

            File.WriteAllText(outFilePath, sb.ToString());
        }
    }
}
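The regex above finds only the addresses. One way to also capture the name in front of each address is a second pattern with named groups. The sketch below assumes each entry is a name starting with a letter, optionally wrapped in quotes, followed by whitespace and an address that may also be quoted, as in the sample line in the question; entries without a name are skipped. NameEmailExtractor and ExtractPairs are hypothetical names, and the pattern is tuned to the sample rather than being a general address parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameEmailExtractor
{
    // Assumed entry format: optional quote, a name starting with a letter,
    // optional quote, whitespace, then an optionally quoted address.
    // The name group excludes quotes, commas and '@', so it cannot run
    // across entry boundaries. Resulting pattern:
    // "?(?<name>\p{L}[^",@]*?)"?\s+"?(?<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"?
    private static readonly Regex EntryRegex = new Regex(
        @"""?(?<name>\p{L}[^"",@]*?)""?\s+""?(?<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)""?");

    // Returns one "Name email" string per entry, in input order, without quotes.
    public static List<string> ExtractPairs(string data)
    {
        var result = new List<string>();
        foreach (Match m in EntryRegex.Matches(data))
        {
            string name = m.Groups["name"].Value.Trim();
            string email = m.Groups["email"].Value;
            result.Add(name + " " + email);
        }
        return result;
    }
}
```

To produce the file the question asks for, the list can be passed straight to File.WriteAllLines, e.g. File.WriteAllLines(outFilePath, NameEmailExtractor.ExtractPairs(File.ReadAllText(inFilePath))). Names that contain commas, or addresses with quoted local parts, would need a more careful pattern.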

Answer 1

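You're welcome to use this code. It reads the file produced by your ExtractEmails method and writes a new file that contains each line only once: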

    static void Main(string[] args)
    {
        ExtractEmails(@"C:\Users\drake\Desktop\New.txt", @"C:\Users\drake\Desktop\Email.txt");

        // Read the extracted addresses and write the unique ones to a new file.
        // The using blocks close both files, even if an exception is thrown.
        using (TextReader r = File.OpenText(@"C:\Users\drake\Desktop\Email.txt"))
        using (TextWriter w = File.CreateText(@"C:\Users\drake\Desktop\NonDuplicateEmails.txt"))
        {
            RemovingAllDupes(r, w);
        }
    }

    public static void RemovingAllDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        HashSet<string> previousLines = new HashSet<string>();

        while ((currentLine = reader.ReadLine()) != null)
        {
            // Add returns true if it was actually added,
            // false if it was already there
            if (previousLines.Add(currentLine))
            {
                writer.WriteLine(currentLine);
            }
        }
    }
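RemovingAllDupes drops a line only when the whole line repeats exactly, so the same address written with different capitalization, or paired with a different name, survives. If the goal is one line per address, a sketch like the following keys the set on the last space-separated token of each line, assuming that token is the address (the "Name email" format the question asks for), and compares case-insensitively. AddressDeduper and DedupeByAddress are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class AddressDeduper
{
    // Keeps the first line seen for each address and drops later lines whose
    // address matches, ignoring case. Assumes the address is the last
    // space-separated token on the line ("Name email" format).
    public static IEnumerable<string> DedupeByAddress(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string rawLine in lines)
        {
            string line = rawLine.Trim();
            if (line.Length == 0)
                continue; // skip blank lines

            int lastSpace = line.LastIndexOf(' ');
            string address = lastSpace >= 0 ? line.Substring(lastSpace + 1) : line;

            // HashSet.Add returns false when the address is already present.
            if (seen.Add(address))
                yield return line;
        }
    }
}
```

Ignoring case for the whole address is a simplification: strictly, only the domain part is case-insensitive, although most mail providers ignore case in the local part as well. Because the method is lazy, it can be combined with streaming file access, e.g. File.WriteAllLines(outPath, AddressDeduper.DedupeByAddress(File.ReadLines(inPath))), as long as the two paths differ.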

Answer 2

For the new format you want, you could do something like this:

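private static string[] parseEmails(string bigStringIn)
{
    // Remove all double quotes from the input.
    string bigString = bigStringIn.Replace("\"", "");

    // Split on commas, then trim each piece, since ", " leaves a leading space.
    string[] output = bigString.Split(',');
    for (int i = 0; i < output.Length; i++)
    {
        output[i] = output[i].Trim();
    }

    return output;
}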

It takes the string containing the mail addresses, removes the quotes, and then splits it into an array of strings of the form: name lastname email@some.com

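To remove duplicate entries, a nested for loop should do the trick: after the .Split(), check each string against the ones before it and skip it if they match.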
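A minimal sketch of the nested-loop check described above, assuming the input is the array returned by parseEmails. Each entry is trimmed and kept only if no earlier entry is equal to it. NestedLoopDedupe and RemoveDuplicatesNested are hypothetical names. Because every entry is compared with all the entries before it, this takes quadratic time; for thousands of entries a HashSet<string>, as used in the other answers, scales better.

```csharp
using System;
using System.Collections.Generic;

public static class NestedLoopDedupe
{
    // For each entry, scan the entries before it and keep it only if
    // none of them is equal (after trimming). O(n^2) comparisons.
    public static List<string> RemoveDuplicatesNested(string[] entries)
    {
        var result = new List<string>();
        for (int i = 0; i < entries.Length; i++)
        {
            string current = entries[i].Trim();
            if (current.Length == 0)
                continue; // ignore empty pieces, e.g. from a trailing comma

            bool seenBefore = false;
            for (int j = 0; j < i; j++)
            {
                if (entries[j].Trim() == current)
                {
                    seenBefore = true;
                    break;
                }
            }

            if (!seenBefore)
                result.Add(current);
        }
        return result;
    }
}
```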

Answer 3

For large files, you can also use the following code:

    static void Main(string[] args)
    {
        ExtractEmails(@"C:\Users\drake\Desktop\New.txt", @"C:\Users\drake\Desktop\Email.txt");
        var sr = new StreamReader(File.OpenRead(@"C:\Users\drake\Desktop\Email.txt"));
        // File.Create truncates an existing output file; File.OpenWrite would not.
        var sw = new StreamWriter(File.Create(@"C:\Users\drake\Desktop\NonDuplicateEmails.txt"));
        RemovingAllDupes(sr, sw);
    }

    public static void RemovingAllDupes(StreamReader str, StreamWriter stw)
    {
        // Store only each line's hash code to save memory on large files.
        // Caveat: two different lines with the same hash code collide, and
        // the second one is then dropped as a false duplicate.
        var lines = new HashSet<int>();
        while (!str.EndOfStream)
        {
            string line = str.ReadLine();
            int hc = line.GetHashCode();
            // Add returns false if this hash code was already seen.
            if (!lines.Add(hc))
                continue;

            stw.WriteLine(line);
        }
        stw.Flush();
        stw.Close();
        str.Close();
    }
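Keeping only an int hash per line saves memory, but a distinct line whose hash collides with an earlier one is silently lost. A sketch of a collision-free alternative that still streams the input: File.ReadLines yields lines lazily, and Where(seen.Add) lets a line through only the first time it is added to the set. The set holds every distinct line, so memory grows with the number of unique entries rather than the file size. StreamingDedupe and DedupeFile are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StreamingDedupe
{
    // Reads inPath line by line and writes each distinct line to outPath
    // the first time it appears. inPath and outPath must be different files.
    public static void DedupeFile(string inPath, string outPath)
    {
        // A fresh set per call; it stores each distinct line once.
        var seen = new HashSet<string>(StringComparer.Ordinal);

        // HashSet<T>.Add returns false for a repeat, so it works as a filter.
        // File.WriteAllLines creates or overwrites outPath and enumerates
        // the lazy sequence exactly once.
        File.WriteAllLines(outPath, File.ReadLines(inPath).Where(seen.Add));
    }
}
```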


3 answers

Answer 1: score 0, posted 2015-02-03 14:25:37
Answer 2: score 0, posted 2015-02-03 15:16:27
Answer 3: score 0, posted 2015-02-03 22:24:17
