Import CSV file into C#

I'm building a website, and one of the requirements is for users to export their contacts from their email client and import them into the site.

Because each email client exports contacts in a slightly different format, I'm scratching my head as to the best way to approach this: I don't know in advance what the fields are or what the delimiter is.

I'm only looking to target the main email clients/address books (Outlook, Apple Mail, Entourage, Thunderbird). All of these have an entirely different format: Entourage uses a tab as the delimiter, whereas the rest use a comma, and so on. I only need to pluck out the email address and (if available) a name. The name gets trickier, as some clients have separate fields for first name and last name.

Using FileHelpers would be ideal, but it seems I need to know the structure of the CSV before I can hook up a solution. I'd rather not write my own CSV parser if possible.

Here are my thoughts for the collective hive mind:

Plan A

  • Read the first line of the CSV file (all of the formats have a heading as the first row) and count the number of tabs vs. commas. Determine the delimiter from this.
  • Use some type of CSV reader, such as LumenWorks, to give me basic CSV reading capabilities for the rest of the file.
  • Perform a Regex match on each field to determine the email column.
  • No idea how to figure out the name of the user...
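
The first and third steps of Plan A can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are made up, the email pattern is deliberately loose, and the rows are assumed to have already been split into fields by whatever CSV reader is used.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for Plan A; none of these names come from a real library.
public static class PlanASketch
{
    // Loose pattern: enough to spot a column of addresses, not a full validator.
    private static readonly Regex EmailPattern =
        new Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$");

    // Guess the delimiter by comparing tab and comma counts in the header row.
    public static char GuessDelimiter(string headerLine)
    {
        int tabs = headerLine.Count(c => c == '\t');
        int commas = headerLine.Count(c => c == ',');
        return tabs > commas ? '\t' : ',';
    }

    // Return the index of the first field in a data row that looks like an
    // email address, or -1 if no field does.
    public static int FindEmailColumn(string[] fields)
    {
        for (int i = 0; i < fields.Length; i++)
        {
            // Strip surrounding whitespace and quotes before matching.
            string value = fields[i].Trim().Trim('"');
            if (EmailPattern.IsMatch(value))
                return i;
        }
        return -1;
    }
}
```

Checking a few data rows rather than just one would make the email column guess more robust, since the first contact may have an empty address.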

Plan B

  • Prompt the user for the type of email client, and code it up individually for each client <- seems really clunky.

Plan C

....Use / purchase an existing component that already does this?! (I sure can't find one!!)

Thoughts?

I would go with Plan B (and I disagree that it is clunky).

IMHO, the best way would be to ask the user which email client he/she is exporting from. From that, you can identify the separator character. You have found yourself that although different clients use different separators, a single client will always use the same separator (unless they decide to bring out a non-backward-compatible version). Consequently, it should not be difficult to create a class that accepts the separator as a parameter and parses the input accordingly (the logic should remain almost the same, irrespective of the separator).

Even if the logic for parsing each type of export file differs significantly, it seems to me that you could create an abstract base class that holds all the common functionality, with derived classes that simply override the client-specific functionality.
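
A minimal sketch of that idea, assuming hypothetical names (Contact, ContactFileParser, TabSeparatedParser) and made-up column headings rather than any client's real export layout. The naive Split call stands in for a proper CSV reader and does not handle quoted fields.

```csharp
using System;
using System.Collections.Generic;

public class Contact
{
    public string Name;
    public string Email;
}

// Common functionality: read the header, split each row on the separator.
public abstract class ContactFileParser
{
    private readonly char separator;

    protected ContactFileParser(char separator)
    {
        this.separator = separator;
    }

    // Client-specific part: turn one row of fields into a contact.
    protected abstract Contact MapRow(string[] header, string[] fields);

    // Look up a field by its header name; returns "" if the column is missing.
    protected static string Field(string[] header, string[] fields, string name)
    {
        int i = Array.IndexOf(header, name);
        return (i >= 0 && i < fields.Length) ? fields[i] : "";
    }

    public List<Contact> Parse(IEnumerable<string> lines)
    {
        var contacts = new List<Contact>();
        string[] header = null;
        foreach (string line in lines)
        {
            if (line.Length == 0) continue; // skip blank lines
            string[] fields = line.Split(separator);
            if (header == null) { header = fields; continue; } // first row is the heading
            contacts.Add(MapRow(header, fields));
        }
        return contacts;
    }
}

// Assumed layout: tab-separated, with "First Name", "Last Name" and "Email" columns.
public class TabSeparatedParser : ContactFileParser
{
    public TabSeparatedParser() : base('\t') { }

    protected override Contact MapRow(string[] header, string[] fields)
    {
        string first = Field(header, fields, "First Name");
        string last = Field(header, fields, "Last Name");
        return new Contact
        {
            Name = (first + " " + last).Trim(),
            Email = Field(header, fields, "Email")
        };
    }
}
```

Supporting another client then means adding one small derived class that passes its separator to the base constructor and maps its own column names.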

Even if you use a third-party library such as FileHelpers, you should be able to accomplish this by passing in the type of separator.

I feel that you should not rely on the relative counts of the possible separators to identify the actual separator (as in Plan A).

Edit: Another option that just came to mind would be to provide an options interface like the one MS Excel has. The user chooses the separator character and gets a live preview of how the data will be parsed with that choice.

I would first look at how the competition does it.

Google: "We support importing contacts in the CSV file format (Comma Separated Values). For best results, please use a CSV file produced by Outlook, Outlook Express, Yahoo!, or Hotmail. For Apple Address Book, there is a useful utility called "A to G"."
So I guess they go for your Plan A and have checks in place for the CSV files listed above.

Live Mail/Hotmail: They go for your option B, and support: Microsoft Outlook (using CSV), Outlook Express (using CSV), Windows Contacts, Windows Live Hotmail, Yahoo! Mail (using Outlook CSV format and comma separated), Gmail (using Outlook CSV format).

Facebook: They let you type in your email address, and if they know the provider (Yahoo, Gmail, Hotmail, etc.) they ask you for your password and retrieve your contacts automatically (option D). If they don't support your email provider, you can still upload a CSV file from Outlook or in other formats (more or less your option B).

I think the Facebook version is really cool. But if that is too much, you can go for option A for the CSV formats you support (you will have to examine the different CSV files), and if you don't recognize the file, prompt the user for the meaning of the columns you found.

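Here is some code to use if you need to change the delimiter of a CSV file that will be imported:

// Requires: using System.Data; using System.Data.OleDb; using System.IO;
// strFilePath is the full path to the CSV file. strDelimiter is assumed to hold
// the format keyword expected by the text driver (e.g. "Delimited").
FileInfo file = new FileInfo(strFilePath);

// The text driver treats the folder as the database and each file in it as a table.
// Depending on the driver, a delimiter other than a comma may have to be declared
// in a schema.ini file in that folder rather than in the connection string.
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
    file.DirectoryName + "\"; Extended Properties='text;HDR=Yes;FMT=" + strDelimiter + "(,)';";

DataTable dt = new DataTable();

// Column names must match the headings in the first row of the file.
string selectFields = "Name, email";

using (OleDbConnection con = new OleDbConnection(connectionString))
using (OleDbCommand cmd = new OleDbCommand(string.Format("SELECT {0} FROM [{1}]", selectFields, file.Name), con))
{
    con.Open();
    using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
    {
        adp.Fill(dt); // dt now holds one row per contact
    }
}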

It might make sense to create an interface like "IContactImporter" which has a method "Import(File/whatever ...)". Then, for each type of contact file, create a class that implements the import method to handle that format.

If there is some way to tell which type of file the user is uploading, you may not need to ask the user.

For the actual implementations, I would find an existing CSV library and configure it accordingly for each format. Someone at my work used LINQtoCSV, but I'm not sure if there are better options.
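
As a rough sketch of how this could fit together: the CanImport method, the CommaSeparatedImporter class and the ImporterSelector helper below are assumptions added for illustration, and the plain Split call is only a stand-in for whichever CSV library ends up being used.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public interface IContactImporter
{
    // Assumed detection hook: true if this importer recognises the header row.
    bool CanImport(string headerLine);

    // Returns the email addresses found in the file.
    IEnumerable<string> Import(TextReader reader);
}

// Hypothetical importer for a comma-separated file with an "Email" column.
public class CommaSeparatedImporter : IContactImporter
{
    private static bool IsEmailHeading(string heading)
    {
        return heading.Trim().Equals("Email", StringComparison.OrdinalIgnoreCase);
    }

    public bool CanImport(string headerLine)
    {
        return headerLine.Split(',').Any(IsEmailHeading);
    }

    public IEnumerable<string> Import(TextReader reader)
    {
        string headerLine = reader.ReadLine();
        if (headerLine == null) yield break; // empty file

        int emailIndex = Array.FindIndex(headerLine.Split(','), IsEmailHeading);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Naive split: a real CSV reader would also handle quoted fields.
            string[] fields = line.Split(',');
            if (emailIndex >= 0 && emailIndex < fields.Length)
                yield return fields[emailIndex].Trim();
        }
    }
}

public static class ImporterSelector
{
    // Pick the first importer that recognises the header, or null if none does.
    public static IContactImporter Select(IEnumerable<IContactImporter> importers, string headerLine)
    {
        return importers.FirstOrDefault(i => i.CanImport(headerLine));
    }
}
```

If Select returns null, the site can fall back to asking the user which client the file came from.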

Plan B would be best. Another way would be to look at the whole file and count the occurrences of each character. This can be done line by line with the StreamReader class, and you can then split each line into an array.

You'll need to restrict the count to characters that are not alphanumeric (A-z, 0-9) or a double quote, and look at the characters that remain.

Then you can determine the delimiter. Also be aware that if a field is null, some programs don't export the "cell" at all, MS Office 2007 for instance.
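
A small sketch of that counting approach, under a few assumptions: the class name is made up, the reader can be a StreamReader over the uploaded file, and the extra list of ignored characters (space, dot, @ and so on) is an addition of this sketch because those characters are common inside names and addresses.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper; not part of any library.
public static class DelimiterCounter
{
    // Characters skipped in addition to letters, digits and double quotes.
    // This exclusion list is an assumption of the sketch.
    private const string Ignored = " .@-_'";

    // Returns the most frequent remaining character, or null if none was found.
    public static char? MostFrequentSeparator(TextReader reader)
    {
        var counts = new Dictionary<char, int>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (char c in line)
            {
                if (char.IsLetterOrDigit(c) || c == '"' || Ignored.IndexOf(c) >= 0)
                    continue;

                int n;
                counts.TryGetValue(c, out n); // n is 0 when c has not been seen yet
                counts[c] = n + 1;
            }
        }

        if (counts.Count == 0) return null;
        return counts.OrderByDescending(p => p.Value).First().Key;
    }
}
```

Note that separators inside quoted fields are counted too, so the result is a heuristic rather than a guarantee.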

Plan A seems sensible. I wouldn't expect many field names (if any) to contain commas or tabs, so I'd guess the statistic would be accurate about 90% of the time. If the counts are too close to call (e.g. 15 commas and 12 tabs), what you could do is:

// Requires: using System;
// Locate the email heading in the header line and step past it.
int i = line.IndexOf("email", StringComparison.InvariantCultureIgnoreCase);
if (i != -1)
{
    i += 5; // Length of "email"
}
else
{
    i = line.IndexOf("e-mail", StringComparison.InvariantCultureIgnoreCase);
    if (i == -1) throw new Exception("You should select the email field when exporting.");
    i += 6; // Length of "e-mail"
}

// Find the next delimiter after the email heading.
string delim = null;
for (int k = i; k < line.Length; k++)
{
    char c = line[k];
    if (c == '\t' || c == ',')
    {
        delim = c.ToString();
        break;
    }
}

if (delim == null)
    throw new Exception("Unrecognised file format.");

On top of that, you said there would be a problem with the first name and last name fields, as well as with variations like "email" and "e-mail". You would need a pretty good design pattern here. In the interests of normalized data, I would store the first name and last name separately (and combine them in the UI). Thus:

// Requires: using System.Collections.Generic;
interface IField
{
    string[] Accepts { get; } // Gets the fields that this can accept.
    string[] Gives { get; }   // Gets the fields that this would give.

    IEnumerable<KeyValuePair<string, string>> Handle(IEnumerable<KeyValuePair<string, string>> fields);
}

class NameField : IField
{
    public string[] Accepts
    {
        get { return new string[] { "FirstName", "LastName", "Name", "First Name" /* etc. */ }; }
    }

    public string[] Gives
    {
        get { return new string[] { "FirstName", "LastName" }; }
    }

    public IEnumerable<KeyValuePair<string, string>> Handle(IEnumerable<KeyValuePair<string, string>> fields)
    {
        string firstName = null, lastName = null;
        foreach (KeyValuePair<string, string> field in fields)
        {
            switch (field.Key)
            {
                case "FirstName":
                case "First Name":
                    firstName = field.Value;
                    break;
                // ...
                case "FullName":
                case "Full Name":
                    // Split into fn and ln.
                    break;
                // ...
            }
        }
        // Emit the normalized fields, whichever input headings were used.
        yield return new KeyValuePair<string, string>("FirstName", firstName);
        yield return new KeyValuePair<string, string>("LastName", lastName);
    }
}

In any case, I am sure you get the idea: a bunch of transforms that turn whatever fields the file contains into recognized ones.
