What's the best way to read a tab-delimited text file in C#?

We have a text file with about 100,000 rows and about 50 columns per row; most of the values are small (5 to 10 characters or digits).

This is a fairly simple task, but I'm wondering what the best way is to import this data into a C# data structure (for example, a DataTable).

I would read it in as a CSV file, with tab as the column delimiter:

A Fast CSV Reader

Edit:
Here's a bare-bones example of what you'd need:

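using System.Data;
using System.IO;
using LumenWorks.Framework.IO.Csv; // the "A Fast CSV Reader" library
DataTable dt = new DataTable();
// Arguments: reader, hasHeaders = false (no header row), delimiter = tab
using (CsvReader csv = new CsvReader(new StreamReader(CSV_FULLNAME), false, '\t')) {
    dt.Load(csv); // CsvReader implements IDataReader, so DataTable.Load accepts it
}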

where CSV_FULLNAME is the full path and file name of your tab-delimited file.

Use .NET's built-in text parser, TextFieldParser. It is free, has good error handling (it throws a MalformedLineException when a line cannot be parsed), and deals with a lot of oddball cases.

http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser(VS.80).aspx
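
A minimal sketch of the TextFieldParser route, loading into a DataTable as the question asks. It assumes the first line holds the column names, that every column can stay a string, and that quote characters are ordinary data; the class name TabFileLoader is hypothetical. TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace: on .NET Framework a C# project may need a reference to Microsoft.VisualBasic.dll, while on .NET Core 3.0 and later it should be available without extra packages.

```csharp
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TabFileLoader
{
    // Reads tab-delimited text into a DataTable with one string column per field.
    // Assumption: the first line contains the column names.
    public static DataTable Load(TextReader reader)
    {
        var table = new DataTable();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");
            // Assumption: quotes are plain data here; set to true for CSV-style quoting.
            parser.HasFieldsEnclosedInQuotes = false;
            // TrimWhiteSpace defaults to true; keep values exactly as they appear in the file.
            parser.TrimWhiteSpace = false;

            if (parser.EndOfData) return table; // empty input: no header, no rows

            foreach (string name in parser.ReadFields())
                table.Columns.Add(name, typeof(string));

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (fields != null) table.Rows.Add(fields); // string[] passed as object[]
            }
        }
        return table;
    }
}
```

Usage would be something like DataTable dt = TabFileLoader.Load(new StreamReader(path)); where path is your file. TextFieldParser also has a constructor that takes a file path directly.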

What about FileHelpers? You can define the tab as the delimiter. Head over to the FileHelpers site and have a peek.

Hope this helps. Best regards, Tom.

Two options:

  1. Use the classes in the System.Data.OleDb namespace. This has the advantage of reading directly into a DataTable, as you asked, with very little code, but it can be tricky to get right because the file is tab-delimited rather than comma-delimited (by default the OLE DB text driver expects commas unless a schema.ini file next to the data file declares Format=TabDelimited).
  2. Use or write a CSV parser. Make sure it's a state-machine-based parser, like the one @Jay Riggs linked to, rather than one built on String.Split(), which breaks on quoted fields that contain a tab or a line break. This should be faster than the OleDb method, but it will give you a List or array rather than a DataTable.
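
To illustrate what a state-machine parser means for option 2, here is a minimal sketch (TsvStateMachine is a hypothetical name, not the parser linked above). It assumes CSV-style quoting: a field may be wrapped in double quotes, and "" inside quotes stands for one literal quote. Plain tab-delimited exports often do not quote at all, in which case the quote states are simply never entered. Instead of splitting strings, it reads one character at a time and tracks whether it is inside quotes, so a tab or line break inside a quoted field stays part of the value. As the answer says, the result is a List rather than a DataTable.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TsvStateMachine
{
    private enum State { StartOfField, InField, InQuotes, QuoteInQuotes }

    // Parses tab-delimited text one character at a time.
    // Assumes CSV-style quoting: "..." may wrap a field; "" inside quotes is a literal quote.
    public static List<string[]> Parse(TextReader reader)
    {
        var rows = new List<string[]>();
        var row = new List<string>();
        var field = new StringBuilder();
        State state = State.StartOfField;

        // Closes the current field and row, then starts a new row.
        void EndRow()
        {
            row.Add(field.ToString());
            field.Clear();
            rows.Add(row.ToArray());
            row.Clear();
            state = State.StartOfField;
        }

        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            switch (state)
            {
                case State.StartOfField:
                case State.InField:
                    if (ch == '"' && state == State.StartOfField)
                        state = State.InQuotes;            // opening quote
                    else if (ch == '\t')
                    {
                        row.Add(field.ToString());         // field boundary
                        field.Clear();
                        state = State.StartOfField;
                    }
                    else if (ch == '\n')
                        EndRow();                          // record boundary
                    else if (ch != '\r')                   // drop the CR of a CRLF
                    {
                        field.Append(ch);
                        state = State.InField;
                    }
                    break;

                case State.InQuotes:
                    if (ch == '"')
                        state = State.QuoteInQuotes;       // closing quote, or first half of ""
                    else
                        field.Append(ch);                  // tabs and newlines are data here
                    break;

                case State.QuoteInQuotes:
                    if (ch == '"')
                    {
                        field.Append('"');                 // "" becomes one literal quote
                        state = State.InQuotes;
                        break;
                    }
                    state = State.InField;                 // the quoted part has ended
                    goto case State.InField;               // reprocess ch as an ordinary character
            }
        }

        // Flush a last row that was not terminated by a newline.
        if (state != State.StartOfField || row.Count > 0)
            EndRow();
        return rows;
    }
}
```

Usage would be along the lines of List<string[]> rows = TsvStateMachine.Parse(new StreamReader(path));. StreamReader buffers internally, so reading one character at a time is not as slow as it may look.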

However you parse the lines, make sure the data source behind your data grid supports moving forward and rewinding. You don't want to load everything into memory first, do you? What if the amount of data is ten times larger next time? Build something that uses file.seek (in .NET, FileStream.Seek) under the hood instead of reading everything into memory first. That's my advice.
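
One way to follow this advice, sketched under assumptions: scan the file once to record the byte offset where each line starts, then Seek to an offset to read any single row on demand, forward or backward, without keeping the rows themselves in memory. TsvRowIndex is a hypothetical name. The sketch assumes UTF-8 or ASCII text without a byte-order mark and no quoted fields, so a plain Split('\t') per line is enough; only the offsets (one long per row) stay in memory.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: remembers where each line starts, then seeks to a row on demand.
public sealed class TsvRowIndex
{
    private readonly Stream _stream; // must be seekable, e.g. a FileStream
    private readonly List<long> _lineStarts = new List<long>();

    public TsvRowIndex(Stream stream)
    {
        _stream = stream;
        _stream.Seek(0, SeekOrigin.Begin);
        var buffer = new byte[64 * 1024];
        long pos = 0;
        bool atLineStart = true;
        int n;
        // One forward pass: record the offset of the first byte of every line.
        while ((n = _stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < n; i++)
            {
                if (atLineStart) { _lineStarts.Add(pos + i); atLineStart = false; }
                // The byte 0x0A never occurs inside a UTF-8 multi-byte sequence.
                if (buffer[i] == (byte)'\n') atLineStart = true;
            }
            pos += n;
        }
    }

    public int Count => _lineStarts.Count;

    // Reads only the bytes of one row; rows can be requested in any order.
    public string[] GetRow(int index)
    {
        long start = _lineStarts[index];
        long end = index + 1 < _lineStarts.Count ? _lineStarts[index + 1] : _stream.Length;
        var bytes = new byte[end - start];
        _stream.Seek(start, SeekOrigin.Begin);
        int read = 0;
        while (read < bytes.Length)
        {
            int got = _stream.Read(bytes, read, bytes.Length - read);
            if (got == 0) break;
            read += got;
        }
        string line = Encoding.UTF8.GetString(bytes, 0, read).TrimEnd('\r', '\n');
        return line.Split('\t'); // no quoting support
    }
}
```

A grid in virtual mode, for example, could then call GetRow only for the rows currently on screen, passing in a FileStream opened on your file.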

Simple, but not necessarily a great way:

  • Read the file into a string using a text reader

  • Use String.Split to get the rows

  • Use String.Split with a tab character to get the field values
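
The three steps above might look like this, assuming the first line contains the column names; NaiveTabSplitter is a hypothetical name. As the answer warns, this is not robust: a tab or line break inside a value shifts the columns, and the whole file is held in memory as one string.

```csharp
using System;
using System.Data;

public static class NaiveTabSplitter
{
    // Step 2 and 3 of the list: whole text -> lines -> fields, loaded into a DataTable.
    // Assumption: the first line holds the column names; blank lines are skipped.
    public static DataTable FromText(string text)
    {
        var table = new DataTable();
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length == 0) return table;

        foreach (string name in lines[0].Split('\t'))
            table.Columns.Add(name); // string column named after the header cell

        for (int i = 1; i < lines.Length; i++)
            table.Rows.Add(lines[i].Split('\t')); // throws if a line has more fields than the header

        return table;
    }
}
```

Step 1 could be File.ReadAllText(path) or new StreamReader(path).ReadToEnd(), with the result passed to FromText.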

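Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.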

© 2020-2024 STACKOOM.COM