How to correctly handle CR when reading text files with OleDB

Question

I have tab-delimited text files. I created a Schema.ini like so:

[MY_FILE.TAB]
Format=TabDelimited
ColNameHeader=False
Col1=id Short
Col2=data Text

This is the code I use to read it (C#):

using System.Data;
using System.Data.OleDb;

using (var connection = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\FolderToData\;Extended Properties='text;FMT=delimited'"))
{
  using (var command = new OleDbCommand("SELECT * FROM MY_FILE.TAB", connection))
  {
    var table = new DataTable();
    // Fill opens and closes the connection itself.
    using (var adapter = new OleDbDataAdapter(command))
    {
      adapter.Fill(table);
    }
  }
}

Everything works fine, except for one thing. The data in the text file contains carriage returns [CR]. The records themselves are separated by carriage return line feeds [CR][LF]. Unfortunately, OleDB / Microsoft Jet (or whatever parses these files) treats both ([CR] and [CR][LF]) the same.

Example of MY_FILE.TAB (there should be a Tab between numbers and text):

1   One[CR][LF]
2   Two[CR][LF]
3   Th[CR]
ree[CR][LF]
4   Four[CR][LF]

This gives me 5 (malformed) rows in the DataTable instead of 4.

What I need is:

1   "One"
2   "Two"
3   "Th\nree"
4   "Four"

But I get:

1    "One"
2    "Two"
3    "Th"
null null
4    "Four"

"ree" can't be converted to Int32, so the first column in the fourth row is null.

How can I configure OleDB to treat [CR] differently from [CR][LF]? Or any other ideas?

Answer 1

I don't believe you can reconfigure OLEDB to do this directly.

An alternative approach would be to use a TextReader and TextWriter to process the file into a temporary file, scanning for each CR that stands alone and replacing it with some special escape sequence. Then use OLEDB to read this temporary file; finally, replace the special escape sequence in the loaded data with a CR again.
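The escape-and-restore idea could be sketched roughly as follows. This is only an illustration under assumptions: the class name LoneCrEscaper and the token {{CR}} are made up here, the token must never occur in the real data, and the file encoding is assumed to be one that File.ReadAllText detects correctly (the Jet text driver may expect a different one). For brevity it loads the whole file into memory instead of streaming through a TextReader and TextWriter.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;

public static class LoneCrEscaper
{
    // Hypothetical placeholder; pick any token that cannot occur in the data.
    public const string Token = "{{CR}}";

    // Matches a CR that is not immediately followed by LF.
    private static readonly Regex LoneCr = new Regex(@"\r(?!\n)");

    // Replaces every lone CR with the token; CR LF pairs are left alone.
    public static string Escape(string text)
    {
        return LoneCr.Replace(text, Token);
    }

    // Writes an escaped copy of sourcePath to targetPath.
    public static void WriteEscapedCopy(string sourcePath, string targetPath)
    {
        File.WriteAllText(targetPath, Escape(File.ReadAllText(sourcePath)));
    }

    // Call after adapter.Fill(table): puts the CRs back into every string cell.
    public static void Restore(DataTable table)
    {
        foreach (DataRow row in table.Rows)
        {
            foreach (DataColumn column in table.Columns)
            {
                if (row[column] is string value)
                {
                    row[column] = value.Replace(Token, "\r");
                }
            }
        }
    }
}
```

The intended order would be WriteEscapedCopy, then the OleDbDataAdapter.Fill call from the question against the escaped copy, then Restore. Note that the escaped copy needs a matching section in Schema.ini, for example by writing it under the same file name into a separate folder that has its own copy of Schema.ini.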

Answer 2

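Read the file contents into a string and split it by Environment.NewLine or \r\n. That would be easy: it gives you an array with one entry per row, which you can then split further by tab.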
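A minimal sketch of this manual parsing, assuming the two-column layout from the question's Schema.ini (id as Short, data as Text). The class name TabFileParser is made up for illustration. It splits on the literal "\r\n" rather than Environment.NewLine, since the latter is only "\r\n" on Windows.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class TabFileParser
{
    // Builds a DataTable from the raw file contents.
    public static DataTable Parse(string content)
    {
        var table = new DataTable();
        table.Columns.Add("id", typeof(short));
        table.Columns.Add("data", typeof(string));

        // Only CR LF ends a record, so a lone CR stays inside its field.
        var records = content.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var record in records)
        {
            // Split on the first tab only; everything after it is the data column.
            var fields = record.Split(new[] { '\t' }, 2);
            var id = short.Parse(fields[0], CultureInfo.InvariantCulture);
            var data = fields.Length > 1 ? fields[1] : null;
            table.Rows.Add(id, data);
        }
        return table;
    }
}
```

It could be called as TabFileParser.Parse(File.ReadAllText(@"D:\FolderToData\MY_FILE.TAB")). Unlike the OLEDB route, this does no quoting or type inference, so it only fits files as simple as the example.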

Answer 1: score 2, accepted, 2009-11-27 15:11:52

Answer 2: score 0, 2009-11-27 15:11:16
