
Parsing an Excel file in C#, the cells seem to get cut off at 255 characters… how do I stop that?

I am parsing uploaded Excel files (xlsx) in ASP.NET with C#. I am using the following code (simplified):

using System.Data;
using System.Data.OleDb;
using System.Linq;

string connString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=YES\";", fileLocation);
DataSet ds = new DataSet();
using (OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connString))
{
    adapter.Fill(ds);
}
DataTable dt = ds.Tables[0];
var rows = from p in dt.AsEnumerable() select new { desc = p[2] };

This works, except that any cell containing more than 255 characters gets cut off at 255. Any idea what I am doing wrong? Thank you.

EDIT: When I open the sheet in Excel, the cells show much more than 255 characters, so I don't believe the sheet itself is the limit.

The Solution!

I've been battling this today as well. I finally got it to work by changing some registry values before parsing the Excel spreadsheet.

You must update this registry key (its path depends on your Office version) before parsing the Excel spreadsheet:

// Excel 2010
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\
or, for 32-bit Office on 64-bit Windows,
HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\

// Excel 2007
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\

// Excel 2003
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\

Under this key, change TypeGuessRows to 0 and ImportMixedTypes to Text. TypeGuessRows controls how many rows the driver samples when guessing each column's type, and ImportMixedTypes is the setting the driver applies when IMEX=1 is set. So you'll also need to update your connection string to include IMEX=1 in the extended properties:

string connString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1\";", fileLocation);
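If you would rather make the registry change from code than with regedit, here is a minimal C# sketch. It assumes Windows, a process running with administrator rights (HKEY_LOCAL_MACHINE is not writable otherwise) and one of the engine versions listed above. `ExcelEngineSettings`, `CandidateKeyPaths`, `Apply` and the version labels `"14.0"`, `"12.0"` and `"Jet4.0"` are hypothetical names made up for this sketch; the key paths and the two values are the ones from this answer, and `Registry.LocalMachine`, `OpenSubKey` and `SetValue` are the standard `Microsoft.Win32` calls.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Win32;

// Hypothetical helper: writes TypeGuessRows = 0 and ImportMixedTypes = Text
// to the engine key(s) listed in this answer. Windows + admin rights assumed.
public static class ExcelEngineSettings
{
    // Key paths under HKEY_LOCAL_MACHINE, copied from the list above.
    // The version labels are made up for this sketch.
    public static IReadOnlyList<string> CandidateKeyPaths(string engineVersion)
    {
        switch (engineVersion)
        {
            case "14.0": // Excel 2010; the WOW6432Node path is used by 32-bit Office on 64-bit Windows
                return new[]
                {
                    @"SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel",
                    @"SOFTWARE\WOW6432Node\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel",
                };
            case "12.0": // Excel 2007
                return new[] { @"SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel" };
            case "Jet4.0": // Excel 2003 (Jet 4.0)
                return new[] { @"SOFTWARE\Microsoft\Jet\4.0\Engines\Excel" };
            default:
                throw new ArgumentException("Unknown engine version: " + engineVersion, nameof(engineVersion));
        }
    }

    // Updates every candidate key that exists on this machine and returns how many were changed.
    public static int Apply(string engineVersion)
    {
        int updated = 0;
        foreach (string path in CandidateKeyPaths(engineVersion))
        {
            using (RegistryKey key = Registry.LocalMachine.OpenSubKey(path, writable: true))
            {
                if (key == null) continue; // key not present, e.g. that Office version is not installed
                key.SetValue("TypeGuessRows", 0, RegistryValueKind.DWord);
                key.SetValue("ImportMixedTypes", "Text", RegistryValueKind.String);
                updated++;
            }
        }
        return updated;
    }
}
```

You would call something like `ExcelEngineSettings.Apply("14.0")` once, before opening the connection. Because the values live under HKEY_LOCAL_MACHINE, they apply to every program on that machine that uses the same driver.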

References

http://blogs.vertigo.com/personal/aanttila/Blog/archive/2008/03/28/excel-and-csv-reference.aspx

http://msdn.microsoft.com/en-us/library/ms141683.aspx

...characters may be truncated. To import data from a memo column without truncation, you must make sure that the memo column in at least one of the sampled rows contains a value longer than 255 characters, or you must increase the number of rows sampled by the driver to include such a row. You can increase the number of rows sampled by increasing the value of TypeGuessRows under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key....

I have come across this, and the solution that worked for me was to move the cells with long text to the top of the spreadsheet, so that they fall within the rows the driver samples.

I found this comment in a forum describing the issue:

This is an issue with the Jet OLEDB provider. It looks at the first 8 rows of the spreadsheet to determine the data type of each column. If the column does not contain a field value over 255 characters in the first 8 rows, then it assumes the data type is text, which has a character limit of 255. The following KB article has more information on this issue: http://support.microsoft.com/kb/281517
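To make the sampling behaviour concrete, here is a toy C# model of what the quoted comment and the MSDN excerpt describe. It is a hypothetical simulation for illustration, not the driver's actual implementation: `TypeGuessModel` and its members are made-up names, and it assumes (based on Microsoft's SSIS documentation on Excel data, not on anything in this thread) that a TypeGuessRows value of 0 makes the driver scan up to 16384 rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model, not the real OLE DB provider: the driver samples the first N rows of a
// column; if no sampled value is longer than 255 characters, the column is typed as
// Text and every value is cut to 255 characters. Otherwise it is Memo (no cut).
public static class TypeGuessModel
{
    public const int TextLimit = 255;

    // Assumption: 0 means "scan up to 16384 rows"; 8 is the default mentioned above.
    public static int RowsToScan(int typeGuessRows) => typeGuessRows == 0 ? 16384 : typeGuessRows;

    public static bool GuessesMemo(IReadOnlyList<string> column, int typeGuessRows) =>
        column.Take(RowsToScan(typeGuessRows)).Any(v => v != null && v.Length > TextLimit);

    // What the model says a reader would get back for each cell of the column.
    public static List<string> Read(IReadOnlyList<string> column, int typeGuessRows)
    {
        bool memo = GuessesMemo(column, typeGuessRows);
        return column
            .Select(v => (memo || v == null || v.Length <= TextLimit) ? v : v.Substring(0, TextLimit))
            .ToList();
    }
}
```

For example, with ten short values followed by one 300-character value, `TypeGuessModel.Read(column, 8)[10].Length` is 255, because the long value is outside the 8 sampled rows. With `typeGuessRows` set to 0, or with the long value moved to the top of the column, it comes back at 300, which matches both workarounds in this thread.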

Hope this helps someone else!

Have you tried setting the column's data type to Text within the spreadsheet? I believe doing this will allow the cells to contain much more than 255 characters.

[Edit] For what it's worth, this dialog with the MS Excel team is an interesting read. In the comments section at the bottom they get into some discussion of the 255-character cutoff. They say Excel 12 (Excel 2007) can support 32k characters per cell.

If that is true, there must be a way to get at this data. Here are two things to consider:

  1. In the past I have used the "IMEX=1" option in my connection string to deal with columns of mixed data that showed up as empty. It's a long shot, but you might give that a try.

  2. Could you export the file to a tab-delimited flat file? IMHO this is the most reliable way of dealing with Excel data, since Excel has so many gotchas.
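If you go the flat-file route from point 2, reading the export needs nothing beyond the base class library. Below is a minimal C# sketch, assuming the sheet was saved as tab-delimited text with a header row (like HDR=YES) and that no cell contains a tab or a line break, since this simple reader does not handle the quoting Excel applies to such cells. `TabDelimitedReader` is a hypothetical name; the point is only that nothing on this path imposes a 255-character limit.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical reader for a tab-delimited export with a header row.
// Assumes no cell contains a tab or line break (no quote handling).
// File.ReadLines defaults to UTF-8 and honours a byte-order mark;
// an ANSI export may need an explicit Encoding instead.
public static class TabDelimitedReader
{
    public static List<Dictionary<string, string>> Read(string path)
    {
        var rows = new List<Dictionary<string, string>>();
        string[] header = null;
        foreach (string line in File.ReadLines(path))
        {
            string[] fields = line.Split('\t');
            if (header == null)
            {
                header = fields; // first line holds the column names
                continue;
            }
            var row = new Dictionary<string, string>();
            for (int i = 0; i < header.Length; i++)
                row[header[i]] = i < fields.Length ? fields[i] : ""; // missing trailing cells become ""
            rows.Add(row);
        }
        return rows;
    }
}
```

Each row comes back as a column-name-to-text map, so the long description is simply `rows[n]["desc"]` (assuming a column named `desc`), at whatever length it was exported.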

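Regarding the SpreadsheetGear post: I have also used SpreadsheetGear and found that, when reading from the old XLS (not XLSX) format, each cell is also subject to the 255-character limit.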

Just from a quick Google search on the subject, it appears that this is a limit of Excel.

EDIT: Possible workaround (unfortunately in VB)

SpreadsheetGear for .NET can read and write xls and xlsx workbooks (and more), and it supports the same text limits as Excel; in other words, it will just work. There is a free evaluation if you want to give it a try.

Disclaimer: I own SpreadsheetGear LLC
