
How to Search Data From Huge CSV Files (20 GB) in C# ASP.NET

I want to create a program using .NET to read or search data in a 20 GB CSV file.

Is there any way to do it?

My Code For Search

string search = txtBoxSearch.Text;
string pathOnly = Path.GetDirectoryName(csvPath);
string fileName = Path.GetFileName(csvPath);

string sql = @"SELECT F1 AS StringID, F2 AS StringContent FROM [" + fileName + "] WHERE F2 LIKE '%" + search + "%'";

using (OleDbConnection connection = new OleDbConnection(
        @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + pathOnly +
        ";Extended Properties=\"Text;HDR=No\""))
using (OleDbCommand command = new OleDbCommand(sql, connection))
using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
{
    // Fill loads every matching row into the DataTable in memory.
    DataTable dataTable = new DataTable();
    adapter.Fill(dataTable);
    dataTable.Columns.Add("MatchTimes", typeof(System.Int32));

    foreach (DataRow row in dataTable.Rows)
    {
        row["MatchTimes"] = Regex.Matches(row["StringContent"].ToString(), search).Count;
    }

    GridViewResult.DataSource = dataTable;
    GridViewResult.DataBind();
}

My Code for Generating the CSV File

int records = 100000;

File.AppendAllLines(csvPath, 
   (from r in Enumerable.Range(0, records) 
      let guid = Guid.NewGuid() 
      let stringContent = GenerateRandomString(256000) 
      select $"{guid},{stringContent}"));

This really depends on exactly how you're searching. If you're just doing a single search, you could simply read the file one line at a time and do a string comparison on each line. If you do this, do not load the whole thing into memory - stream it one line at a time (see the sketch below).
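A minimal sketch of that line-at-a-time approach, reusing the csvPath and search values from the question (the SearchCsv method name and the case-insensitive comparison are just illustrative choices):

using System;
using System.Data;
using System.IO;

static DataTable SearchCsv(string csvPath, string search)
{
    var results = new DataTable();
    results.Columns.Add("StringID", typeof(string));
    results.Columns.Add("StringContent", typeof(string));

    // File.ReadLines enumerates the file lazily, so only one line is held in memory at a time.
    foreach (string line in File.ReadLines(csvPath))
    {
        // The generated file has the layout "guid,content", so split on the first comma only.
        int comma = line.IndexOf(',');
        if (comma < 0) continue;

        string content = line.Substring(comma + 1);
        if (content.IndexOf(search, StringComparison.OrdinalIgnoreCase) >= 0)
            results.Rows.Add(line.Substring(0, comma), content);
    }

    return results;
}

You could then bind the result the same way as in the question, e.g. GridViewResult.DataSource = SearchCsv(csvPath, txtBoxSearch.Text); just be aware that if a large number of 256 KB rows match, the result table itself can get big.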

If you have access to the "full" edition of SQL Server, you could do a BULK INSERT. If you don't (e.g. you're using one of the Express editions), you might run into the maximum database size (10 GB for Express). In that case, I've never tried this, but you could try SQLite. In theory at least, the database can handle multiple terabytes. Be sure to insert a large number of records in each transaction, though; if you commit after each insert your performance will be absolutely wretched. Also, be sure that you're not creating an in-memory database, or you'll just run out of memory again. A rough sketch of that import follows.
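Untested, but a batched SQLite import along those lines could look like this. It assumes the Microsoft.Data.Sqlite NuGet package, a file-based database (not ":memory:"), and the same "guid,content" layout as the generated CSV; the table name, column names, and batch size are arbitrary:

using System;
using System.IO;
using Microsoft.Data.Sqlite;

static void ImportCsvToSqlite(string csvPath, string dbPath)
{
    using (var connection = new SqliteConnection($"Data Source={dbPath}"))
    {
        connection.Open();

        using (var create = connection.CreateCommand())
        {
            create.CommandText = "CREATE TABLE IF NOT EXISTS Rows (StringID TEXT, StringContent TEXT)";
            create.ExecuteNonQuery();
        }

        const int batchSize = 10000;  // commit in large batches, never per row
        SqliteTransaction transaction = connection.BeginTransaction();

        var insert = connection.CreateCommand();
        insert.Transaction = transaction;
        insert.CommandText = "INSERT INTO Rows (StringID, StringContent) VALUES ($id, $content)";
        var idParam = insert.Parameters.Add("$id", SqliteType.Text);
        var contentParam = insert.Parameters.Add("$content", SqliteType.Text);

        int count = 0;
        foreach (string line in File.ReadLines(csvPath))
        {
            int comma = line.IndexOf(',');
            if (comma < 0) continue;

            idParam.Value = line.Substring(0, comma);
            contentParam.Value = line.Substring(comma + 1);
            insert.ExecuteNonQuery();

            // Start a fresh transaction every batchSize rows.
            if (++count % batchSize == 0)
            {
                transaction.Commit();
                transaction = connection.BeginTransaction();
                insert.Transaction = transaction;
            }
        }

        transaction.Commit();
    }
}

Once the data is in SQLite you can search it with a parameterised SELECT ... WHERE StringContent LIKE @term query instead of concatenating the search text into the SQL string as in the OLE DB version.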
