Extracting particular data from a CSV file in C++

I have written a program to read a CSV file in C++, but I'm having trouble extracting data from it. I want to count the number of columns in the 1st row, starting from the 5th column and continuing to the last column of that row. I have written the following code to read the CSV file, but I am not sure how to count the columns as described. I would appreciate it if anyone could tell me how to go about it.

char* substring(char* source, int startIndex, int endIndex)
{
int size = endIndex - startIndex + 1;
char* s = new char[size+1];
strncpy(s, source + startIndex, size); //you can read the documentation of strncpy online
s[size]  = '\0'; //make it null-terminated
return s;
}

char** readCSV(const char* csvFileName, int& csvLineCount)
{
ifstream fin(csvFileName);
if (!fin)
{
    return nullptr;
}
csvLineCount = 0;
char line[1024];
while(fin.getline(line, 1024))
{
    csvLineCount++;
};
char **lines = new char*[csvLineCount];
fin.clear();
fin.seekg(0, ios::beg);
for (int i=0; i<csvLineCount; i++)
{
    fin.getline(line, 1024);
    lines[i] = new char[strlen(line)+1];
    strcpy(lines[i], line);

};
fin.close();
return lines;
}

I have attached a few lines from the CSV file:-

Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0

What I need is, in the 1st row, the number of dates after Long.

To answer your question:

I have attached a few lines from the CSV file:- Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0

What I need is, in the 1st row, the number of dates after Long.

Yeah, not that difficult - that's how I would do it:

#include <iostream>
#include <string>
#include <fstream>
#include <regex>

#define FILENAME "test.csv" //Your filename as Macro 
//(The compiler just sees "test.csv" instead of FILENAME)



void read(){
    std::string n;

    //date format pattern m/dd/yy
    std::regex pattern1("\\b\\d{1}[/]\\d{2}[/]\\d{2}\\b");
    //date format pattern mm/dd/yy
    std::regex pattern2("\\b\\d{2}[/]\\d{2}[/]\\d{2}\\b");
    std::smatch result1, result2;

    std::ifstream file(FILENAME, std::ios::in);
    if ( ! file.is_open() )
    {
        std::cout << "Could not open file!" << '\n';
        return; //nothing to read
    }

    //read one comma-separated field at a time; the loop ends when no field is left
    //(testing the result of getline avoids the extra pass of a do/while on eof())
    //https://en.cppreference.com/w/cpp/string/basic_string/getline
    while(std::getline(file,n,',')){
        //str(0) is the whole match; the patterns have no capture group 1
        if(std::regex_search(n,result1,pattern1))
            std::cout << result1.str(0) << std::endl;
        if(std::regex_search(n,result2,pattern2))
            std::cout << result2.str(0) << std::endl;
    }
    file.close();
}

int main ()
{
    read();
    return 0;
}

The file test.csv contains the following for testing:

    Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0 
    Province/State,Country/Region,Lat,Long,1/25/20,12/26/20,1/27/20, ,Bfghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Blbania,41.1533,20.1683,0,0,0,0 

It actually is pretty simple:

  1. std::getline reads from the open file and stops at a so-called delimiter character, in your case a comma ','. (That is the best way I have found to read CSV. You can replace the delimiter with any other single character, for example ';' or ' '.)

  2. After this you have all the data nicely separated, one field after another, without the commas.

  3. Now you can "filter" out what you need. I use regex, but use whatever you want. (Just FYI: for C++-tagged questions you shouldn't use C-style functions like strncpy.)

  4. I gave you one pattern for dates like 1/23/20 (m/dd/yy) and a second one in case your file contains a November or December date like 12/22/20 (mm/dd/yy). Keeping them in 2 lines makes the regex patterns easier to read and understand.

You may have to expand the regex pattern if other data in the file somehow matches your date format; this is explained really well here and is not as complicated as it looks.

  5. From that point you can put all the printed stuff in, for example, a vector (a more convenient array) to handle and/or pass/return the data. That's up to you.
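As a rough sketch of that last step, the matched dates can be collected into a std::vector instead of being printed. The function name extractDates is hypothetical, and the single pattern used here is an assumption: it merges the two patterns from the code above and also accepts one-digit days, which the original patterns do not.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every date-like token found in `line`.
// Assumption: 1-2 digit month, 1-2 digit day, 2 digit year, separated by '/'.
std::vector<std::string> extractDates(const std::string& line)
{
    static const std::regex datePattern(R"(\b\d{1,2}/\d{1,2}/\d{2}\b)");
    std::vector<std::string> dates;
    // Walk over all non-overlapping matches in the line.
    for (std::sregex_iterator it(line.begin(), line.end(), datePattern), end;
         it != end; ++it) {
        dates.push_back(it->str()); // str() without an argument is the whole match
    }
    return dates;
}
```

Called on the first row only, extractDates(header).size() would then be the number of date columns the question asks about.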

If you need more explanation, I am happy to help and/or expand this example; just leave a comment.
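For the count itself, the regex is not strictly needed. The following is a minimal sketch, assuming that each CSV row sits on its own line, that the first four columns are always Province/State, Country/Region, Lat and Long, and that no field contains a quoted comma. The names countColumnsAfter and countDateColumnsInFile are made up for this illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: number of fields in `line` after the first `skip` ones.
// A trailing comma is not counted as an extra, empty field.
std::size_t countColumnsAfter(const std::string& line, std::size_t skip)
{
    std::istringstream row(line);
    std::string field;
    std::size_t total = 0;
    while (std::getline(row, field, ',')) { // one field per iteration
        ++total;
    }
    return total > skip ? total - skip : 0;
}

// Hypothetical helper: reads only the first row of the file at `path`.
// Returns 0 if the file cannot be opened or is empty.
std::size_t countDateColumnsInFile(const std::string& path)
{
    std::ifstream file(path);
    std::string header;
    if (!std::getline(file, header)) {
        return 0;
    }
    if (!header.empty() && header.back() == '\r') {
        header.pop_back(); // tolerate Windows line endings
    }
    return countColumnsAfter(header, 4); // skip the four non-date columns
}
```

With the header row from the question, countColumnsAfter(header, 4) gives the number of date columns after Long.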

You basically want to search for the separator character within your line (it is often ';', although in your file it is ',').
If you print out your lines it should look like this:

a;b;c;d;e;f;g;h

There are several ways to achieve what you want; I would look for a function that splits the line on a character. Something along the lines of the example below should work. If you use std::string you can go with std::string::find instead of a loop.

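// Counts the fields in line (requires <cstring>); call it with count = 0.
int rows(const char* line, char separator, int count) {
  size_t length = strlen(line);
  size_t i = 0;
  // advance to the next separator or to the end of the line
  while (i < length && line[i] != separator) i++;
  count++; // one more field ends here
  // recurse on the text after the separator, if any is left
  if (i + 1 < length) return rows(line + i + 1, separator, count);
  else return count;
}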

The recursion can obviously be replaced by one loop;)

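// Counts how often sign occurs in line (requires <cstring>).
int countSign(const char* line, char sign){
  size_t l = strlen(line);
  int count = 0;
  for (size_t i=0; i < l; i++) {
    if(line[i] == sign) count++; // compare single characters with ==, not strcmp
  }
  return count;
}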
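If the line is held in a std::string, the counting loop can also be left to the standard library. This is only a sketch: countFields is a hypothetical name, and treating a trailing separator as adding no extra field is an assumption made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: number of fields in `line` when split on `separator`.
// Assumes no quoted fields; a trailing separator adds no extra field.
std::size_t countFields(const std::string& line, char separator)
{
    if (line.empty()) {
        return 0;
    }
    // n separators delimit n + 1 fields
    std::size_t fields =
        static_cast<std::size_t>(std::count(line.begin(), line.end(), separator)) + 1;
    if (line.back() == separator) {
        --fields;
    }
    return fields;
}
```

For the header row in the question, countFields(header, ',') - 4 would give the number of date columns, provided the row has at least four fields.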
