
Extracting particular data from a CSV file in C++

I have written a program to read a CSV file, but I'm having some trouble extracting data from that file in C++. I want to count the number of columns in the 1st row, starting from the 5th column and ending at the last column of that row. I have written the following code to read the CSV file, but I am not sure how to count those columns. I would appreciate it if anyone could tell me how to go about it.

char* substring(char* source, int startIndex, int endIndex)
{
int size = endIndex - startIndex + 1;
char* s = new char[size+1];
strncpy(s, source + startIndex, size); //you can read the documentation of strncpy online
s[size]  = '\0'; //make it null-terminated
return s;
}

char** readCSV(const char* csvFileName, int& csvLineCount)
{
ifstream fin(csvFileName);
if (!fin)
{
    return nullptr;
}
csvLineCount = 0;
char line[1024];
while(fin.getline(line, 1024))
{
    csvLineCount++;
};
char **lines = new char*[csvLineCount];
fin.clear();
fin.seekg(0, ios::beg);
for (int i=0; i<csvLineCount; i++)
{
    fin.getline(line, 1024);
    lines[i] = new char[strlen(line)+1];
    strcpy(lines[i], line);

};
fin.close();
return lines;
}

I have attached a few lines from the CSV file:

Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0

What I need is, in the 1st row, the number of dates after Long.

To answer your question:

I have attached a few lines from the CSV file:- Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0

What I need is, in the 1st row, the number of dates after Long.

Yeah, that's not too difficult. This is how I would do it:

#include <iostream>
#include <string>
#include <fstream>
#include <regex>

#define FILENAME "test.csv" // Your filename as a macro
// (the preprocessor replaces FILENAME with "test.csv" before compilation)

void read(){
    std::string n;

    // date format pattern m/dd/yy
    std::regex pattern1("\\b\\d{1}[/]\\d{2}[/]\\d{2}\\b");
    // date format pattern mm/dd/yy
    std::regex pattern2("\\b\\d{2}[/]\\d{2}[/]\\d{2}\\b");
    std::smatch result1, result2;

    std::ifstream file(FILENAME, std::ios::in);
    if ( ! file.is_open() )
    {
        std::cout << "Could not open file!" << '\n';
        return; // nothing to read
    }

    // read one comma-separated field at a time; the loop ends when getline fails
    // https://en.cppreference.com/w/cpp/string/basic_string/getline
    while (std::getline(file, n, ','))
    {
        // the patterns have no capture groups, so str() prints the whole match
        if(std::regex_search(n,result1,pattern1))
            std::cout << result1.str() << std::endl;
        if(std::regex_search(n,result2,pattern2))
            std::cout << result2.str() << std::endl;
    }
    file.close();
}

int main ()
{
    read();
    return 0;
}

The file test.csv contains the following for testing:

    Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0 
    Province/State,Country/Region,Lat,Long,1/25/20,12/26/20,1/27/20, ,Bfghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Blbania,41.1533,20.1683,0,0,0,0 

It actually is pretty simple:

  1. getline reads from the open file and stops at a so-called delimiter character, in your case a comma ','. (That is the best way I have found to read CSV. You can replace the delimiter with whatever you want, for example ';' or ' ', you get the idea.)

  2. After this you have all the data nicely separated, one field after another, without the commas.

  3. Now you can "filter" out what you need. I use regex, but use whatever you want. (Just FYI: for C++-tagged questions you shouldn't use C-style functions like strncpy.)

  4. I gave you one pattern for dates like 1/23/20 (m/dd/yy) and a second one in case your file contains a two-digit month such as November or December, like 12/22/20 (mm/dd/yy). I kept them as two separate patterns to make the regex easier to read and understand.

you may have to extend the regex pattern if other data in the file happens to match your date format; this is explained really well here and is not as complicated as it looks.

  5. From that point you can put all the printed values into, e.g., a vector (a more convenient kind of array) to handle and/or pass/return the data. That's up to you.
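To make step 5 concrete, here is a minimal sketch that collects the date fields of a single line into a std::vector so they can be counted. The helper name collectDates and the single combined pattern are assumptions for illustration, not part of the code above, and the sketch works on one line that has already been read (for example the first row of the file) rather than on the whole file.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer above): returns every field of one
// CSV line that consists entirely of a date such as 1/22/20 or 12/26/20.
std::vector<std::string> collectDates(const std::string& line)
{
    // Assumption: month and day have one or two digits, the year has two.
    static const std::regex datePattern("\\d{1,2}/\\d{1,2}/\\d{2}");

    std::vector<std::string> dates;
    std::istringstream fields(line);
    std::string field;

    // Split the line at each comma and keep only the fields that are dates.
    while (std::getline(fields, field, ',')) {
        if (std::regex_match(field, datePattern)) {
            dates.push_back(field);
        }
    }
    return dates;
}
```

Calling collectDates on the header row from the question and taking size() of the result gives the number of dates after Long. Note that regex_match requires the whole field to be a date, so fields with surrounding whitespace would need trimming first.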

If you need more explanation, I am happy to help you out and/or expand this example; just leave a comment.

You basically want to search for the separator character within your line (often it is ';', although your file uses ',').
If you print out your lines, they should look like this:

a;b;c;d;e;f;g;h

There are several ways to achieve what you want; I would look for a strip or split-on-character function. Something along the lines of the example below should work. If you use std::string you can use str.find instead of a loop.

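// needs #include <cstring> for strlen
// Counts the fields in line recursively; call it with count = 0.
int rows(const char* line, char separator, int count) {
    size_t length = strlen(line);
    size_t i = 0;
    // advance to the next separator (or to the end of the line)
    while (i < length && line[i] != separator) i++;
    count++; // one more field found
    // recurse on the text after the separator, if there is any
    if (i + 1 < length) return rows(line + i + 1, separator, count);
    else return count;
}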

The recursion can obviously be replaced by one loop ;)

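// needs #include <cstring> for strlen
// Counts how often the character sign occurs in line.
int countSign(const char* line, char sign){
  size_t l = strlen(line);
  int count = 0;
  for (size_t i = 0; i < l; i++) {
    if (line[i] == sign) count++; // compare single characters directly
  }
  return count;
}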
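For the std::string route mentioned above, a rough sketch could look like the following. The names countFields and countFieldsFrom are made up for this illustration. It assumes plain fields without quoted separators, and a trailing separator at the end of the line is counted as one extra empty field.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the fields of one line by jumping from
// separator to separator with std::string::find.
int countFields(const std::string& line, char separator)
{
    int count = 1; // a line without any separator still has one field
    std::size_t pos = line.find(separator);
    while (pos != std::string::npos) {
        ++count;
        pos = line.find(separator, pos + 1); // continue after the last hit
    }
    return count;
}

// Hypothetical helper: number of columns from firstColumn (1-based) up to
// the last column, or 0 if the line has fewer columns than that.
int countFieldsFrom(const std::string& line, char separator, int firstColumn)
{
    const int total = countFields(line, separator);
    return total >= firstColumn ? total - firstColumn + 1 : 0;
}
```

With the header row from the question (without a trailing comma), countFieldsFrom(header, ',', 5) yields the number of date columns after Long.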
