train_test_split function in C++
I would like to create a train_test_split function that splits a matrix (vector of vectors) of data into two other matrices, similar to what sklearn's function does. This is my attempt:
#include <iostream>
#include <cstdlib>
#include <fstream>
#include <time.h>
#include <vector>
#include <string>
using namespace std;
vector<vector<float>> train_test_split(vector<vector<float>> df, float train_size = 0.8){
    vector<vector<float>> train;
    vector<vector<float>> test;
    srand(time(NULL));
    for(int i = 0; i < df.size(); i++){
        int x = rand() % 10 + 1;
        if(x <= train_size * 10){
            train.push_back(df[i]);
        }
        else{
            test.push_back(df[i]);
        }
    }
    return train, test;
}
int main(){
    vector<vector<float>> train;
    vector<vector<float>> test;
    vector<vector<float>> df = {{1,2,3,4},
                                {5,6,7,8},
                                {9,10,11,12}};
    train, test = train_test_split(df);
    cout << "training size: " << train.size() << ", test size: " << test.size() << endl;
    return 0;
}
This approach only puts data into the test matrix. After some research, I discovered that a C++ function cannot return two values. I am very new to C++, and I am wondering what the best way to approach this would be. Any help will be appreciated.
A function can only return one value. But look at your function declaration: it is declared to return a vector<vector<float>>, and that's a container of many vector<float>s. Containers can contain many elements (of the same type), and custom types can contain many members:
struct train_test_split_result {
    vector<vector<float>> train;
    vector<vector<float>> test;
};
// train_size is a fraction, so it must be a float (an int would truncate 0.8 to 0)
train_test_split_result train_test_split(vector<vector<float>> df, float train_size = 0.8f) {
    train_test_split_result result;
    // ...
    // result.train.push_back(...)
    // result.test.push_back(...)
    // ...
    return result;
}
int main(){
    vector<vector<float>> df = {{1,2,3,4},
                                {5,6,7,8},
                                {9,10,11,12}};
    train_test_split_result result = train_test_split(df);
    cout << "training size: " << result.train.size() << ", test size: " << result.test.size() << endl;
}
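For reference, here is one way the elided body could be filled in. This is only a sketch under my own assumptions, not the only reasonable design: it swaps the per-row rand() draw from the question for a shuffle of the row indices, so the split sizes are exact rather than random, and it adds a hypothetical seed parameter for reproducibility. The train count is truncated, so 3 rows at 0.8 give 2 training rows and 1 test row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Holds both halves of the split so a single return value carries them.
struct train_test_split_result {
    std::vector<std::vector<float>> train;
    std::vector<std::vector<float>> test;
};

// Shuffles the row indices, then sends the first train_size fraction of rows
// to train and the remainder to test.
// `seed` is a hypothetical extra parameter, added only to make runs repeatable.
train_test_split_result train_test_split(const std::vector<std::vector<float>>& df,
                                         float train_size = 0.8f,
                                         unsigned seed = std::random_device{}()) {
    assert(train_size >= 0.0f && train_size <= 1.0f);

    // order = 0, 1, ..., df.size() - 1, then randomly permuted
    std::vector<std::size_t> order(df.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    // Number of training rows, truncated toward zero
    const auto n_train =
        static_cast<std::size_t>(train_size * static_cast<float>(df.size()));

    train_test_split_result result;
    for (std::size_t i = 0; i < order.size(); ++i) {
        if (i < n_train) {
            result.train.push_back(df[order[i]]);
        } else {
            result.test.push_back(df[order[i]]);
        }
    }
    return result;
}
```

Calling train_test_split(df) from main, as above, then works unchanged. Taking df by const reference also avoids copying the whole matrix on every call.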
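PS: You should turn up your compiler's warnings and read them. Then read this: How does the Comma Operator work. In your code, return train, test; evaluates train, discards it, and returns only test. Likewise, train, test = train_test_split(df); assigns only to test. That is why only the test matrix ends up with data.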
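To make the comma operator's behaviour concrete, here is a small sketch that mirrors the two lines from the question. The function names are made up for illustration. With warnings enabled, a compiler will typically flag both statements as having an unused left operand, which is exactly the hint the warnings would have given.

```cpp
#include <cassert>
#include <vector>

// Mirrors `return train, test;` from the question: the comma operator
// evaluates `a`, discards the result, and yields `b`, so only `b` is returned.
std::vector<int> returns_second(std::vector<int> a, std::vector<int> b) {
    return a, b;
}

// Mirrors `train, test = f();`: assignment binds tighter than the comma,
// so this parses as `first, (second = value);` and `first` is left untouched.
void assigns_second(std::vector<int>& first, std::vector<int>& second,
                    const std::vector<int>& value) {
    first, second = value;
}
```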
PPS: A nested vector is a terrible data structure for a matrix. std::vector benefits a lot from memory locality, but because its elements are dynamically allocated, the floats in a std::vector<std::vector<float>> are scattered around in memory. If the size is known at compile time and is not so big that it would require dynamic allocation, you can use a nested array. Alternatively, use a flat std::vector<float> to store the matrix.
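A flat matrix could look roughly like the following. This is a minimal sketch: the Matrix type and its at() accessor are hypothetical names, and row-major layout is an assumption. All floats live in one contiguous buffer, and element (r, c) sits at index r * cols + c.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Matrix type: one contiguous buffer, indexed in row-major order.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // rows * cols floats stored back to back

    // Creates a rows x cols matrix filled with zeros.
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    // Mutable access to element (r, c); bounds are checked in debug builds.
    float& at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }

    // Read-only access to element (r, c).
    float at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};
```

With this layout, a whole row is a contiguous run of cols floats, so copying rows into a train or test matrix stays cache-friendly.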
PPPS: There are also "out parameters": the function can take arguments by non-const reference, the caller passes them in, and the function modifies them. Generally, though, out-parameters are not recommended.
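For completeness, an out-parameter version might be sketched as below. The name train_test_split_into is hypothetical, and to keep it short this variant does not shuffle: it simply takes the leading rows as the training set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Out-parameter variant: the caller owns `train` and `test`, the function fills them.
// Simplification: no shuffling, the first rows become the training set.
void train_test_split_into(const std::vector<std::vector<float>>& df,
                           std::vector<std::vector<float>>& train,
                           std::vector<std::vector<float>>& test,
                           float train_size = 0.8f) {
    assert(train_size >= 0.0f && train_size <= 1.0f);

    // Start from empty outputs so earlier contents do not leak into the result.
    train.clear();
    test.clear();

    // Number of training rows, truncated toward zero
    const auto n_train =
        static_cast<std::size_t>(train_size * static_cast<float>(df.size()));

    for (std::size_t i = 0; i < df.size(); ++i) {
        (i < n_train ? train : test).push_back(df[i]);
    }
}
```

The caller declares both vectors first and then calls train_test_split_into(df, train, test). Nothing at the call site shows which arguments are inputs and which are outputs, which is one reason returning a struct usually reads better.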