Complement of a Pandas DataFrame sample

Question

import pandas as pd

df = pd.read_csv("train.csv")

sample = df.sample(10)

sample.to_csv("train_subset.csv")

I want to sample 10 random rows from a given csv file (train.csv) and store them as a new csv file train_subset.csv. The code above achieves that. Now I also want to store all the rows that weren't sampled in a file train_remaining.csv.

How can I implement that? How do I find which rows were sampled?

Answer 1

You can use

df.index.difference(sample.index)

where sample.index is the index of the sampled rows. The result contains the index labels of every row that was not sampled.

Then use it to select the complementary DataFrame. Because these are index labels rather than positions, select with .loc:

complementary = df.loc[df.index.difference(sample.index)]
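
As a minimal sketch of the whole round trip, the snippet below samples rows and writes both files. It assumes a small in-memory DataFrame stands in for train.csv; the column names and the random_state value are illustrative and not part of the question. It also assumes the index labels are unique, in which case df.drop(sample.index) selects the same rows as the index difference.

```python
import pandas as pd

# Stand-in for pd.read_csv("train.csv"); the columns are made up for illustration.
df = pd.DataFrame({"feature": range(100), "label": [i % 2 for i in range(100)]})

# random_state is only here to make the sketch reproducible.
sample = df.sample(10, random_state=0)

# Rows whose index labels are not in the sample (assumes unique index labels).
remaining = df.drop(sample.index)

# Same rows selected through the index difference (label-based, hence .loc).
remaining_alt = df.loc[df.index.difference(sample.index)]

sample.to_csv("train_subset.csv")
remaining.to_csv("train_remaining.csv")
```

Note that index.difference returns sorted labels, so if the original row order matters and the index is not already sorted, df.drop is probably the safer of the two.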

Answer 2

I would suggest using sklearn's train_test_split.

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

This splits the rows randomly into two disjoint sets. You can give the size of one set as a fraction of the rows or as an absolute number.
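
A rough sketch of that suggestion, assuming scikit-learn is installed and again using a made-up in-memory DataFrame in place of train.csv (the variable names and random_state are illustrative):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv("train.csv"); the columns are made up for illustration.
df = pd.DataFrame({"feature": range(100), "label": [i % 2 for i in range(100)]})

# An integer test_size requests an absolute number of rows;
# a float between 0 and 1 would request a fraction instead.
remaining, subset = train_test_split(df, test_size=10, random_state=0)
```

Here subset plays the role of the 10 sampled rows and remaining holds everything else, so both can be written out with to_csv as in the question.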

2 answers

Answer 1
2 2022-03-15 17:08:21

Answer 2
0 2017-05-11 15:49:53
