Complement of Pandas Dataframe Sample

import pandas as pd

df = pd.read_csv("train.csv")

sample = df.sample(10)

sample.to_csv("train_subset.csv")

I want to sample 10 random rows from a given csv file (train.csv) and store them in a new csv file, train_subset.csv. The code above achieves that. Now I also want to store all the rows that weren't sampled in a file train_remaining.csv.

How can I implement that? How do I find which rows were sampled?

You can use

df.index.difference(sample.index)

where sample.index is the index of the selected sample.

Then use it to select the complementary dataframe. The difference contains index labels, not positions, so select with .loc rather than .iloc:

complementary = df.loc[df.index.difference(sample.index)]
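
Putting the pieces together, the whole workflow could be sketched as follows. This is only an illustration: the small in-memory frame stands in for pd.read_csv("train.csv"), its column names are hypothetical, and random_state=0 is an assumption added so the draw is repeatable. df.drop(sample.index) is shown as an alternative that should give the same rows as long as the index labels are unique.

```python
import pandas as pd

# Stand-in for pd.read_csv("train.csv"); the column names are hypothetical.
df = pd.DataFrame({"feature": range(100), "label": [i % 2 for i in range(100)]})

# Draw 10 random rows; random_state is an assumption added for repeatability.
sample = df.sample(10, random_state=0)

# Keep the rows whose index labels are not in the sample (label-based, hence .loc).
remaining = df.loc[df.index.difference(sample.index)]

# Alternative: drop the sampled labels directly (assumes unique index labels).
remaining_alt = df.drop(sample.index)

# Write both parts out, using the file names from the question.
sample.to_csv("train_subset.csv")
remaining.to_csv("train_remaining.csv")
```

The two files together contain every row of the original frame exactly once.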

I would suggest using scikit-learn's train_test_split.

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

This will allow you to split off a randomly selected percentage of the rows. The remaining rows are returned alongside them, so both parts are available at once.
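
As a rough sketch of that suggestion, assuming scikit-learn is installed: the frame below again stands in for train.csv with hypothetical column names, and random_state=0 is an assumption for repeatability. Passing an integer test_size requests an absolute number of rows, while a float such as 0.1 would request a fraction.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv("train.csv"); the column names are hypothetical.
df = pd.DataFrame({"feature": range(100), "label": [i % 2 for i in range(100)]})

# test_size=10 puts exactly 10 rows in the second part; the first part
# holds every other row. random_state is an assumption for repeatability.
remaining, subset = train_test_split(df, test_size=10, random_state=0)

# Both parts keep their original index labels and can be saved with to_csv.
print(len(subset), len(remaining))
```

Note that train_test_split shuffles by default, so the rows in both parts come back in random order rather than file order.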
