Complement of Pandas Dataframe Sample
import pandas as pd
df = pd.read_csv("train.csv")
sample = df.sample(10)
sample.to_csv("train_subset.csv")
I want to sample 10 random rows from a given CSV file (train.csv) and store them in a new CSV file, train_subset.csv. The code above achieves that.
Now I also want to store all the rows that weren't sampled in a file train_remaining.csv.
How can I implement that? How do I find which rows were sampled?
You can use
df.index.difference(sample.index)
where sample.index is the index of the selected sample.
Then use it to select the complementary dataframe. df.loc is the safer choice than df.iloc here, because the difference holds index labels, which match row positions only for a default integer index:
complementary = df.loc[df.index.difference(sample.index)]
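As a minimal sketch of the index-based approach above, the following uses a small in-memory dataframe as a hypothetical stand-in for train.csv (the column name x and the index values are made up for illustration). It assumes the index labels are unique, which is the case for a dataframe freshly loaded with pd.read_csv. DataFrame.drop is shown as an alternative way to get the same complement.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv("train.csv"); the index is
# deliberately non-default to show that labels, not positions, are used.
df = pd.DataFrame({"x": range(20)}, index=range(100, 120))

# random_state is fixed only to make the sketch reproducible.
sample = df.sample(10, random_state=0)

# Rows whose index labels were not drawn into the sample.
remaining = df.drop(sample.index)

# Equivalent selection via the index difference (result is sorted by label).
remaining_alt = df.loc[df.index.difference(sample.index)]

# With real data, both parts could then be written out:
# sample.to_csv("train_subset.csv")
# remaining.to_csv("train_remaining.csv")
```

The sampled rows themselves can be read off sample.index, so no extra bookkeeping is needed to find out which rows were drawn.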
I would suggest using scikit-learn's train_test_split.
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
This lets you split off a randomly selected percentage of the rows and get the remaining rows back at the same time.
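A rough sketch of how train_test_split could cover the original question, assuming scikit-learn is installed and again using a small made-up dataframe in place of train.csv. Passing an integer as test_size asks for an absolute number of rows rather than a fraction, so test_size=10 corresponds to the 10 sampled rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for pd.read_csv("train.csv").
df = pd.DataFrame({"x": range(20)})

# An integer test_size is an absolute row count; a float such as 0.25
# would instead be read as a fraction of the rows.
remaining, subset = train_test_split(df, test_size=10, random_state=0)

# With real data, both parts could then be written out:
# subset.to_csv("train_subset.csv")
# remaining.to_csv("train_remaining.csv")
```

Both returned objects are dataframes that keep the original index labels, so the two parts are disjoint and together cover every row.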