Shuffling rows in a Pandas DataFrame while retaining the index
I am currently trying to find a way to randomize items in a dataframe row-wise. I want to preserve the column names as well as the index; I just want to change the order of the entries in my dataframe.
Currently, I am using
data = data.sample(frac=1).reset_index(drop=True)
However, this is causing some issues with the output: I don't think the rows are being shuffled properly. Is there another way to achieve this?
The issue is that I am doing text analysis, and when I look at the unigrams and bigrams most correlated with each class, I get different answers for the shuffled and the original data.
This is the code I am using for unigrams and bigrams:
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2

# data, labels, category_labels and STOPWORDS are defined elsewhere.
tfidf = TfidfVectorizer(sublinear_tf=True,
                        min_df=5,
                        stop_words=STOPWORDS,
                        norm='l2',
                        encoding='latin-1',
                        ngram_range=(1, 2))
feat = tfidf.fit_transform(data['Combine']).toarray()
N = 5  # Number of examples to be listed
for f, i in sorted(category_labels.items()):
    # chi2 returns (scores, p-values); rank features by score, ascending
    chi2_feat = chi2(feat, labels == i)
    indices = np.argsort(chi2_feat[0])
    # get_feature_names() was removed in scikit-learn 1.2
    feat_names = np.array(tfidf.get_feature_names_out())[indices]
    unigrams = [w for w in feat_names if len(w.split(' ')) == 1]
    bigrams = [w for w in feat_names if len(w.split(' ')) == 2]
    print("\nFlair '{}':".format(f))
    print("Most correlated unigrams:\n\t. {}".format('\n\t. '.join(unigrams[-N:])))
    print("Most correlated bigrams:\n\t. {}".format('\n\t. '.join(bigrams[-N:])))
Just using data = data.sample(frac=1) samples the index as well, and that is problematic. You can see the output below. We just need to change the values.
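To illustrate what "samples the index as well" means, here is a minimal sketch on made-up data. The column name `flair`, the index values, and `random_state=0` are assumptions for the demonstration; only `Combine` and the `sample(frac=1)` call come from the post.

```python
import pandas as pd

# Hypothetical data with a meaningful, non-default index.
data = pd.DataFrame(
    {
        "Combine": ["post a", "post b", "post c", "post d"],
        "flair": ["news", "sports", "news", "politics"],
    },
    index=[10, 11, 12, 13],
)

# sample(frac=1) returns all rows in random order.
shuffled = data.sample(frac=1, random_state=0)

# Every row carries its original index label with it.
print(shuffled)

# Because the labels travel with the rows, sorting by index
# simply recovers the original frame.
restored = shuffled.sort_index()
print(restored.equals(data))  # True

# reset_index(drop=True) discards the old labels instead of keeping them.
renumbered = shuffled.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```

So neither variant keeps the index in its original order while moving the row contents: one moves the labels with the rows, the other throws the labels away.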
The correct way to achieve this is to sample only the values. I just figured it out; we can do it this way. Thank you to everybody who tried to help.
data[:] = data.sample(frac=1).values
This gave me the correct output.
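A small check of this approach on made-up data may help. The sketch below assumes a hypothetical `flair` column, a hand-picked index, and a fixed `random_state` so the run is repeatable; it only verifies that the index and column names stay put while whole rows move.

```python
import pandas as pd

# Hypothetical data; both columns hold strings so dtypes are not a concern.
data = pd.DataFrame(
    {
        "Combine": ["post a", "post b", "post c", "post d"],
        "flair": ["news", "sports", "news", "politics"],
    },
    index=[10, 11, 12, 13],
)
original_rows = sorted(map(tuple, data.values))

# Overwrite the cell values in place with a shuffled copy of the rows.
# .values drops the index, so the assignment is purely positional.
data[:] = data.sample(frac=1, random_state=0).values

print(data)
print(list(data.index))    # [10, 11, 12, 13]
print(list(data.columns))  # ['Combine', 'flair']

# Each row is still an intact (Combine, flair) pair; only positions changed.
rows_intact = sorted(map(tuple, data.values)) == original_rows
print(rows_intact)  # True
```

Two things are worth keeping in mind. After this assignment an index label no longer identifies the same record as before, so anything stored separately and keyed by the old index will not line up. Also, when the columns have mixed types, `.values` produces a single object array, so it may be worth checking `data.dtypes` afterwards.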