Python - How do I put a classifier's results into a DataFrame to visualize them as a scatter plot?

Question

I'm new to this and would like to apply RandomUnderSampler (from imblearn.under_sampling import RandomUnderSampler) to balance the class distribution, and then plot the points belonging to each class in a different color.

So far I have done the following. I think this could work, but I don't know how to convert X_res, y_res into a DataFrame.

from imblearn.under_sampling import RandomUnderSampler

X_res, y_res = RandomUnderSampler(random_state=seed, sampling_strategy=1.0).fit_resample(X, y)

# Do something with X_res and y_res to get a DataFrame

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# split data into min and maj classes
(min_points, maj_points, _, _, _) = splitByClass(df)

# fit PCA with minority points
pca = PCA(n_components=2)
pca_min = pca.fit_transform(min_points)

fig, ax = plt.subplots()
ax.scatter(pca_min[:, 0], pca_min[:, 1], color='r', label='minority', alpha=0.4, edgecolors='none')

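# project majority points onto the components already fitted above
# (transform only, so both classes share the same axes)
pca_maj = pca.transform(maj_points)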
ax.scatter(pca_maj[:, 0], pca_maj[:, 1], color='b', label='majority', alpha=0.4, edgecolors='none')

ax.legend()
ax.grid(True)

plt.tight_layout()
plt.show()
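
A minimal sketch of the plotting step, assuming the resampled features and labels are available as arrays. The function name plot_classes_pca and the synthetic data are illustrative, not from the original post. PCA is fitted once on all points so that both classes share the same axes, and each class is drawn in its own color.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA


def plot_classes_pca(X, y, colors=("r", "b")):
    """Project X to 2D with one shared PCA and scatter each class in its own color."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Fit once on all points so every class is projected onto the same components.
    coords = PCA(n_components=2).fit_transform(X)

    fig, ax = plt.subplots()
    # zip stops at the shorter sequence, so pass one color per class.
    for label, color in zip(np.unique(y), colors):
        mask = y == label
        ax.scatter(coords[mask, 0], coords[mask, 1], color=color,
                   label=f"class {label}", alpha=0.4, edgecolors="none")
    ax.legend()
    ax.grid(True)
    fig.tight_layout()
    return fig, ax


# Synthetic stand-in for X_res, y_res (hypothetical data: 4 features, 2 classes).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y_demo = np.array([0] * 50 + [1] * 50)
fig, ax = plot_classes_pca(X_demo, y_demo)
# Call plt.show() to display the figure in an interactive session.
```

With real data, X_res and y_res from fit_resample would take the place of X_demo and y_demo.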

Answer 1

What does RandomUnderSampler's fit_resample() return? There are a couple of ways to make a DataFrame, depending on the type of the returned objects. For example:

Method #1: Creating a pandas DataFrame from a list of lists (import the pandas library first)

import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
df

Method #2: Creating a DataFrame from a dict of ndarrays/lists

import pandas as pd
data = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)
df

There are plenty of other ways. To find out the type of the returned objects, you can run type(X_res) and type(y_res) and post the result here.
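
To make the type check concrete, here is a small sketch. It assumes the resampled output is either a NumPy array (or list of lists) or already a pandas object; which one you get can depend on the imblearn version and on what was passed in. The helper name to_frame, the col_names argument, and the sample values are illustrative.

```python
import numpy as np
import pandas as pd


def to_frame(X_res, y_res, col_names, target="Class"):
    """Combine resampled features and labels into a single DataFrame."""
    print(type(X_res), type(y_res))  # inspect what the sampler returned
    if isinstance(X_res, pd.DataFrame):
        # Already a DataFrame: reset the index so labels align by position.
        df = X_res.reset_index(drop=True)
    else:
        # NumPy array or list of lists: wrap it and name the columns.
        df = pd.DataFrame(np.asarray(X_res), columns=col_names)
    # np.asarray drops any pandas index on y_res, so assignment is positional.
    df[target] = np.asarray(y_res)
    return df


# Hypothetical stand-ins for the output of fit_resample.
X_arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_arr = np.array([0, 1, 0])
print(to_frame(X_arr, y_arr, ["f1", "f2"]))
```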

Answer 2

I solved this by doing the following:

# create an empty DataFrame with the desired columns
resultDF = pd.DataFrame(columns=col_names)
# copy each column of X_res into the DataFrame under its name
for index, name in enumerate(col_names):
    resultDF[name] = X_res[:, index]
# store y_res in the last column, called 'Class'
resultDF['Class'] = y_res
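
For reference, the same result can usually be built without the loop. This is a sketch under the assumption that X_res is a 2-D NumPy array whose columns are in the same order as col_names; the sample values below are made up.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for X_res, y_res and col_names.
X_res = np.array([[0.1, 1.5], [0.7, 2.5], [0.3, 3.5]])
y_res = np.array([1, 0, 1])
col_names = ["feat_a", "feat_b"]

# Build the feature columns in one call, then append the labels as 'Class'.
resultDF = pd.DataFrame(X_res, columns=col_names)
resultDF["Class"] = y_res

# Rows of one class can then be selected for plotting, e.g. the class-1 points.
minority = resultDF[resultDF["Class"] == 1]
print(resultDF)
```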

Answer 1
0 2019-10-08 15:44:56

Answer 2
0 Accepted 2019-10-09 19:29:00
