How to use cross_val_score() in Sklearn?
I am trying to do k-fold cross-validation using Sklearn in Python. I have now followed two tutorials, but my code will not run for the validation step.
Every time I try running
cross_val_score(dt, x, y, cv=5)
I get the error:
Traceback (most recent call last):
  File "C:/Users/djsg38/Documents/CS6001-SpatialTemporal/HW2/main.py", line 573, in <module>
    scores = cross_val_score(dt, x, y, cv=5)
  File "C:\Python27\lib\site-packages\sklearn\model_selection\_validation.py", line 128, in cross_val_score
    X, y, groups = indexable(X, y, groups)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 206, in indexable
    check_consistent_length(*result)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 177, in check_consistent_length
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 116, in _num_samples
    'estimator %s' % x)
TypeError: Expected sequence or array-like, got estimator Its official US President Barack Obama wants lawmakers weigh \
0 1 4 12 3 2 12 4 4 2
1 0 0 1 0 0 0 0 0 0
2 1 0 4 0 0 0 0 0 0
3 0 0 0 0 0 0 4 0 0
4 0 3 10 0 0 1 0 0 0
5 0 0 0 0 0 0 0 0 0
6 0 0 0 4 1 7 0 0 0
7 3 0 0 0 0 0 0 0 0
8 1 0 4 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
10 0 1 6 3 0 3 0 0 0
11 0 0 0 1 0 0 0 0 0
12 0 2 1 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0
17 0 0 5 4 1 9 1 0 0
18 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0
21 0 0 3 2 1 1 0 0 1
22 0 0 0 0 0 0 0 0 0
23 0 0 1 0 0 0 0 0 0
24 1 0 0 0 0 0 0 0 0
25 0 0 0 1 0 0 0 0 0
26 0 0 0 0 0 0 0 0 0
27 0 0 1 0 0 0 0 0 0
28 0 0 0 0 0 0 0 0 0
29 0 1 0 0 0 0 0 0 0
.. ... ... .. ... ... ... ... ... ...
70 0 0 0 0 0 0 0 0 0
71 0 0 0 2 0 5 0 0 0
72 5 0 0 0 0 0 0 0 0
73 0 0 0 0 0 0 0 0 0
74 0 0 1 0 0 0 0 0 0
75 1 0 1 0 0 0 1 0 0
76 2 0 0 0 0 0 0 0 0
77 1 0 0 0 0 0 0 0 0
78 0 0 0 0 0 0 0 0 0
79 1 0 0 0 0 0 0 0 0
80 0 0 0 0 0 0 0 0 0
81 0 0 1 0 0 0 0 0 0
82 0 0 1 0 0 0 0 0 0
83 0 0 0 0 0 0 0 1 0
84 0 0 2 4 1 3 1 0 0
85 0 0 0 1 0 0 0 0 0
86 0 0 1 0 0 0 0 0 0
87 0 0 0 0 0 0 0 0 0
88 0 0 0 0 0 0 0 0 0
89 0 0 0 0 0 0 0 0 0
90 0 0 0 0 0 0 0 0 0
91 0 0 2 1 0 0 0 0 0
92 0 0 0 0 0 0 0 0 0
93 0 0 0 0 0 0 0 0 0
94 1 0 0 0 0 0 0 0 0
95 0 2 1 0 0 0 0 0 0
96 0 0 0 0 0 0 0 0 0
97 0 0 4 1 0 0 0 0 0
98 0 0 11 1 0 0 0 0 0
99 0 0 0 0 0 0 0 0 0
whether ... Heh heh funny disassociate personWere \
0 4 ... 0 0 0 0 0
1 0 ... 0 0 0 0 0
2 0 ... 0 0 0 0 0
3 0 ... 0 0 0 0 0
4 0 ... 0 0 0 0 0
5 0 ... 0 0 0 0 0
6 2 ... 0 0 0 0 0
7 0 ... 0 0 0 0 0
8 0 ... 0 0 0 0 0
9 0 ... 0 0 0 0 0
10 0 ... 0 0 0 0 0
11 1 ... 0 0 0 0 0
12 0 ... 0 0 0 0 0
13 1 ... 0 0 0 0 0
14 0 ... 0 0 0 0 0
15 1 ... 0 0 0 0 0
16 0 ... 0 0 0 0 0
17 1 ... 0 0 0 0 0
18 0 ... 0 0 0 0 0
19 0 ... 0 0 0 0 0
20 0 ... 0 0 0 0 0
21 8 ... 0 0 0 0 0
22 0 ... 0 0 0 0 0
23 0 ... 0 0 0 0 0
24 0 ... 0 0 0 0 0
25 0 ... 0 0 0 0 0
26 1 ... 0 0 0 0 0
27 0 ... 0 0 0 0 0
28 0 ... 0 0 0 0 0
29 0 ... 0 0 0 0 0
.. ... ... ... ... ... ... ...
70 0 ... 0 0 0 0 0
71 1 ... 0 0 0 0 0
72 0 ... 0 0 0 0 0
73 0 ... 0 0 0 0 0
74 0 ... 0 0 0 0 0
75 0 ... 0 0 0 0 0
77 0 ... 0 0 0 0 0
78 0 ... 0 0 0 0 0
79 1 ... 0 0 0 0 0
80 0 ... 0 0 0 0 0
81 3 ... 0 0 0 0 0
82 0 ... 0 0 0 0 0
83 0 ... 0 0 0 0 0
84 0 ... 0 0 0 0 0
85 0 ... 0 0 0 0 0
86 0 ... 0 0 0 0 0
87 0 ... 0 0 0 0 0
88 0 ... 0 0 0 0 0
89 1 ... 0 0 0 0 0
90 0 ... 0 0 0 0 0
91 0 ... 0 0 0 0 0
92 0 ... 0 0 0 0 0
93 0 ... 0 0 0 0 0
94 1 ... 0 0 0 0 0
95 0 ... 0 0 0 0 0
96 0 ... 0 0 0 0 0
97 0 ... 0 0 0 0 0
98 1 ... 0 0 0 0 0
99 0 ... 1 1 1 1 1
therehighlightAs indepth umpireshighlightThe headhighlightTwo \
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 0 0 0 0
11 0 0 0 0
12 0 0 0 0
13 0 0 0 0
14 0 0 0 0
15 0 0 0 0
16 0 0 0 0
17 0 0 0 0
18 0 0 0 0
19 0 0 0 0
20 0 0 0 0
21 0 0 0 0
22 0 0 0 0
23 0 0 0 0
24 0 0 0 0
25 0 0 0 0
26 0 0 0 0
27 0 0 0 0
28 0 0 0 0
29 0 0 0 0
.. ... ... ... ...
70 0 0 0 0
71 0 0 0 0
72 0 0 0 0
73 0 0 0 0
74 0 0 0 0
75 0 0 0 0
76 0 0 0 0
77 0 0 0 0
78 0 0 0 0
79 0 0 0 0
80 0 0 0 0
81 0 0 0 0
82 0 0 0 0
83 0 0 0 0
84 0 0 0 0
85 0 0 0 0
86 0 0 0 0
87 0 0 0 0
88 0 0 0 0
89 0 0 0 0
90 0 0 0 0
91 0 0 0 0
92 0 0 0 0
93 0 0 0 0
94 0 0 0 0
95 0 0 0 0
96 0 0 0 0
97 0 0 0 0
98 0 0 0 0
99 1 1 1 1
disrespect
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
.. ...
70 0
71 0
72 0
73 0
74 0
75 0
76 0
77 0
78 0
79 0
80 0
81 0
82 0
83 0
84 0
85 0
86 0
87 0
88 0
89 0
90 0
91 0
92 0
93 0
94 0
95 0
96 0
97 0
98 0
99 1
[100 rows x 12993 columns]
Here is my code:
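import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def encode_target(df, target_column):
    # Add an integer-coded "Target" column derived from the string labels.
    df_mod = df.copy()
    targets = df_mod[target_column].unique()
    map_to_int = {name: n for n, name in enumerate(targets)}
    df_mod["Target"] = df_mod[target_column].replace(map_to_int)
    return (df_mod, targets)

df = pd.read_csv("C:/Users/djsg38/Documents/CS6001-SpatialTemporal/HW2/finalCounts.csv")
df2, targets = encode_target(df, "MYLABEL")
features = list(df2.columns[:12338])  # the word-count columns
y = df2["Target"]  # must match the column name created in encode_target
x = df2[features]
dt = DecisionTreeClassifier()
dt.fit(x, y)
scores = cross_val_score(dt, x, y, cv=5)  # this is the line that raises the TypeError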
My DecisionTreeClassifier seems to work fine, and when I output it as an image it looks good. The problem lies in the last line.
P.S. I am not sure whether there is a column limit. The classic example I followed used the Iris dataset, so there were only four columns of data to look at. In my case, though, I have 12,338 columns of data (the word count of each unique word from 100 articles).
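For comparison, here is a minimal sketch of the Iris-style tutorial call next to a call on a much wider matrix. The wide matrix is synthetic (random counts and alternating labels invented for illustration, not the article data), and the variable names are assumptions. It only suggests that the number of columns by itself does not stop cross_val_score from running when the inputs are plain NumPy arrays.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Tutorial-style baseline: 150 samples, 4 feature columns.
iris = load_iris()
X_iris, y_iris = iris.data, iris.target
iris_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X_iris, y_iris, cv=5)

# Synthetic stand-in for a wide word-count matrix: 100 rows, 12338 columns.
# The values and labels are random/made up, so the scores carry no meaning.
rng = np.random.RandomState(0)
X_wide = rng.randint(0, 5, size=(100, 12338))
y_wide = np.arange(100) % 2  # alternating 0/1 labels, 50 of each class
wide_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X_wide, y_wide, cv=5)

print(iris_scores.mean())
print(wide_scores.shape)
```

Each call returns one score per fold, so both arrays have five entries.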
Contrary to what the tutorial I was following did, I could not pass my X value through directly, as it raised errors. This could be due to the string headers in it, but I am not positive.
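One possible explanation, which is an assumption and not confirmed anywhere in the post: the traceback ends in _num_samples complaining that it received an "estimator", and that check may simply look for a fit attribute on the input. pandas exposes column names as attributes, so a word-count column that happens to be named "fit" would make the DataFrame look like an estimator. Whether the data actually contains such a column is unknown. The sketch below uses a made-up two-column table only to show the pandas behavior.

```python
import pandas as pd

# Hypothetical miniature word-count table. The column names are invented,
# and "fit" is included on purpose to show the attribute clash.
counts = pd.DataFrame({"fit": [1, 0, 2], "weigh": [0, 3, 1]})

# pandas exposes columns as attributes, so a "fit" column creates a .fit attribute.
has_fit_attr = hasattr(counts, "fit")

# A plain NumPy array carries no column names, hence no .fit attribute.
as_array = counts.values
array_has_fit = hasattr(as_array, "fit")

# Dropping (or renaming) the column removes the attribute from the DataFrame too.
dropped_has_fit = hasattr(counts.drop(columns=["fit"]), "fit")

print(has_fit_attr, array_has_fit, dropped_has_fit)  # prints: True False False
```

If this is indeed the cause, passing the underlying arrays, for example cross_val_score(dt, x.values, y.values, cv=5), or renaming the offending column, would be worth trying.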
My solution was to manually split my data into 5 folds and train 5 different decision trees on the data, holding out 1 fold as the test set each time.
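The manual approach described above can be sketched roughly as follows. This is not the code from the post: it uses sklearn's KFold to generate the five splits, and the data is a small synthetic array with made-up values, so only the structure of the loop is meant to carry over.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data: 100 samples, 20 count-like features, binary labels.
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(100, 20))
y = np.arange(100) % 2

# Five folds; each fold serves as the test set exactly once.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in kf.split(X):
    tree = DecisionTreeClassifier(random_state=0)  # a fresh tree per fold
    tree.fit(X[train_idx], y[train_idx])
    fold_scores.append(tree.score(X[test_idx], y[test_idx]))  # accuracy on the held-out fold

mean_score = float(np.mean(fold_scores))
print(fold_scores, mean_score)
```

With a real DataFrame, the same index arrays could be used through x.iloc[train_idx] and y.iloc[train_idx] instead of NumPy indexing.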