I'm training a classifier in Python (2.7.11) for text classification, and while running it I get a deprecation warning, but I can't tell which line of my code is causing it. The code itself works fine and gives me results. The warning:
\\AppData\\Local\\Enthought\\Canopy\\User\\lib\\site-packages\\sklearn\\utils\\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
My code:
import csv
import sys

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Note: accuracy() is a helper that is not shown in this post.

def main():
    data = []
    folds = 10
    ex = [ [] for x in range(0,10)]
    results = []
    for i,f in enumerate(sys.argv[1:]):
        data.append(csv.DictReader(open(f,'r'),delimiter='\t'))
    for f in data:
        for i,datum in enumerate(f):
            ex[i % folds].append(datum)
    #print ex
    for held_out in range(0,folds):
        l = []
        cor = []
        l_test = []
        cor_test = []
        vec = []
        vec_test = []
        for i,fold in enumerate(ex):
            for line in fold:
                if i == held_out:
                    l_test.append(line['label'].rstrip("\n"))
                    cor_test.append(line['text'].rstrip("\n"))
                else:
                    l.append(line['label'].rstrip("\n"))
                    cor.append(line['text'].rstrip("\n"))
        vectorizer = CountVectorizer(ngram_range=(1,1),min_df=1)
        X = vectorizer.fit_transform(cor)
        for c in cor:
            tmp = vectorizer.transform([c]).toarray()
            vec.append(tmp[0])
        for c in cor_test:
            tmp = vectorizer.transform([c]).toarray()
            vec_test.append(tmp[0])
        clf = MultinomialNB()
        clf.fit(vec,l)
        result = accuracy(l_test,vec_test,clf)
        print result

if __name__ == "__main__":
    main()
Any idea which line raises this warning? Another issue is that running this code with different data sets gives me exactly the same accuracy, and I can't figure out what causes this. Also, I want to use this model in another Python process. I looked at the documentation and found an example using the pickle library, but none for joblib. So I tried following the same code, but this gave me errors:
clf = joblib.load('model.pkl')
pred = clf.predict(vec);
Also, if my data is a CSV file in the format "label \t text \n", what should go in the label column of the test data?
Thanks in advance
Your 'vec' input to clf.fit(vec, l) needs to be of type [[]], not just []. This is a quirk that I always forget when I fit models.
Just adding an extra set of square brackets should do the trick!
It's:
pred = clf.predict(vec);
I used this in my code and it worked:
#This makes it into a 2d array
temp = [2 ,70 ,90 ,1] #an instance
temp = np.array(temp).reshape((1, -1))
print(model.predict(temp))
Two solutions, same philosophy: turn your data from 1D into 2D.
1. Just add []:
vec = [vec]
2. Reshape your data:
import numpy as np
vec = np.array(vec).reshape(1, -1)
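To make the difference between the two reshape calls named in the warning concrete, here is a small sketch using NumPy only. The sample values are invented for illustration, and the variable names other than vec are hypothetical.

```python
import numpy as np

# A single sample with four features (values invented for illustration).
vec = [2, 70, 90, 1]

# Option 1: wrap the list in another list -> one row, four columns.
wrapped = np.array([vec])

# Option 2: reshape explicitly; -1 lets NumPy infer that dimension.
one_sample = np.array(vec).reshape(1, -1)    # 1 sample with 4 features
one_feature = np.array(vec).reshape(-1, 1)   # 4 samples with 1 feature each

print(wrapped.shape)
print(one_sample.shape)
print(one_feature.shape)
```

For a single document's feature vector, as in the question, the first two forms (one row) are the ones that match what predict expects; the third would treat every number as a separate sample.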
If you want to find out where the warning is coming from, you can temporarily promote warnings to exceptions. This will give you a full traceback and thus the lines where your program encountered the warning.
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("error")
    main()
If you run the program from the command line you can also use the -W flag. More information on warning handling can be found in the Python documentation.
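As a self-contained illustration of the same idea, the sketch below uses only the standard library. The function noisy() is a hypothetical stand-in for the library call that emits the DeprecationWarning, and run_strict() is a made-up helper name, not part of any library.

```python
import traceback
import warnings


def noisy():
    # Hypothetical stand-in for a library call that emits a DeprecationWarning.
    warnings.warn("Passing 1d arrays as data is deprecated", DeprecationWarning)
    return "finished"


def run_strict(func):
    """Run func with warnings promoted to exceptions.

    Returns the traceback text if a warning was raised, otherwise None.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # every warning now raises
        try:
            func()
        except Warning:
            return traceback.format_exc()
    return None


trace = run_strict(noisy)
print(trace)  # the traceback names the function and line that triggered the warning
```

From the command line, running the interpreter with -W error (for example python -W error your_script.py, where your_script.py is a placeholder) should have the same effect for the whole run without changing the code.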
I know it is only one part of your question I answered but did you debug your code?
Since passing 1D arrays is deprecated, try passing a 2D array as the parameter. This might help.
clf = joblib.load('model.pkl')
pred = clf.predict([vec]);
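For the part of the question about using the model in another process, here is a rough end-to-end sketch. It assumes a recent scikit-learn with the standalone joblib package installed (older scikit-learn releases bundled it as sklearn.externals.joblib). The tiny corpus, the labels, and the temporary file location are invented for illustration, and saving the vectorizer alongside the classifier is a suggestion rather than something the original post does.

```python
import os
import tempfile

import joblib  # older scikit-learn releases: from sklearn.externals import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus, only for illustration.
texts = ["good great fine", "bad awful poor", "great good", "poor bad"]
labels = ["pos", "neg", "pos", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)  # already 2D: (n_samples, n_features)
clf = MultinomialNB().fit(X, labels)

# Save the vectorizer together with the classifier: the other process
# needs the same vocabulary to build compatible feature vectors.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump((vectorizer, clf), path)

# --- what the other process would do ---
loaded_vectorizer, loaded_clf = joblib.load(path)
# Passing a list with one document yields a 2D result of shape (1, n_features).
sample = loaded_vectorizer.transform(["good fine"])
pred = loaded_clf.predict(sample)
print(pred)
```

Because transform receives a list containing one document, its output is already two-dimensional, so no extra brackets or reshape are needed before calling predict.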
The predict method requires a 2D array. You can watch this video; I also found the exact timestamp: https://youtu.be/KjJ7WzEL-es?t=2602 . You have to change [] to [[]].