How to build a deep learning text classifier using convolutional neural networks (Python)

What steps do I need to take to build a deep learning text classifier, specifically one that identifies the author of each text in a set of unlabeled texts (authorship attribution)? The model I am considering is a word-word CNN (convolutional neural network), which has proven very successful at tasks such as text classification. I would like to build this model in Python.

I am new to deep learning, so any resources and information are appreciated.

An end-to-end example showing how to use convolutional neural networks for text classification with tensorflow.keras is given below. It trains a binary sentiment classifier on the IMDB movie review dataset:

from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

max_features = 10000  # keep only the 10,000 most frequent words
max_len = 500         # pad or truncate every review to 500 tokens

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

model = Sequential()
model.add(layers.Input(shape=(max_len,)))
# Map each word index to a 128-dimensional learned vector
model.add(layers.Embedding(max_features, 128))
# Two 1D convolutions (32 filters, window of 7 tokens) with pooling in between
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
# Sigmoid output: binary_crossentropy expects probabilities by default
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()

# `lr` is deprecated in recent Keras versions; use `learning_rate`
model.compile(optimizer=RMSprop(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)
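If the layer shapes printed by model.summary() look opaque, the sequence length after each layer can be worked out by hand. The sketch below assumes the Keras defaults used in the example above ('valid' padding, convolution stride 1, pooling stride equal to the pool size); the helper function names are made up for illustration and are not part of Keras.

```python
def conv1d_out_len(length, kernel_size):
    # Conv1D with 'valid' padding and stride 1 drops kernel_size - 1 positions.
    return length - kernel_size + 1


def maxpool1d_out_len(length, pool_size):
    # MaxPooling1D with 'valid' padding; stride defaults to pool_size.
    return (length - pool_size) // pool_size + 1


max_len = 500
after_conv1 = conv1d_out_len(max_len, 7)        # first Conv1D(32, 7)
after_pool = maxpool1d_out_len(after_conv1, 5)  # MaxPooling1D(5)
after_conv2 = conv1d_out_len(after_pool, 7)     # second Conv1D(32, 7)

print(after_conv1, after_pool, after_conv2)  # 494 98 92
```

GlobalMaxPooling1D then collapses the remaining 92 positions into a single 32-dimensional vector (one value per filter), which feeds the final Dense layer.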

For more information, please refer to Section 6.4, "Sequence processing with convnets", in the book Deep Learning with Python by Francois Chollet, the creator of Keras.
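The IMDB data comes already encoded as lists of word indices with 0/1 labels, so for authorship attribution your own texts and author names have to be converted first. Here is a minimal pure-Python sketch of that step. The toy corpus, the author names, the helper functions, and the id convention (0 = padding, 1 = unknown word) are all assumptions for illustration and do not match the IMDB encoding exactly; the pad helper mimics the default behaviour of pad_sequences (pad on the left, keep the last max_len tokens).

```python
from collections import Counter


def build_vocab(texts, max_features):
    """Map the most frequent words to integer ids; 0 = padding, 1 = unknown."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Ids 0 and 1 are reserved, so keep at most max_features - 2 real words.
    most_common = counts.most_common(max_features - 2)
    return {word: i + 2 for i, (word, _) in enumerate(most_common)}


def encode(text, vocab):
    """Turn a text into a list of word ids, using 1 for out-of-vocabulary words."""
    return [vocab.get(word, 1) for word in text.lower().split()]


def pad(seq, max_len):
    """Keep the last max_len ids and left-pad with zeros."""
    seq = seq[-max_len:]
    return [0] * (max_len - len(seq)) + seq


# Hypothetical toy corpus of (text, author) pairs.
samples = [
    ("the whale rose from the sea", "melville"),
    ("it is a truth universally acknowledged", "austen"),
    ("the sea was calm", "melville"),
]
texts = [text for text, _ in samples]
authors = sorted({author for _, author in samples})
author_to_id = {author: i for i, author in enumerate(authors)}

vocab = build_vocab(texts, max_features=50)
x = [pad(encode(text, vocab), max_len=8) for text in texts]
y = [author_to_id[author] for _, author in samples]
```

With more than two authors, the last layer of the model would typically become Dense(len(authors), activation='softmax') and the loss 'sparse_categorical_crossentropy', since y then holds integer class ids rather than 0/1 labels. For real data, a proper tokenizer would be a better choice than splitting on whitespace.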

Hope this helps. Happy Learning!
