Compare documents by sequence vector

Original post 2015-12-09 15:53:27 5 1 matlab/ vector/ nlp/ text-classification/ document-classification

I'm trying to classify documents by sequence vector. Basically, I have a vocabulary (more than 5000 words). Each document is converted to a vector of integers so that each element in the vector corresponds to the position of a word in the vocabulary.

For example, if the vocab is [hello, how, are, you, today] and the document is "hello you", then I'll have the vector [1 4].
Another document, "how are you", will result in [2 3 4].
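The encoding described above can be sketched as follows. This is a minimal illustration in Python rather than MATLAB, with positions kept 1-based to match the example; the function name `encode` and the whitespace tokenization are assumptions, not part of the original post.

```python
def encode(document, vocab):
    """Map each word of `document` to its 1-based position in `vocab`."""
    # Build a word -> position lookup once; positions start at 1 to match the post.
    position = {word: i for i, word in enumerate(vocab, start=1)}
    # Assumption: words are whitespace-separated and all appear in the vocabulary.
    return [position[word] for word in document.split()]


vocab = ["hello", "how", "are", "you", "today"]
doc1 = encode("hello you", vocab)
doc2 = encode("how are you", vocab)
print(doc1)  # [1, 4]
print(doc2)  # [2, 3, 4]
```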

Now I want to assess the similarity between the first and the second vector. As you can see, these vectors don't have the same length. Furthermore, comparing them directly may not make sense because they represent sequences of words. This case differs from the binary (bag-of-words) vector, which records whether a word appears in the document (1 if it appears, otherwise 0), and from the frequency (word count) vector, which records how often each vocabulary word occurs in the document.
Can you give me a suggestion?
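For contrast, the two fixed-length representations mentioned above might look like this. It is a rough Python sketch; the helper names `binary_vector` and `count_vector` are hypothetical, and the input is assumed to be a list of 1-based vocabulary positions.

```python
def binary_vector(indices, vocab_size):
    """1 at each vocabulary position that occurs in the document, else 0."""
    present = set(indices)
    return [1 if i in present else 0 for i in range(1, vocab_size + 1)]


def count_vector(indices, vocab_size):
    """Number of occurrences of each vocabulary position in the document."""
    counts = [0] * vocab_size
    for i in indices:
        counts[i - 1] += 1  # indices are 1-based
    return counts


print(binary_vector([1, 4], 5))        # [1, 0, 0, 1, 0]
print(count_vector([2, 3, 4, 4], 5))   # [0, 1, 1, 2, 0]
```

Both vectors always have the length of the vocabulary, but they discard word order: the sequences [1 4] and [4 1] produce identical results.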

1 answer

The Jaccard similarity is commonly used to measure the similarity of sets (in your case, the sets of n-grams drawn from each text). The text is split into n-grams (shingled), and locality-sensitive hashing, typically with MinHash, can then be used to estimate the Jaccard similarity efficiently.
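As a small illustration, the exact Jaccard similarity of shingled index sequences could be computed as below. This is a Python sketch under stated assumptions: the names `shingles` and `jaccard` are made up for the example, and the hashing step is left out because exact comparison is cheap for two short documents.

```python
def shingles(sequence, n):
    """Return the set of contiguous n-grams (as tuples) in `sequence`."""
    return {tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


doc1 = [1, 4]       # "hello you"
doc2 = [2, 3, 4]    # "how are you"

print(jaccard(shingles(doc1, 1), shingles(doc2, 1)))  # 0.25
print(jaccard(shingles(doc1, 2), shingles(doc2, 2)))  # 0.0
```

With n = 1 only shared words matter; larger n makes the score sensitive to word order, and it also works for sequences of different lengths. For large collections, MinHash signatures combined with locality-sensitive hashing estimate the same quantity without comparing every pair of documents exactly.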

There is a whole field dedicated to this - Google is your friend!

period in sequence of element in vector

MATLAB - generating vector with sequence of values

Counting frequency of a sequence within a vector

Split a vector according to a defined sequence

How to compare the elements in a vector with another vector

Compare all elements of a vector not in a loop

How to compare vector with a value in matlab?

Compare times between vector and matrix

Replace sequence of integers in vector by single number

Matlab - How to create a vector with power sequence

solution1 1 2015-12-09 16:37:02