
Creating a document term matrix in R

I need to create a document-term matrix for myself, my Twitter followers, and their followers.

We need to create this without using the tm package.

At the moment, we have the following variables:

l : a list containing all the followers' followers, stored per follower (including myself and my own followers)

lunique1 : an unlisted and sorted version of list l; it contains all the followers' followers

matrix : a matrix that we created with the following dimensions:

    matrix <- matrix(, nrow=length(followers)+1, ncol = length(lunique1))

followers : a list containing all my followers (the plus one in nrow = length(followers)+1 is needed to include myself in the dimensions)

This is my code for creating the document-term matrix (a matrix containing only the values zero and one, to show who is linked to whom):

    for(x in 1 : length(followers)+1)
    {
      for(y in 1:length(l[x]))
      {
        for(z in lunique1)
        {
          if(lunique1[z] == l[[x]][y])
          {
            matrix[y][z] = 1
          }
          else
            matrix[y][z] = 0
        }
      }
    }

I am not (yet) experienced in R, but this code needs to work before tonight. I hope you can help me out, because I'm really out of ideas :(

Thanks in advance.

With the R package tm, you can create a DocumentTermMatrix directly.

This approach should be more convenient than your loop construction.

There is a way to create a document-term matrix without the tm package. The linked page describes a procedure, and you can use a similar approach.
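As a rough illustration of what a tm-free approach can look like in base R, here is a minimal sketch. The toy list `l` and the names `terms` and `dtm` are hypothetical and are not taken from the linked procedure. It assumes each list element is a character vector of account names and that `l` is not empty.

```r
# Hypothetical toy data: each element holds the accounts linked to one "document".
l <- list(
  me    = c("alice", "bob"),
  alice = c("bob", "carol"),
  bob   = character(0)
)

# Vocabulary: every distinct account, sorted.
terms <- sort(unique(unlist(l)))

# Build one 0/1 row per document by testing which terms occur in it,
# then stack the rows into a matrix (rows = documents, columns = terms).
dtm <- do.call(rbind, lapply(l, function(accounts) as.integer(terms %in% accounts)))
colnames(dtm) <- terms

print(dtm)
```

`%in%` compares whole values, so an account name is only matched when it is identical to a term, not when it merely contains it.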

We have solved the question ourselves with the following code:

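    # All distinct accounts that appear in any follower list, sorted
    lunique <- unique(unlist(l))
    lunique1 <- sort(lunique)
    # One row per document (myself plus each follower), one column per account
    matrix <- matrix(, nrow = length(followers) + 1, ncol = length(lunique))
    for (n in seq_along(l))
    {
      for (m in seq_along(l[[n]]))
      {
        # Exact comparison; grep() would also match partial names
        h <- which(lunique1 == l[[n]][m])
        if (length(h) > 0)
        {
          matrix[n, h] <- 1
        }
      }
    }
    # Cells that were never set to 1 are still NA; turn them into 0
    matrix <- replace(matrix, is.na(matrix), 0)
    # Account-by-account co-occurrence counts
    adjacency <- t(matrix) %*% matrix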
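To make the last line more concrete, here is a small sketch of what the two possible products of a binary document-term matrix contain. The matrix `m` and its row and column names are hypothetical toy data. `crossprod()` and `tcrossprod()` are base R shortcuts for `t(m) %*% m` and `m %*% t(m)`.

```r
# Hypothetical binary document-term matrix (rows: documents, columns: accounts).
m <- matrix(c(1, 1, 0,
              0, 1, 1,
              0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("me", "alice", "bob"),
                            c("alice", "bob", "carol")))

# t(m) %*% m: for each pair of accounts, the number of documents containing both.
term_adjacency <- crossprod(m)

# m %*% t(m): for each pair of documents, the number of accounts they share.
doc_adjacency <- tcrossprod(m)

print(term_adjacency)
print(doc_adjacency)
```

Which product is wanted depends on whether the links of interest are between the accounts in the columns or between the documents in the rows.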


 