
Write to dataframe with %dopar% in foreach

I want to use a foreach loop with a doParallel backend to fetch tweets from a MySQL database using the RMySQL package.

I create a database connection for every user id I want to query, then fetch that user's tweets in batches of 200. If a batch comes back empty (so there are no further tweets), I move on to the next user id.
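The batching described above can be tried out without a database. This is only a minimal sketch: `fetch_batch()` and `all_tweets` are hypothetical stand-ins for `dbGetQuery()` and the real table, and the offset arithmetic is an assumption about what the elided query does.

```r
# Toy stand-in for one user's tweets (hypothetical data).
all_tweets <- data.frame(id = 1:450, message = paste("tweet", 1:450))

# Hypothetical replacement for dbGetQuery(): returns at most `size` rows
# following row `start`, similar to a query ending in "LIMIT start, size".
fetch_batch <- function(start, size = 200) {
  idx <- seq_len(nrow(all_tweets))
  all_tweets[idx > start & idx <= start + size, , drop = FALSE]
}

start <- 0
batch_sizes <- integer(0)
repeat {
  batch <- fetch_batch(start)
  if (nrow(batch) == 0) break        # empty batch: this user is done
  batch_sizes <- c(batch_sizes, nrow(batch))
  start <- start + 200               # move on to the next 200 rows
}
batch_sizes
```

Breaking out as soon as a batch is empty avoids running the per-tweet loop on a batch with no rows.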

I want to store the information in a data frame called tweets, which has a column with dates and columns for the number of hashtags in a tweet. For every tweet I want to find out how many hashtags it has and in which month it was created, then increase the matching number in the data frame by 1.
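As a rough illustration of that counting step, the sketch below works on a small made-up sample instead of database rows. The column names `message` and `created_at` follow the code in this question; the sample values and the choice of count levels 0 to 5 (to match the six count columns of the table) are assumptions.

```r
# Hypothetical sample of tweets.
data <- data.frame(
  message    = c("no tags", "#a", "#a #b", "#x and #y", "plain"),
  created_at = c("2013-01-05", "2013-01-20", "2013-02-01",
                 "2013-02-14", "2013-02-28"),
  stringsAsFactors = FALSE
)

# Number of '#' characters per tweet, computed for all tweets at once.
number <- nchar(gsub("[^#]", "", data$message))

# First day of the month each tweet was created in.
month <- format(as.Date(data$created_at), "%Y-%m-01")

# Cross-tabulate months against hashtag counts 0..5; combinations that
# never occur show up as 0, and counts above 5 are silently dropped.
counts <- table(month, factor(number, levels = 0:5))
counts
```

Tabulating once over a whole batch replaces the row-by-row "add 1" update with a single step.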

So how can I write the results for every tweet to the data frame?

My data frame at the start:

| dates    | zero_ht | one_ht | two_ht | three_ht | four_ht | five_ht |
|----------|---------|--------|--------|----------|---------|---------|
| 01/01/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/02/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/03/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/04/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/05/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/06/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/07/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/08/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/09/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/10/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/11/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/12/13 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/01/14 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/02/14 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/03/14 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/04/14 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/05/14 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/06/14 | 0       | 0      | 0      | 0        | 0       | 0       |
| 01/07/14 | 0       | 0      | 0      | 0        | 0       | 0       |

My code:

x <- foreach(i = seq_len(nrow(ids)), .packages = c("DBI", "RMySQL"), .combine = rbind) %dopar% {

    con <- dbConnect(MySQL(), *CREDENTIALS*)

    start <- 0
    length <- 1

    while (length > 0) {
        query <- *QUERY*
        data <- dbGetQuery(con, query)
        length <- nrow(data)

        # seq_len() yields an empty sequence when the batch has no rows
        for (j in seq_len(length)) {
            # get the number of hashtags used
            number <- nchar(gsub("[^#]", "", data$message[j]))

            # get the first day of the month the tweet was created in
            date <- paste(format(as.Date(data$created_at[j]), "%Y-%m"), "-01", sep = "")

            # only count tweets with fewer than 5 hashtags
            if (number < 5) {
                # column 1 holds the dates, so zero_ht is column 2, one_ht is column 3, ...
                tweets[tweets$dates == date, number + 2] <- tweets[tweets$dates == date, number + 2] + 1
            }
        }

        # increase the start by 200 to get the next 200 tweets
        start <- start + 200
    }

    data.frame(date = date, number = number)
    dbDisconnect(con)
}

Thanks to the comments I could solve the problem. The reason the result was a list containing nothing but TRUE values was that the last command in the foreach loop was

dbDisconnect(con) 

When the database connection is closed successfully, this call returns TRUE, and foreach collects the value of the last expression in each iteration.
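The effect can be reproduced in plain R, because a braced block evaluates to its last expression. In this small sketch `fake_disconnect()` is a hypothetical stand-in for `dbDisconnect()`, used so that no database is needed.

```r
# Hypothetical stand-in for dbDisconnect(): does nothing and reports success.
fake_disconnect <- function(con) TRUE

# The body ends with the disconnect call, so the block's value is TRUE.
bad <- {
  result <- data.frame(date = "2013-01-01", number = 2)
  fake_disconnect(NULL)
}

# The body ends with the data frame, so that is the block's value.
good <- {
  result <- data.frame(date = "2013-01-01", number = 2)
  fake_disconnect(NULL)
  result
}

isTRUE(bad)          # TRUE
is.data.frame(good)  # TRUE
```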

So I just had to swap the last two lines so that the loop body ends with

data.frame(date=date,number=number)

and everything worked fine.
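One possible way to carry this further is sketched below. In the question's code, `date` and `number` are overwritten for each tweet, and assignments to `tweets` inside `%dopar%` only change a worker's copy. Returning one row per tweet and counting in the parent process sidesteps both points. The `users` list is made-up data standing in for the MySQL queries, two workers is an arbitrary choice, and the sequential fallback is only there so the sketch also runs where doParallel is not installed.

```r
library(foreach)

# Use two local workers if doParallel is available, else run sequentially.
use_parallel <- requireNamespace("doParallel", quietly = TRUE)
if (use_parallel) {
  cl <- parallel::makeCluster(2)
  doParallel::registerDoParallel(cl)
} else {
  registerDoSEQ()
}

# Hypothetical per-user data in place of the database queries.
users <- list(
  data.frame(message = c("#a", "none"),
             created_at = c("2013-01-03", "2013-02-10")),
  data.frame(message = c("#a #b", "#c"),
             created_at = c("2013-01-15", "2013-01-30"))
)

per_tweet <- foreach(i = seq_along(users), .combine = rbind) %dopar% {
  d <- users[[i]]
  # Last expression of the body: one row per tweet, sent back to the parent.
  data.frame(
    date   = format(as.Date(d$created_at), "%Y-%m-01"),
    number = nchar(gsub("[^#]", "", d$message))
  )
}

if (use_parallel) parallel::stopCluster(cl)

# Count in the parent process, where the combined result lives.
counts <- table(per_tweet$date, factor(per_tweet$number, levels = 0:5))
counts
```

With a real database, the connection would still be opened and closed inside the loop body, with the data frame kept as the final expression.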

Regards
