How to get rows that contain values from a list and create a data frame of counts in R

Question

I have a data frame that contains:

   Meal        Contents     
   Type_1      redberries,strawberry,blackberry
   Type_2      banana,apple,strawberry,
   Type_3      rice,chicken
   Type_4      beef,stringbeans,mashpotatoes
   Type_5      banana,strawberry,berry,cantaloupe

I created a vector representation of the Contents column, and the new data frame df2 is:

   Meal        Contents                             Strawberry   Banana   Rice
   Type_1      redberries,strawberry,blackberry     1            0        0
   Type_2      banana,apple,strawberry,             1            1        0
   Type_3      rice,chicken                         0            0        1
   Type_4      beef,stringbeans,mashpotatoes        0            0        0
   Type_5      banana,strawberry,berry,cantaloupe   1            1        0

I then picked the top 2 contents based on their counts:

  top2_v1 <- c("strawberry","banana")

But I am stumped trying to get the frequency distribution of the Meal types that contain the top N contents.

Can I run a loop over top2_v1 against the df2 data frame, so that I can create another data frame that tells me the frequency for each of the top N contents?

Answer 1

Try this (starting with df2):

df2

    Meal                           Contents apple banana beef berry blackberry cantaloupe chicken mashpotatoes redberries rice strawberry stringbeans
1 Type_1   redberries,strawberry,blackberry     0      0    0     0          1          0       0            0          1    0          1           0
2 Type_2           banana,apple,strawberry,     1      1    0     0          0          0       0            0          0    0          1           0
3 Type_3                       rice,chicken     0      0    0     0          0          0       1            0          0    1          0           0
4 Type_4      beef,stringbeans,mashpotatoes     0      0    1     0          0          0       0            1          0    0          0           1
5 Type_5 banana,strawberry,berry,cantaloupe     0      1    0     1          0          1       0            0          0    0          1           0
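
In case it helps to reproduce df2, here is a minimal sketch of one way it could be built from the original data frame. The question does not show how df2 was actually created, so the names df, items, vocab and indicators are assumptions used only for illustration.

```r
# Example data copied from the question. stringsAsFactors = TRUE keeps Meal as a
# factor, so table() later reports every Meal level, including zero counts.
df <- data.frame(
  Meal = paste0("Type_", 1:5),
  Contents = c("redberries,strawberry,blackberry",
               "banana,apple,strawberry,",
               "rice,chicken",
               "beef,stringbeans,mashpotatoes",
               "banana,strawberry,berry,cantaloupe"),
  stringsAsFactors = TRUE
)

# Split each Contents string on commas and drop any empty pieces
items <- strsplit(as.character(df$Contents), ",", fixed = TRUE)
items <- lapply(items, function(x) x[nzchar(x)])

# All distinct contents, sorted alphabetically
vocab <- sort(unique(unlist(items)))

# One 0/1 column per content: 1 if the meal contains it, 0 otherwise
indicators <- t(vapply(items,
                       function(x) as.integer(vocab %in% x),
                       integer(length(vocab))))
colnames(indicators) <- vocab

df2 <- cbind(df, indicators)
```

The indicator columns start at column 3, which is what the code that follows relies on.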

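n <- 2
# names of the n most frequent contents (indicator columns start at column 3)
topn_v1  <- names(sort(colSums(df2[3:ncol(df2)]), decreasing=TRUE))[1:n]
# TRUE for every row (meal) that contains at least one of the top-n contents
indices <- apply(df2, 1, function(x) any(as.integer(x[topn_v1]) > 0))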

df2[indices,] # Meals that contain at least one of the top_n Contents
    Meal                           Contents apple banana beef berry blackberry cantaloupe chicken mashpotatoes redberries rice strawberry stringbeans
1 Type_1   redberries,strawberry,blackberry     0      0    0     0          1          0       0            0          1    0          1           0
2 Type_2           banana,apple,strawberry,     1      1    0     0          0          0       0            0          0    0          1           0
5 Type_5 banana,strawberry,berry,cantaloupe     0      1    0     1          0          1       0            0          0    0          1           0

table(df2[indices,]$Meal) # number of selected rows per Meal type

Type_1 Type_2 Type_3 Type_4 Type_5 
     1      1      0      0      1 

table(df2[indices,]$Meal) / nrow(df2[indices,]) # in proportion

   Type_1    Type_2    Type_3    Type_4    Type_5 
0.3333333 0.3333333 0.0000000 0.0000000 0.3333333
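
As a possible alternative to the apply() call, the same row selection can be sketched with rowSums(), and the proportions with prop.table(). This is only a sketch under the assumption that the indicator columns are numeric 0/1. The small df2 below is a stand-in with three of the indicator columns, not the full table.

```r
# Stand-in for df2: Meal, Contents and three 0/1 indicator columns from the post
df2 <- data.frame(
  Meal = factor(paste0("Type_", 1:5)),
  Contents = c("redberries,strawberry,blackberry",
               "banana,apple,strawberry,",
               "rice,chicken",
               "beef,stringbeans,mashpotatoes",
               "banana,strawberry,berry,cantaloupe"),
  strawberry = c(1, 1, 0, 0, 1),
  banana     = c(0, 1, 0, 0, 1),
  rice       = c(0, 0, 1, 0, 0)
)

n <- 2
topn_v1 <- names(sort(colSums(df2[3:ncol(df2)]), decreasing = TRUE))[1:n]

# A meal is selected when the sum over its top-n indicator columns is positive
indices <- rowSums(df2[topn_v1]) > 0

# Counts per Meal type among the selected rows, then the same as proportions
meal_counts <- table(df2$Meal[indices])
meal_props  <- prop.table(meal_counts)
```

rowSums() works on the numeric columns directly, so it avoids the conversion to a character matrix that apply() performs on a mixed-type data frame.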

Answer 2

Try this:

 n <- 2
 topn_v1  <- names(sort(colSums(df2[3:ncol(df2)]), decreasing=TRUE))[1:n]
 indices <- apply(df2, 1, function(x) any(as.integer(x[topn_v1]) > 0))
 table(df2[indices,]$Meal)
 table(df2[indices,]$Meal) / nrow(df2[indices,])
 # bar chart of the proportions, largest first
 barplot(sort(table(df2[indices,]$Meal) / nrow(df2[indices,]), decreasing = TRUE),
         ylab='Proportions')
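
If what is wanted is a separate data frame with the frequency of each top N content (as the question asks), something like the following sketch might work without an explicit loop. The name freq_df and its column names are assumptions for illustration, and the small df2 is a stand-in with three of the indicator columns.

```r
# Stand-in for df2: Meal, Contents and three 0/1 indicator columns from the post
df2 <- data.frame(
  Meal = factor(paste0("Type_", 1:5)),
  Contents = c("redberries,strawberry,blackberry",
               "banana,apple,strawberry,",
               "rice,chicken",
               "beef,stringbeans,mashpotatoes",
               "banana,strawberry,berry,cantaloupe"),
  strawberry = c(1, 1, 0, 0, 1),
  banana     = c(0, 1, 0, 0, 1),
  rice       = c(0, 0, 1, 0, 0)
)

n <- 2
counts  <- sort(colSums(df2[3:ncol(df2)]), decreasing = TRUE)
topn_v1 <- names(counts)[1:n]   # "strawberry" "banana"

# One row per top-n content: number of meals containing it, and which meals
freq_df <- data.frame(
  Content = topn_v1,
  Count   = unname(counts[topn_v1]),
  Meals   = vapply(topn_v1,
                   function(k) paste(df2$Meal[df2[[k]] == 1], collapse = ","),
                   character(1), USE.NAMES = FALSE)
)
```

Here vapply() plays the role of the loop over the top N contents, and the result is an ordinary data frame that can be sorted or plotted further.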

Answer 1: 0, accepted, 2016-09-27 06:31:54

Answer 2: 0, 2016-09-28 05:03:02
