R: Subsetting a df based on the entries of another column

Question

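I assume this is a simple question about subsetting data based on the values in another column (here, Patient), but I can't find a solution. I need a subset of the data for every patient who has been to the hospital at least 4 times. In other words, only patients with at least 4 hospital visits should appear in the new df, along with their 4 visit rows. My table looks like this: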

Patient   # Hospital Visits   Duration
Monica    1                   10D
Jack      1                   5D
Monica    2                   3D
Eric      1                   2D
Eric      2                   3D
Monica    3                   4D
Jack      2                   4D
Eric      3                   8D
Eric      4                   9D

Thank you very much!

Answer 1

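# doc is created in the Data section below (requires the XML package)
df1 <- readHTMLTable(doc)[[1]]
# Drop the leading "# " from the column name "# Hospital Visits"
colnames( df1 ) <- gsub("# ", '', colnames( df1 ))
# Go through character so the numbers are parsed correctly even if the column was read as a factor
df1$`Hospital Visits` <- as.numeric( as.character( df1$`Hospital Visits` ) )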

df1
#   Patient Hospital Visits Duration
# 1  Monica               1      10D
# 2    Jack               1       5D
# 3  Monica               2       3D
# 4    Eric               1       2D
# 5    Eric               2       3D
# 6  Monica               3       4D
# 7    Jack               2       4D
# 8    Eric               3       8D
# 9    Eric               4       9D

To get only the events where the visit number is at least 4:

with( df1, df1[ `Hospital Visits` >= 4, ] )
#   Patient  Hospital Visits Duration
# 9    Eric                4       9D

To get all events for patients who have visited at least 4 times:

do.call( 'rbind', lapply( split( df1, df1$Patient ), 
                          function( x ) if( any(x$'Hospital Visits' >= 4 ) ) { x }) )

#        Patient Hospital Visits Duration
# Eric.4    Eric               1       2D
# Eric.5    Eric               2       3D
# Eric.8    Eric               3       8D
# Eric.9    Eric               4       9D
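
The same selection can probably be done without `split()` and `rbind()`. The following is only a sketch, assuming the table is typed in by hand (so the XML package is not needed) and that the highest visit number of a patient equals that patient's number of visits. The name `max_visits` is made up for this illustration.

```r
# Sample data typed in directly (same values as the table in the question)
df1 <- data.frame(
  Patient = c("Monica", "Jack", "Monica", "Eric", "Eric",
              "Monica", "Jack", "Eric", "Eric"),
  `Hospital Visits` = c(1, 1, 2, 1, 2, 3, 2, 3, 4),
  Duration = c("10D", "5D", "3D", "2D", "3D", "4D", "4D", "8D", "9D"),
  check.names = FALSE,       # keep the space in "Hospital Visits"
  stringsAsFactors = FALSE
)

# Highest visit number per patient, repeated on every row of that patient
max_visits <- ave(df1$`Hospital Visits`, df1$Patient, FUN = max)

# Keep all rows of patients whose highest visit number is at least 4
frequent <- df1[max_visits >= 4, ]
frequent
```

Because `ave()` returns a vector of the same length as the data frame, the original row order and row names are kept.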

Data:

library(XML)
doc <- htmlParse('<table class="tg">
                 <tr>
                 <th class="tg-yw4l">Patient</th>
                 <th class="tg-yw4l"># Hospital Visits</th>
                 <th class="tg-yw4l">Duration</th>
                 </tr>
                 <tr>
                 <td class="tg-yw4l">Monica</td>
                 <td class="tg-yw4l">1</td>
                 <td class="tg-yw4l">10D</td>
                 </tr>
                 <tr>
                 <td class="tg-yw4l">Jack</td>
                 <td class="tg-yw4l">1</td>
                 <td class="tg-yw4l">5D</td>
                 </tr>
                 <tr>
                 <td class="tg-yw4l">Monica</td>
                 <td class="tg-yw4l">2</td>
                 <td class="tg-yw4l">3D</td>
                 </tr>
                 <tr>
                 <td class="tg-yw4l">Eric</td>
                 <td class="tg-yw4l">1</td>
                 <td class="tg-yw4l">2D</td>
                 </tr>
                 <tr>
                 <td class="tg-yw4l">Eric</td>
                 <td class="tg-yw4l">2</td>
                 <td class="tg-yw4l">3D</td>
                 </tr>
                 <tr>
                 <td class="tg-yw4l">Monica</td>
                 <td class="tg-yw4l">3</td>
                 <td class="tg-yw4l">4D</td>
                 </tr>
                 <tr>
                 <td class="tg-yw4l">Jack</td>
                 <td class="tg-yw4l">2</td>
                 <td class="tg-yw4l">4D</td>
                 </tr>
                 <tr>
                 <td class="tg-yw4l">Eric</td>
                 <td class="tg-yw4l">3</td>
                 <td class="tg-yw4l">8D</td>
                 </tr>
                 <tr>
                 <td class="tg-yw4l">Eric</td>
                 <td class="tg-yw4l">4</td>
                 <td class="tg-yw4l">9D</td>
                 </tr>
                 </table>')

Answer 2

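One of countless ways to do this, though not the most efficient one.

Assuming you have this table in a data frame, you can filter out the IDs (in this case, names) that have 4 or more visits, and then show all records for those names. I am calling your original data frame my_df.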

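library(dplyr)
# Names of patients that have a visit numbered 4 or higher
who_to_include <- unique(subset(my_df, hospital_visits >= 4, select = name))
# Keep every record of those patients (4_or_more is not a valid R name, so use four_or_more)
four_or_more <- inner_join(my_df, who_to_include, by = "name")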

Sorry, there was no reproducible example, so I wrote this code without running it; it may not be 100% correct.
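
The "find the names, then show all their records" idea can also be written as one grouped filter in dplyr. This is a minimal sketch, not a tested version of the answer's code: the column names `name` and `hospital_visits` follow the answer's assumption, while `duration` and the sample values are filled in here from the question's table.

```r
library(dplyr)

# Sample data using the column names assumed in this answer
my_df <- data.frame(
  name = c("Monica", "Jack", "Monica", "Eric", "Eric",
           "Monica", "Jack", "Eric", "Eric"),
  hospital_visits = c(1, 1, 2, 1, 2, 3, 2, 3, 4),
  duration = c("10D", "5D", "3D", "2D", "3D", "4D", "4D", "8D", "9D"),
  stringsAsFactors = FALSE
)

# For each patient, keep all rows if any visit number is 4 or higher
four_or_more <- my_df %>%
  group_by(name) %>%
  filter(any(hospital_visits >= 4)) %>%
  ungroup()

four_or_more
```

With the sample data this keeps only Eric's four rows. The grouped `filter()` avoids building a separate lookup table and joining it back.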

Answer 3

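Assuming you have this table in a data frame, and the contents of the "Patient" column uniquely identify a patient (i.e. there are no multiple Erics), you can also subset it using only base R: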

# Logical vector marking rows whose visit number is >= 4
frequentPatientRows <- patientsDf[, "# Hospital Visits"] >= 4
# Extract patient names from those rows (the column is called "Patient" in the question's table)
frequentPatientNames <- patientsDf[frequentPatientRows, "Patient"]
# Select all entries for patients with those names
selectedPatients <- patientsDf[patientsDf[, "Patient"] %in% frequentPatientNames, ]
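
If the visit-number column is missing or not reliable, the rows per patient could be counted instead. The following is a sketch under that assumption, again relying on the "Patient" column identifying each patient uniquely. The data frame is typed in here for illustration, and `visitCounts` is a made-up name.

```r
# Sample data without the visit-number column
patientsDf <- data.frame(
  Patient = c("Monica", "Jack", "Monica", "Eric", "Eric",
              "Monica", "Jack", "Eric", "Eric"),
  Duration = c("10D", "5D", "3D", "2D", "3D", "4D", "4D", "8D", "9D"),
  stringsAsFactors = FALSE
)

# Count how many rows (visits) each patient has
visitCounts <- table(patientsDf$Patient)

# Names of patients with at least 4 rows
frequentPatientNames <- names(visitCounts)[visitCounts >= 4]

# Select all entries for those patients
selectedPatients <- patientsDf[patientsDf$Patient %in% frequentPatientNames, ]
selectedPatients
```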

3 answers

Answer 1
1 2017-03-15 22:29:33

Answer 2
0 2017-03-15 22:25:43

Answer 3
0 2017-03-15 22:43:43
