R: Select Rows from Data Frame based on condition
I have the following data frames:
User_Details:
+-------------+-----------+-----------+
| Name | Address | Phone |
+-------------+-----------+-----------+
| John Doe | Somewhere | 123456789 |
| Jane Doe | Somewhere | 234567891 |
| Jack Russel | Somewhere | 234567891 |
+-------------+-----------+-----------+
User_Transaction_Count:
+-------------+-----------+
| Name | Frequency |
+-------------+-----------+
| John Doe | 2 |
| Jane Doe | 5 |
| Jack Russel | 2 |
+-------------+-----------+
What I want to do is get the details of the user with the most transactions. In the above case, Jane Doe has the most transactions, so I need to fetch her details into a data frame.
I tried the following code:
User_details[which(user_details$Name = User_Transaction_Count[(which.max(User_Transaction_Count$Frequency)),]$Name)]
But I get this error:
Error: unexpected '=' in "ad_maxState <- accidental_deaths[which(accidental_deaths$State ="
I made some changes to T.Ciffréo's answer and found the solution:
User_Details[User_Details$Name == as.character(User_Transaction_Count[which.max(User_Transaction_Count$Frequency), ]$Name), ]
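For anyone who wants to try this, here is a minimal reproducible sketch. It rebuilds the two tables from the question (the phone numbers are stored as plain numbers, which is an assumption) and uses `==` for the comparison; a single `=` in that position is not a comparison and is what triggers the "unexpected '='" error. The names `top_name` and `top_user` are made up for this example.

```r
# Rebuild the two example tables from the question.
User_Details <- data.frame(
  Name    = c("John Doe", "Jane Doe", "Jack Russel"),
  Address = c("Somewhere", "Somewhere", "Somewhere"),
  Phone   = c(123456789, 234567891, 234567891),  # assumed numeric
  stringsAsFactors = FALSE
)
User_Transaction_Count <- data.frame(
  Name      = c("John Doe", "Jane Doe", "Jack Russel"),
  Frequency = c(2, 5, 2),
  stringsAsFactors = FALSE
)

# Name of the user with the highest Frequency
# (which.max() returns the first position if several rows tie).
top_name <- as.character(
  User_Transaction_Count$Name[which.max(User_Transaction_Count$Frequency)]
)

# Compare with `==` and keep all columns of the matching row(s).
top_user <- User_Details[User_Details$Name == top_name, ]
print(top_user)
```

The trailing comma inside the square brackets matters: it selects rows and keeps every column.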
To determine the user with the maximum Frequency, we can use:
with(User_Transaction_Count,Name[[which.max(Frequency)]])
However, if the Name column is a factor (which was the default for string columns in data.frame() before R 4.0.0), we need to convert it to a character string before using it for the lookup. Otherwise the internal value for "John Doe" in one data.frame may not be the same as "John Doe" in the other.
maxUser <- as.character(with(User_Transaction_Count,Name[[which.max(Frequency)]]))
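To illustrate why the conversion matters, here is a small sketch with two hand-built factors (`a` and `b` are hypothetical names, not part of the question's data). They hold the same labels but their levels are in a different order, so the underlying integer codes disagree.

```r
# Two factors holding the same names, but with levels in a different order.
a <- factor(c("John Doe", "Jane Doe"), levels = c("John Doe", "Jane Doe"))
b <- factor(c("John Doe", "Jane Doe"), levels = c("Jane Doe", "John Doe"))

# The underlying integer codes differ even though the labels match.
codes_a <- as.integer(a)  # 1 2
codes_b <- as.integer(b)  # 2 1

# Converting to character compares the labels themselves.
same_labels <- as.character(a) == as.character(b)  # TRUE TRUE
```

Comparing via as.character() sidesteps the level ordering entirely, which is why it is the safer choice when the two data frames were built separately.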
Then we can perform the lookup in the other data.frame:
result <- User_Details[User_Details$Name == maxUser,]
This may take a long time if the table is very large, so it may be best to create an index for the lookup:
#build index
library(hash)
userIdx <- hash(as.character(User_Details$Name),1:nrow(User_Details))
#use index
maxUser <- as.character(with(User_Transaction_Count,Name[[which.max(Frequency)]]))
result <- User_Details[userIdx[[maxUser]],]
Output:
> result
Name Address Phone
2 Jane Doe Somewhere 234567891
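One caveat: which.max() returns only the first position of the maximum, so if several users share the top Frequency, only one of them is returned. The sketch below shows one possible way to keep all of them, using made-up counts (`tied_counts` is hypothetical data, not from the question) and `%in%` instead of `==`.

```r
User_Details <- data.frame(
  Name    = c("John Doe", "Jane Doe", "Jack Russel"),
  Address = "Somewhere",
  Phone   = c(123456789, 234567891, 234567891),
  stringsAsFactors = FALSE
)

# Hypothetical counts in which two users share the top frequency.
tied_counts <- data.frame(
  Name      = c("John Doe", "Jane Doe", "Jack Russel"),
  Frequency = c(5, 5, 2),
  stringsAsFactors = FALSE
)

# All names whose Frequency equals the maximum, not just the first one.
top_names <- as.character(
  tied_counts$Name[tied_counts$Frequency == max(tied_counts$Frequency)]
)

# %in% keeps every row of User_Details whose Name is among the top names.
tied_result <- User_Details[User_Details$Name %in% top_names, ]
print(tied_result)
```

With the question's original data there is a single maximum, so this returns the same one-row result as the approach above.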
Code:
User_Details[User_Details$Name == User_Transaction_Count[which.max(User_Transaction_Count$Frequency), ]$Name, ]$Name