
Subsetting five or more columns with different conditions in R data.table

I have a data.table that looks like this:

 COUNTRY   GENDER     CURRENCY    INCOME_GROUP    YEAR  
 FRANCE     MAN       EURO            HIGH        2014  
 GERMANY    WOMEN     EURO            LOW         2015  
 FINLAND    MAN       EURO            LOW         2016  
 JAPAN      MAN       YEN             HIGH        2017  
 USA        WOMEN     DOLLAR          LOW         2018  

I want to subset this table with this code: datanew <- data[data$YEAR == "2014"& data$CURRENCY == "DOLLAR" & data$COUNTRY == FRANCE & data$INCOME_GROUP == LOW] but whenever I add three or more conditions, datanew always has 0 observations. In other words, I cannot add four or more conditions. Is there any way to solve this problem? Thanks for your help.

I'm presuming you want to subset the rows that fulfil the criteria you gave? In that case, there aren't any rows that satisfy your criteria.

If you try:

data = fread('COUNTRY   GENDER     CURRENCY    INCOME_GROUP    YEAR  
 FRANCE     MAN       EURO            HIGH        2014  
 GERMANY    WOMEN     EURO            LOW         2015  
 FINLAND    MAN       EURO            LOW         2016  
 JAPAN      MAN       YEN             HIGH        2017  
 USA        WOMEN     DOLLAR          LOW         2018  
')

data[YEAR == "2014" & CURRENCY == "EURO" & COUNTRY == "FRANCE" & INCOME_GROUP == "HIGH"]

This returns:

   COUNTRY GENDER CURRENCY INCOME_GROUP YEAR
1:  FRANCE    MAN     EURO         HIGH 2014

Also, you need to wrap quotes around FRANCE and LOW in your statement, and since it's a data.table, you don't need to use the dollar sign to identify the columns.
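To illustrate the point about quotes, here is a minimal base R sketch. The vector `country` is a hypothetical stand-in for the COUNTRY column, and the sketch assumes no object named FRANCE exists in your session:

```r
# Toy vector standing in for the COUNTRY column (illustrative data only).
country <- c("FRANCE", "GERMANY", "FINLAND", "JAPAN", "USA")

# Unquoted: R looks up an object named FRANCE and signals an error
# if none exists. tryCatch() captures the error message as a string.
unquoted <- tryCatch(
  country == FRANCE,
  error = function(e) conditionMessage(e)
)
print(unquoted)

# Quoted: "FRANCE" is a character string, compared element by element.
quoted <- country == "FRANCE"
print(quoted)  # TRUE FALSE FALSE FALSE FALSE
```

The quoted comparison gives a logical vector you can use to filter rows, while the unquoted one never gets as far as comparing anything.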

Your code doesn't reproduce the problem you describe. Running it as posted gives this error:

Error in `[.data.frame`(data, data$YEAR == "2014" & data$CURRENCY == "DOLLAR" &  : 
  object 'FRANCE' not found

This is because you are trying to reference a variable called FRANCE (and another called LOW) when you should be passing a character string, like you do with "DOLLAR":

datanew <- data[data$YEAR == "2014"& data$CURRENCY == "DOLLAR" & data$COUNTRY == "FRANCE" & data$INCOME_GROUP == "LOW"]

This reproduces your problem: the result is "data frame with 0 columns and 5 rows", which simply means that no rows satisfy all of the conditions, so you subset down to no data. You can have as many conditions as you like, but you need data that satisfy them. The following returns one row:

data[data$YEAR == "2014"& data$CURRENCY == "EURO" & data$COUNTRY == "FRANCE" & data$INCOME_GROUP == "HIGH"]
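If a combined filter comes back empty, one way to see why is to count the matches for each condition separately. The following is only a sketch on the toy table from the question, assuming the data.table package is installed; the `n_*` variable names are made up for illustration:

```r
library(data.table)

# Same toy table as in the question, read from an inline string.
data <- fread(text = "COUNTRY GENDER CURRENCY INCOME_GROUP YEAR
FRANCE MAN EURO HIGH 2014
GERMANY WOMEN EURO LOW 2015
FINLAND MAN EURO LOW 2016
JAPAN MAN YEN HIGH 2017
USA WOMEN DOLLAR LOW 2018")

# Count matching rows for each condition on its own.
# .N is the number of rows left after filtering in i.
n_year     <- data[YEAR == 2014, .N]            # 1 (FRANCE)
n_currency <- data[CURRENCY == "DOLLAR", .N]    # 1 (USA)
n_country  <- data[COUNTRY == "FRANCE", .N]     # 1
n_income   <- data[INCOME_GROUP == "LOW", .N]   # 3

# Count rows matching all four conditions at once.
n_all <- data[YEAR == 2014 & CURRENCY == "DOLLAR" &
                COUNTRY == "FRANCE" & INCOME_GROUP == "LOW", .N]
print(n_all)  # 0
```

Each condition matches at least one row by itself, but no single row matches all four, which is why the combined subset is empty regardless of how many conditions are chained with &.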
