
How to delete rows in a dataframe based on a column in another dataframe in R?

Let's suppose I have two dataframes that look like this:

df1 = structure(list(X1 = c(0.659588465514883, 0.47368422669833, -0.0422047052887636, 
-1.75642936005977, 0.339813114272074, 1.09341750942405, 0.327672990051479, 
-0.893507823167616, -0.661285321563594, -0.569673784617002, -0.983369868281376, 
-2.53659592825309, 0.396220995581641, -1.1994504350227, -0.553343957714012, 
1.30884516680972, -0.120561033997931, 0.971506981390537, 0.815610612704566, 
1.53103368033727, -0.808956975392184, -1.27332589061096, -1.89082047917723, 
0.249755375966669, -0.704051599213331), X2 = c(0.659588465514883, 
0.47368422669833, -0.0422047052887636, -1.75642936005977, 0.339813114272074, 
1.09341750942405, 0.327672990051479, -0.893507823167616, -0.661285321563594, 
-0.569673784617002, -0.983369868281376, -2.53659592825309, 0.396220995581641, 
-1.1994504350227, -0.553343957714012, 1.30884516680972, -0.120561033997931, 
0.971506981390537, 0.815610612704566, 1.53103368033727, -0.808956975392184, 
-1.27332589061096, -1.89082047917723, 0.249755375966669, -0.704051599213331
), Date = structure(c(10957, 
10988, 11017, 11048, 11078, 11109, 11139, 11170, 11201, 11231, 
11262, 11292, 11323, 11354, 11382, 11413, 11443, 11474, 11504, 
11535, 11566, 11596, 11627, 11657, 11688), class = "Date")), class = "data.frame", row.names = c(NA, 
-25L))

            X1           X2       Date
1  -1.633636896 -1.633636896 2000-01-01
2   1.793766808  1.793766808 2000-02-01
3   0.440697771  0.440697771 2000-03-01
4   0.330091148  0.330091148 2000-04-01
5  -1.234246285 -1.234246285 2000-05-01
6   0.044951993  0.044951993 2000-06-01
7  -2.831295687 -2.831295687 2000-07-01
8  -0.735371579 -0.735371579 2000-08-01
9  -0.412580789 -0.412580789 2000-09-01
10  0.001848622  0.001848622 2000-10-01
11  1.480684731  1.480684731 2000-11-01
12 -1.088999830 -1.088999830 2000-12-01
13 -0.465903929 -0.465903929 2001-01-01
14 -0.010743010 -0.010743010 2001-02-01
15  1.420995930  1.420995930 2001-03-01
16 -0.789190729 -0.789190729 2001-04-01
17 -0.750476176 -0.750476176 2001-05-01
18 -0.314079067 -0.314079067 2001-06-01
19 -0.324779959 -0.324779959 2001-07-01
20 -1.192471909 -1.192471909 2001-08-01
21 -0.170325813 -0.170325813 2001-09-01
22  0.890941125  0.890941125 2001-10-01
23  0.863875448  0.863875448 2001-11-01
24 -0.088048086 -0.088048086 2001-12-01
25  0.021239226  0.021239226 2002-01-01

df2 = structure(list(X1 = c(-0.0712460200169048, 1.0131741924359, 0.28590272354409, 
-0.835911047943257, -0.146890264431744), X2 = c(-0.0712460200169048, 
1.0131741924359, 0.28590272354409, -0.835911047943257, -0.146890264431744
), Date = structure(c(10984, 11120, 11441, 11488, 11712), class = "Date")), class = "data.frame", row.names = c(NA, 
-5L))

           X1          X2       Date
1  0.03815189  0.03815189 2000-01-28
2 -0.22665838 -0.22665838 2000-06-12
3  0.36459588  0.36459588 2001-04-29
4  0.32772746  0.32772746 2001-06-15
5 -1.22891784 -1.22891784 2002-01-25

What I would like to do is reduce the number of rows in df1 so that it matches the number of rows in df2. In particular, I would like to remove the rows of df1 whose Date (by year and month) is not present in the Date column of df2. It is easier to show the output I would like to get:


# df1 should end up like this (n stands for the numbers corresponding to each date row):

           X1          X2       Date
1  n                    n 2000-01-01
2  n                    n 2000-06-01
3  n                    n 2001-04-01
4  n                    n 2001-06-01
5  n                    n 2002-01-01

# It does not really matter which day is displayed in the final output. Only the year and month matter.

I tried to use semi_join, but because the days differ between the two dataframes, no dates match exactly and the function cannot do what I need. Ideally, I would ignore the day and match on year and month only.

This is what I tried:

library(dplyr)

semi_join(df1, df2, by = "Date")

[1] X1   X2   Date
<0 rows> (or 0-length row.names)
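
The empty result follows from exact matching on full dates. A minimal check, assuming toy date vectors `d1` and `d2` that stand in for the Date columns of df1 and df2 (the names are illustrative, not from the original post), shows that no full dates coincide while all five year-month keys do:

```r
# Toy stand-ins for df1$Date and df2$Date (illustrative names)
d1 <- seq(as.Date("2000-01-01"), by = "month", length.out = 25)
d2 <- as.Date(c("2000-01-28", "2000-06-12", "2001-04-29",
                "2001-06-15", "2002-01-25"))

# Count matches on the full date versus on the year-month part only
exact_matches <- sum(d1 %in% d2)
month_matches <- sum(format(d1, "%Y-%m") %in% format(d2, "%Y-%m"))

print(c(exact = exact_matches, by_month = month_matches))
```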

Can anyone help me?

Thanks!

Building on the suggestion from @arg0naut91, here is a possible solution in base R. First format the Date variables as year-month strings; then you can use %in% to check which dates are present. The code below uses your df1 and df2:

#Format dates
df1$I1 <- format(df1$Date,'%Y-%m')
df2$I2 <- format(df2$Date,'%Y-%m') 

Now filter df1 by comparing the two keys:

df1[df1$I1 %in% df2$I2,]

Output:

           X1         X2       Date      I1
1   0.6595885  0.6595885 2000-01-01 2000-01
6   1.0934175  1.0934175 2000-06-01 2000-06
16  1.3088452  1.3088452 2001-04-01 2001-04
18  0.9715070  0.9715070 2001-06-01 2001-06
25 -0.7040516 -0.7040516 2002-01-01 2002-01
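
Since the question started from semi_join, the same year-month comparison can probably be expressed in dplyr as well. This is only a sketch under assumptions: the data frames below are toy stand-ins for df1 and df2, and the temporary column name `ym` is made up for illustration.

```r
library(dplyr)

# Toy stand-ins for the question's data (values are illustrative only)
df1 <- data.frame(
  X1 = seq_len(25),
  Date = seq(as.Date("2000-01-01"), by = "month", length.out = 25)
)
df2 <- data.frame(
  X1 = seq_len(5),
  Date = as.Date(c("2000-01-28", "2000-06-12", "2001-04-29",
                   "2001-06-15", "2002-01-25"))
)

# Add a temporary year-month key (hypothetical name `ym`) to both sides,
# keep the rows of df1 whose key occurs in df2, then drop the key again
result <- df1 %>%
  mutate(ym = format(Date, "%Y-%m")) %>%
  semi_join(mutate(df2, ym = format(Date, "%Y-%m")), by = "ym") %>%
  select(-ym)

print(result)
```

Because semi_join only filters its first argument, the result keeps df1's own values and dates; nothing from df2 is copied over.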

Finally, you can assign the filtered df1 to a new data frame and drop the helper column I1.
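
That last step might look like the sketch below. It assumes toy stand-ins for df1 and df2, and the name `df1_reduced` is just a placeholder chosen for this example.

```r
# Toy stand-ins for the question's data (values are illustrative only)
df1 <- data.frame(
  X1 = seq_len(25),
  Date = seq(as.Date("2000-01-01"), by = "month", length.out = 25)
)
df2 <- data.frame(
  X1 = seq_len(5),
  Date = as.Date(c("2000-01-28", "2000-06-12", "2001-04-29",
                   "2001-06-15", "2002-01-25"))
)

# Year-month helper columns, as in the answer above
df1$I1 <- format(df1$Date, "%Y-%m")
df2$I2 <- format(df2$Date, "%Y-%m")

# Keep matching rows in a new data frame (placeholder name `df1_reduced`)
df1_reduced <- df1[df1$I1 %in% df2$I2, ]

# Drop the helper column and renumber the rows from 1
df1_reduced$I1 <- NULL
rownames(df1_reduced) <- NULL

print(df1_reduced)
```

Resetting the row names is optional. Without it the result keeps the original row numbers (1, 6, 16, 18, 25), which can be handy for tracing rows back to df1.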


