Select only matched columns in 2 different dataframes in R
I have 106 columns in the first data frame and 97 in the second, and I want to merge them. For this I need identical columns in both data frames.
How can I achieve the requirements listed below?
DF1: column names are A, B, C & D
DF2: column names are A, B & E
Can I select the following combinations of columns from the data frames?
1) Matched in both, i.e. A & B
2) Extras in the 2nd, i.e. E
3) Extras in the 1st, i.e. C & D
I tried different approaches, such as select() in dplyr with colnames(df1) == colnames(df2), and other possibilities, but without success.
Below are the column names of Dataframe 1:
[1] "ï..Lan.ID" "NBFC" "Application.ID"
[4] "Region" "Loan.City" "Loan.Type"
[7] "Loan.Scheme" "Name" "Mobile.Number"
[10] "Loan.Status" "Principal.Outstanding" "Last.EMI"
[13] "Next.EMI" "Next.Bullet.Month" "Next.Bullet.Amount"
[16] "Sum.Instalment.Posted" "Dues.Receipts" "EMI.Due"
[19] "All.Dues" "Instalment.Dues" "Bullets.Overdue"
[22] "Loan.Quality" "Sanctioned.Amount" "Loan.Amount"
[25] "Tenure" "Completed.Tenure" "Tenure.Left"
[28] "Personal.Email" "Official.Email" "No..Of.Late.Payments"
[31] "CRIF.Score" "CIBIL.Score" "No.of.Actions"
[34] "Fixed.Income" "ECS.Customer.Name" "ECS.Bank.Name"
[37] "ECS.Account.Number" "Loan.Date" "Sanction.Month"
[40] "EMI.Start.Date" "X1st.EMI.Month" "End.Date"
[43] "Home.Address" "Permanent.Address" "Employer.Name"
[46] "Company.MCA.ID" "Business.Address" "Reference.Details"
[49] "Nature.of.Business" "Pan.Card" "Aadhar.UID"
[52] "Gender" "Educational.Qualification" "DOB"
[55] "Marital.Status" "Last.Payment.Date" "Job.Type"
[58] "Employment.Year" "Cycle.Date" "Age"
[61] "relevant_pos" "crif_active_accounts" "crif_overdue_amt"
[64] "crif_current_outstanding" "cibil_active_accounts" "cibil_overdue_amt"
[67] "cibil_current_outstanding" "NACH.Status" "Awarenss.Allocation"
[70] "Allocation.Date" "Awareness.Data" "Awareness.Brk.up"
[73] "Dec.19.EMI.Amount" "Tenure.End" "Dec.19.BKt"
[76] "DPD" "New.DPD" "DPD.Range.New"
[79] "New.Amount.Due" "New.Total.Due" "Loan.Slabs"
[82] "Last.Month.Bnc" "X1st.EMI" "Dec.19.Bnc"
[85] "Dec.19.Non.Starter" "Reason.of.Bnc" "HNI"
[88] "EMI.Due.1" "OS" "Advance.Paid"
[91] "Paid.Unpaid" "Not.Allocated" "Excess"
[94] "CC.Take.Over...OD" "Last.Month.delinq" "Loan.Status.1"
[97] "CIBIL.Bracket" "Salary.Bracket" "DPD.1"
[100] "Reason.of.Default" "Contactibility" "Delinq"
[103] "PayTm.Industry" "Industry" "Employer.Name.1"
[106] "DELINQ.NON.DELINQ"
Dataframe 2:
[1] "ï..Lan.ID" "NBFC" "Application.ID"
[4] "Region" "Loan.City" "Loan.Type"
[7] "Loan.Scheme" "Name" "Mobile.Number"
[10] "Loan.Status" "Principal.Outstanding" "Last.EMI"
[13] "Next.EMI" "Next.Bullet.Month" "Next.Bullet.Amount"
[16] "Sum.Instalment.Posted" "Dues.Receipts" "EMI.Due"
[19] "All.Dues" "Instalment.Dues" "Bullets.Overdue"
[22] "Loan.Quality" "Sanctioned.Amount" "Loan.Amount"
[25] "Tenure" "Completed.Tenure" "Tenure.Left"
[28] "Personal.Email" "Official.Email" "No..Of.Late.Payments"
[31] "CRIF.Score" "CIBIL.Score" "No.of.Actions"
[34] "Fixed.Income" "ECS.Customer.Name" "ECS.Bank.Name"
[37] "ECS.Account.Number" "Loan.Date" "Sanction.Month"
[40] "EMI.Start.Date" "X1st.EMI.Month" "End.Date"
[43] "Home.Details" "Permanent.Address.Details" "Employer.Name"
[46] "Company.MCA.ID" "Business.Details" "Reference.Details"
[49] "Nature.of.Business" "Pan.Card" "Aadhar.UID"
[52] "Gender" "Educational.Qualification" "DOB"
[55] "Marital.Status" "Last.Payment.Date" "Job.Type"
[58] "Employment.Year" "Cycle.Date" "Age"
[61] "relevant_pos" "crif_active_accounts" "crif_overdue_amt"
[64] "crif_current_outstanding" "cibil_active_accounts" "cibil_overdue_amt"
[67] "cibil_current_outstanding" "NACH.status" "Awarenss.Allocation"
[70] "Allocation.Date" "Awareness.Data" "Awareness.Brk.up"
[73] "June.19.EMI.Amount" "Tenure.End" "June.BKt"
[76] "Loan.Slabs" "Last.Month.Bnc" "X1st.EMI"
[79] "June.19.Bnc" "June.19.Non.Starter" "Reason.of.Bnc"
[82] "HNI" "EMI.Due.1" "OS"
[85] "Advance.Paid" "PAID.Unpaid" "Not.Allocated"
[88] "Excess" "DPD" "CC.Take.Over"
[91] "Last.Month.delinq" "Loan.Status.1" "CIBIL.Bracket"
[94] "Salary.Bracket" "DPD.1" "DELINQ.NON.DELINQ"
[97] "Month"
The expected outcome here would be the names of the matching columns and the names of the unmatched columns in both data frames.
I think Sotos's comment provides the most elegant way to get the output your question expects.
However, as an alternative, you can use %in%:
O1 = colnames(dfA)[colnames(dfA) %in% colnames(dfB)]
> O1
[1] "A" "B" "C"
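As a sketch of the same idea with base R's set functions, intersect() and setdiff() return the three groups from the question directly. The vectors cols1 and cols2 below are stand-ins built from the question's small example (A, B, C, D and A, B, E); they are assumptions for illustration, not the real data frames.

```r
# Stand-in column names from the question's small example (assumed, not the real data)
cols1 <- c("A", "B", "C", "D")   # plays the role of colnames(df1)
cols2 <- c("A", "B", "E")        # plays the role of colnames(df2)

matched   <- intersect(cols1, cols2)  # 1) present in both
only_in_2 <- setdiff(cols2, cols1)    # 2) extras in the second
only_in_1 <- setdiff(cols1, cols2)    # 3) extras in the first

print(matched)    # "A" "B"
print(only_in_2)  # "E"
print(only_in_1)  # "C" "D"
```

With real data frames, something like df1[, matched, drop = FALSE] would then select the matched columns. Note that these comparisons are case-sensitive, so names such as "NACH.Status" and "NACH.status" in the listings above would count as unmatched.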
However, your matching conditions 2) and 3) are a little confusing, because when you ask for:
2) Common in both and additional in 2nd, i.e. A, B & E
in my opinion this corresponds to all columns in the second dataset ( colnames(dfB) ).
3) Common in both and extras in first, i.e. A, B, C & D
And this corresponds to all columns in the first dataset ( colnames(dfA) ).
Does that make sense to you? Did I miss something in your merging pattern?
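The point about conditions 2) and 3) can be checked with a small sketch. The names colsA and colsB are hypothetical stand-ins for colnames(dfA) and colnames(dfB) from the Data section; the check only confirms that "common plus extras" rebuilds each full set of names.

```r
# Column names matching the answer's example data (dfA: A-D, dfB: A, B, C, E)
colsA <- LETTERS[1:4]
colsB <- LETTERS[c(1:3, 5)]

common <- intersect(colsA, colsB)

# "Common in both" plus "extras in 2nd" rebuilds the second set of names
cond2 <- union(common, setdiff(colsB, colsA))
# "Common in both" plus "extras in first" rebuilds the first set of names
cond3 <- union(common, setdiff(colsA, colsB))

print(setequal(cond2, colsB))  # TRUE
print(setequal(cond3, colsA))  # TRUE
```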
Data
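# Two 4 x 4 data frames of random integers; no seed is set, so values differ between runs
dfA = data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfA) = LETTERS[1:4]          # columns A, B, C, D
dfB = data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfB) = LETTERS[c(1:3,5)]     # columns A, B, C, E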
> dfA
A B C D
1 75 66 17 89
2 46 7 27 38
3 97 26 47 31
4 32 20 71 2
> dfB
A B C E
1 94 70 18 16
2 69 57 29 60
3 53 50 25 96
4 37 51 64 75
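If "merge" in the question means stacking the rows of both data frames (an assumption; the question does not say which kind of merge is wanted), one possible sketch is to keep only the shared columns and combine them with rbind(). The seed and the name stacked are introduced here for illustration only.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible; not part of the original data
dfA <- data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfA) <- LETTERS[1:4]
dfB <- data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfB) <- LETTERS[c(1:3, 5)]

# Names present in both data frames
common <- intersect(colnames(dfA), colnames(dfB))

# Keep only the shared columns, in the same order, then stack the rows
stacked <- rbind(dfA[, common, drop = FALSE], dfB[, common, drop = FALSE])

print(colnames(stacked))  # "A" "B" "C"
print(dim(stacked))       # 8 3
```

Using drop = FALSE keeps the result a data frame even when only one column is shared.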