How to apply a fuzzy lookup to R data frame headers

I have 2 data frames.

1. df1 holds sales data with unstructured headers, coming from an OLAP cube.

df1 <- data.frame("[Time].[Fiscal Year].[Fiscal Year].[MEMBER_CAPTION]"= c("FY18","FY19","FY20"), "[Measures].[USD]"=c(100,200,300))
names(df1) <- c("[Time].[Fiscal Year].[Fiscal Year].[MEMBER_CAPTION]","[Measures].[USD]")
2. df2 holds a list of unstructured headers and their respective cleansed headers.

df2<- data.frame("RawHeaderName"=c("[Time].[Fiscal Year]","[Measures].[USD]"),"ReportDisplayName"=c("FiscalYear","USD"))

My requirement: when a df2$RawHeaderName value matches (fuzzy matches) a df1 header, I need to replace that df1 header with the corresponding df2$ReportDisplayName value. The final output should look like this:

FinalOutput <- data.frame("FiscalYear" =c("FY18","FY19","FY20"),"USD"=c(100,200,300))

Please help me solve this problem. I already tried the fuzzyjoin and dplyr packages (library("fuzzyjoin"), library("dplyr")), but with no luck.

I think you're simply looking for names(df1) <- c('Fiscal Year', 'USD'), which modifies df1 to:

  Fiscal Year USD
1        FY18 100
2        FY19 200
3        FY20 300
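
If hard-coding the names is not an option, the same rename can be driven by df2. The following is only a sketch, assuming each df2$RawHeaderName is a literal prefix of the df1 header it should replace (true for the sample data in the question). The helper name rename_by_prefix is made up for illustration, and startsWith() requires R 3.3.0 or later.

```r
# Sample data from the question
df1 <- data.frame(a = c("FY18", "FY19", "FY20"), b = c(100, 200, 300))
names(df1) <- c("[Time].[Fiscal Year].[Fiscal Year].[MEMBER_CAPTION]",
                "[Measures].[USD]")
df2 <- data.frame(
  RawHeaderName     = c("[Time].[Fiscal Year]", "[Measures].[USD]"),
  ReportDisplayName = c("FiscalYear", "USD"),
  stringsAsFactors  = FALSE
)

# Hypothetical helper: for each header, take the display name of the first
# lookup entry that the header starts with; keep the header if none matches.
rename_by_prefix <- function(headers, lookup) {
  vapply(headers, function(h) {
    hit <- which(startsWith(h, lookup$RawHeaderName))
    if (length(hit) == 0) h else lookup$ReportDisplayName[hit[1]]
  }, character(1), USE.NAMES = FALSE)
}

names(df1) <- rename_by_prefix(names(df1), df2)
print(names(df1))
```

Headers without any matching prefix are left unchanged, so a missing entry in df2 does not break the data frame.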

After spending some time on this, I found that the code below solves only half of the problem: it works only when df2$RawHeaderName matches a df1 header exactly. I still need to explore fuzzy matching.

library("dplyr")

df1 <- data.frame("[Time].[Fiscal Year].[Fiscal Year].[MEMBER_CAPTION]"= c("FY18","FY19","FY20"), "[Measures].[USD]"=c(100,200,300))
names(df1) <- c("[Time].[Fiscal Year].[Fiscal Year].[MEMBER_CAPTION]","[Measures].[USD]")


df2<- data.frame("RawHeaderName"=c("[Time].[Fiscal Year].[Fiscal Year].[MEMBER_CAPTION]","[Measures].[USD]"),"ReportDisplayName"=c("FiscalYear","USD"))


Extract_Headers <- (names(df1))
Extract_Headers <- data.frame("Headers"=as.character(Extract_Headers))
df2$RawHeaderName <- as.character(df2$RawHeaderName)
df2$ReportDisplayName <- as.character(df2$ReportDisplayName)
Cleansed_Headers <- Extract_Headers %>% inner_join (df2, by =c("Headers"="RawHeaderName"))
# Use the full column name rather than relying on partial matching of `$`
names(df1) <- Cleansed_Headers$ReportDisplayName
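
For the fuzzy part that is still open, one possible direction is base R's adist() with partial = TRUE, which measures the edit distance between each pattern and the closest substring of each header. This is a sketch under assumptions, not a tested solution for real OLAP output: the tolerance max_dist is a made-up value that would need tuning, and ties between patterns are resolved by simply taking the first one.

```r
# Sample data from the question (df2 holds only partial headers)
df1 <- data.frame(a = c("FY18", "FY19", "FY20"), b = c(100, 200, 300))
names(df1) <- c("[Time].[Fiscal Year].[Fiscal Year].[MEMBER_CAPTION]",
                "[Measures].[USD]")
df2 <- data.frame(
  RawHeaderName     = c("[Time].[Fiscal Year]", "[Measures].[USD]"),
  ReportDisplayName = c("FiscalYear", "USD"),
  stringsAsFactors  = FALSE
)

# Distance matrix: rows are df2 patterns, columns are df1 headers.
# partial = TRUE compares each pattern to the closest substring of a header;
# patterns are treated as literal strings (fixed = TRUE is the default).
d <- adist(df2$RawHeaderName, names(df1), partial = TRUE)

# Best-matching pattern for every header, and the distance of that match
best      <- apply(d, 2, which.min)
best_dist <- d[cbind(best, seq_along(best))]

# Assumed tolerance: rename only when the best match is close enough
max_dist <- 2
names(df1) <- ifelse(best_dist <= max_dist,
                     df2$ReportDisplayName[best],
                     names(df1))
print(names(df1))
```

A header whose best match is further away than max_dist keeps its original name, which makes unmatched columns easy to spot afterwards.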
