How to find unique names/characters from a list of names in R

Question

I have a huge list of company names. As illustrated below, each name ends in a data-type suffix: in ABBEYCREST.DEAD...10.10.14...ASK.PRICE, ABBEYCREST.DEAD...10.10.14... is the company name and ASK.PRICE marks the ask price data; a name ending in BID.PRICE holds the bid price data. I want to identify the company for which only one column is available in the data frame. My data frame has column headers like the ones below, so each company should have 2 columns: with 4000 companies there should be 8000 columns, but I have 7999 (the data frame also has a date column, which I exclude from the count).

df<-AskBid

    ABBEYCREST.DEAD...10.10.14...ASK.PRICE
    ABBEYCREST.DEAD...10.10.14...BID.PRICE
    ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE
    ABBOT.GROUP.DEAD...07.03.08...BID.PRICE
    ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...ASK.PRICE
    ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...BID.PRICE
    ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...ASK.PRICE
    ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...BID.PRICE
    ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...ASK.PRICE
    ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...BID.PRICE
    ABERTIS..IRS....BID.PRICE
    ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE
    ABGENIX..IRS..DEAD...12.11.07...BID.PRICE
    ABLON.GROUP.DEAD...31.05.13...ASK.PRICE
    ABLON.GROUP.DEAD...31.05.13...BID.PRICE
    ACAMBIS.DEAD...25.09.08...ASK.PRICE
    ACAMBIS.DEAD...25.09.08...BID.PRICE

What I want to find is:

missing <- df
ABERTIS..IRS....BID.PRICE

I would really appreciate your help. This is causing problems in my estimations.

Answer 1

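You can remove the ASK.PRICE and BID.PRICE part and call duplicated twice, the second time on the reversed vector. A single call flags only the second and later occurrences of a value, so the reversed pass is needed to flag the first occurrence as well; any name flagged by neither pass has no partner: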

cn <- readLines(textConnection(
"ABBEYCREST.DEAD...10.10.14...ASK.PRICE
ABBEYCREST.DEAD...10.10.14...BID.PRICE
ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE
ABBOT.GROUP.DEAD...07.03.08...BID.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...BID.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...BID.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...ASK.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...BID.PRICE
ABERTIS..IRS....BID.PRICE
ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE
ABGENIX..IRS..DEAD...12.11.07...BID.PRICE
ABLON.GROUP.DEAD...31.05.13...ASK.PRICE
ABLON.GROUP.DEAD...31.05.13...BID.PRICE
ACAMBIS.DEAD...25.09.08...ASK.PRICE
ACAMBIS.DEAD...25.09.08...BID.PRICE"))

## remove (ASK|BID).PRICE
cn.sub <- gsub("(ASK|BID)\\.PRICE$", "", cn)

cn[!(duplicated(cn.sub) | rev(duplicated(rev(cn.sub))))]
# [1] "ABERTIS..IRS....BID.PRICE"
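
As a variation on the same idea, base R's duplicated() also accepts a fromLast argument, which may read more easily than reversing the vector twice. This is only a minimal sketch on a made-up three-name vector (the A.CO and B.CO names are hypothetical, not taken from the question's data):

```r
# Hypothetical toy names: A.CO has both columns, B.CO has only the bid column
cn <- c("A.CO...ASK.PRICE", "A.CO...BID.PRICE", "B.CO...BID.PRICE")

# Strip the trailing ASK.PRICE / BID.PRICE suffix to get the company stem
stem <- sub("(ASK|BID)\\.PRICE$", "", cn)

# A stem is paired if it is a duplicate when scanning from either end
is_paired <- duplicated(stem) | duplicated(stem, fromLast = TRUE)

unpaired <- cn[!is_paired]
unpaired
# [1] "B.CO...BID.PRICE"
```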

Answer 2

Here is another solution, assuming text is the column of the data frame that holds the names:

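library(dplyr)
## strip the trailing ASK.PRICE / BID.PRICE suffix so both columns of a company share one key
df$text <- gsub("(ASK|BID)\\.PRICE$", "", df$text)
## keep the companies that do not appear exactly twice
df %>% group_by(text) %>% filter(n() != 2)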
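
In the question the names are the column headers of the data frame rather than values in a column, so a version that works directly on names(df) may be closer to what is needed. The following is a rough base R sketch under assumptions: the toy data frame stands in for AskBid, and "Date" is an assumed name for the date column that the question says is excluded from the count.

```r
# Hypothetical stand-in for the AskBid data frame; "Date" is an assumed column name
df <- data.frame(as.Date("2016-01-01"), 1, 2, 3)
names(df) <- c("Date", "A.CO...ASK.PRICE", "A.CO...BID.PRICE", "B.CO...BID.PRICE")

# Work on the column headers, leaving out the date column
cols <- setdiff(names(df), "Date")

# Strip the suffix and count how many columns each company stem has
stem <- sub("(ASK|BID)\\.PRICE$", "", cols)
counts <- table(stem)

# Report the headers of every company that does not have exactly two columns
unpaired <- cols[stem %in% names(counts)[counts != 2]]
unpaired
# [1] "B.CO...BID.PRICE"
```

Counting with table() also catches a company that appears three or more times, which the duplicated() approach would treat as paired.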

2 answers

Answer 1: score 2, accepted, 2016-01-03 18:31:58

Answer 2: score 0, 2016-01-03 22:36:55
