
How to find unique name/character from a list of names in R

I have a huge list of company names. As illustrated below, if a column name is ABBEYCREST.DEAD...10.10.14...ASK.PRICE, then ABBEYCREST.DEAD...10.10.14... is the company name and ASK.PRICE means the column holds ASK price data; when the name ends with BID.PRICE, the column holds BID price data. I want to identify the companies for which only one of the two columns is available in the data frame. My data frame has column headers as illustrated below, so each company should have 2 columns: with 4000 companies there should be 8000 columns, but I have 7999 (the data frame also has a date column, which I exclude when counting).

df <- AskBid

    ABBEYCREST.DEAD...10.10.14...ASK.PRICE
    ABBEYCREST.DEAD...10.10.14...BID.PRICE
    ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE
    ABBOT.GROUP.DEAD...07.03.08...BID.PRICE
    ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...ASK.PRICE
    ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...BID.PRICE
    ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...ASK.PRICE
    ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...BID.PRICE
    ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...ASK.PRICE
    ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...BID.PRICE
    ABERTIS..IRS....BID.PRICE
    ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE
    ABGENIX..IRS..DEAD...12.11.07...BID.PRICE
    ABLON.GROUP.DEAD...31.05.13...ASK.PRICE
    ABLON.GROUP.DEAD...31.05.13...BID.PRICE
    ACAMBIS.DEAD...25.09.08...ASK.PRICE
    ACAMBIS.DEAD...25.09.08...BID.PRICE

What I want to find is the column whose company has only one entry, which in this example is:

ABERTIS..IRS....BID.PRICE

I would really appreciate your help. This is causing problems in my estimations.
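A minimal base R sketch of the column-count check described in the question. The short vector `price_cols` is a hypothetical stand-in for the column names; with the real data it might be `setdiff(names(AskBid), "Date")`, assuming the date column is called `Date`. Stripping the ASK/BID suffix and tabulating the result shows which company does not have exactly two columns:

```r
# Hypothetical stand-in for the price column names (date column already excluded).
# With the real data this might be setdiff(names(AskBid), "Date"),
# assuming the date column is named "Date".
price_cols <- c(
  "ABBEYCREST.DEAD...10.10.14...ASK.PRICE",
  "ABBEYCREST.DEAD...10.10.14...BID.PRICE",
  "ABERTIS..IRS....BID.PRICE",
  "ACAMBIS.DEAD...25.09.08...ASK.PRICE",
  "ACAMBIS.DEAD...25.09.08...BID.PRICE"
)

# Drop the trailing ASK.PRICE / BID.PRICE to get the company part
company <- sub("(ASK|BID)\\.PRICE$", "", price_cols)

# Every company should contribute exactly two columns
expected <- 2 * length(unique(company))  # 6 here
observed <- length(price_cols)           # 5 here

# Companies whose column count is not 2
counts <- table(company)
odd <- names(counts)[counts != 2]
odd
# [1] "ABERTIS..IRS...."
```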

You can strip the ASK.PRICE and BID.PRICE suffixes and call duplicated twice (the second time on the reversed vector):

cn <- readLines(textConnection(
"ABBEYCREST.DEAD...10.10.14...ASK.PRICE
ABBEYCREST.DEAD...10.10.14...BID.PRICE
ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE
ABBOT.GROUP.DEAD...07.03.08...BID.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...BID.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...BID.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...ASK.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...BID.PRICE
ABERTIS..IRS....BID.PRICE
ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE
ABGENIX..IRS..DEAD...12.11.07...BID.PRICE
ABLON.GROUP.DEAD...31.05.13...ASK.PRICE
ABLON.GROUP.DEAD...31.05.13...BID.PRICE
ACAMBIS.DEAD...25.09.08...ASK.PRICE
ACAMBIS.DEAD...25.09.08...BID.PRICE"))

## remove (ASK|BID).PRICE
cn.sub <- gsub("(ASK|BID)\\.PRICE$", "", cn)

cn[!(duplicated(cn.sub) | rev(duplicated(rev(cn.sub))))]
# [1] "ABERTIS..IRS....BID.PRICE"
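The reversal trick works because duplicated() only flags the second and later occurrences of a value. Base R's duplicated() also accepts a fromLast argument, which flags every occurrence except the last one, so the same filter can be written without rev(). A sketch on a shortened copy of the names above:

```r
cn <- c(
  "ABBEYCREST.DEAD...10.10.14...ASK.PRICE",
  "ABBEYCREST.DEAD...10.10.14...BID.PRICE",
  "ABERTIS..IRS....BID.PRICE",
  "ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE",
  "ABGENIX..IRS..DEAD...12.11.07...BID.PRICE"
)
cn.sub <- gsub("(ASK|BID)\\.PRICE$", "", cn)

# duplicated() flags 2nd and later occurrences; fromLast = TRUE flags all but
# the last one, so OR-ing both marks every company that occurs more than once
dup_any <- duplicated(cn.sub) | duplicated(cn.sub, fromLast = TRUE)

orphans <- cn[!dup_any]
orphans
# [1] "ABERTIS..IRS....BID.PRICE"
```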

Here is another solution, assuming the names are stored in a column called text of the data frame you read in:

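library(dplyr)
# keep the full name in `text`; group on the company part only
df$company <- gsub("(ASK|BID)\\.PRICE$", "", df$text)
# rows whose company does not have exactly two (ASK and BID) columns
df %>% group_by(company) %>% filter(n() != 2)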
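If dplyr is not an option, the same group-size filter can be sketched in base R with ave(); this is a swapped-in technique, not part of the answer above. The data frame `df` with a `text` column is built by hand here as a hypothetical example; with the question's data it might come from `data.frame(text = names(AskBid))` after dropping the date column. Grouping on a separate `company` column keeps the full column name visible in the result:

```r
# Hypothetical example data: one row per column name of the price data
df <- data.frame(
  text = c(
    "ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE",
    "ABBOT.GROUP.DEAD...07.03.08...BID.PRICE",
    "ABERTIS..IRS....BID.PRICE",
    "ABLON.GROUP.DEAD...31.05.13...ASK.PRICE",
    "ABLON.GROUP.DEAD...31.05.13...BID.PRICE"
  ),
  stringsAsFactors = FALSE
)

# Company part of each name, without the ASK.PRICE / BID.PRICE suffix
df$company <- gsub("(ASK|BID)\\.PRICE$", "", df$text)

# For every row, the number of rows sharing its company (like dplyr's n())
group_size <- ave(seq_along(df$company), df$company, FUN = length)

# Rows whose company does not have exactly two columns
df[group_size != 2, ]
```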


