How to find unique name/character from a list of names in R
I have a huge list of company names. As illustrated below, a column named ABBEYCREST.DEAD...10.10.14...ASK.PRICE means that ABBEYCREST.DEAD...10.10.14... is the company name and ASK.PRICE marks the ask price data; a name ending in BID.PRICE marks the bid price data. I want to identify the company that has only one column in the data frame. My data frame has column headers like the ones below, so each company should have 2 columns: with 4000 companies there should be 8000 columns, but I have 7999 (the data frame also has a date column, which I exclude from the count).
df<-AskBid
ABBEYCREST.DEAD...10.10.14...ASK.PRICE
ABBEYCREST.DEAD...10.10.14...BID.PRICE
ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE
ABBOT.GROUP.DEAD...07.03.08...BID.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...BID.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...BID.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...ASK.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...BID.PRICE
ABERTIS..IRS....BID.PRICE
ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE
ABGENIX..IRS..DEAD...12.11.07...BID.PRICE
ABLON.GROUP.DEAD...31.05.13...ASK.PRICE
ABLON.GROUP.DEAD...31.05.13...BID.PRICE
ACAMBIS.DEAD...25.09.08...ASK.PRICE
ACAMBIS.DEAD...25.09.08...BID.PRICE
What I want to find is:
missing <- df
ABERTIS..IRS....BID.PRICE
I would really appreciate your help. This is causing problems in my estimations.
You can remove the ASK.PRICE and BID.PRICE part and call duplicated twice (the second time on the reversed order):
cn <- readLines(textConnection(
"ABBEYCREST.DEAD...10.10.14...ASK.PRICE
ABBEYCREST.DEAD...10.10.14...BID.PRICE
ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE
ABBOT.GROUP.DEAD...07.03.08...BID.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...BID.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...BID.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...ASK.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...BID.PRICE
ABERTIS..IRS....BID.PRICE
ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE
ABGENIX..IRS..DEAD...12.11.07...BID.PRICE
ABLON.GROUP.DEAD...31.05.13...ASK.PRICE
ABLON.GROUP.DEAD...31.05.13...BID.PRICE
ACAMBIS.DEAD...25.09.08...ASK.PRICE
ACAMBIS.DEAD...25.09.08...BID.PRICE"))
## remove (ASK|BID).PRICE
cn.sub <- gsub("(ASK|BID)\\.PRICE$", "", cn)
cn[!(duplicated(cn.sub) | rev(duplicated(rev(cn.sub))))]
# [1] "ABERTIS..IRS....BID.PRICE"
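The reason for the two calls is that duplicated() only flags the second and later occurrences of a value, so the first member of each pair would otherwise look unique; scanning again from the other end flags that first member too. As a minimal sketch of the same idea, base R's duplicated() also accepts fromLast = TRUE, which avoids the double rev(). The short vector below is a hypothetical subset of the column names from the question, used only for illustration:

```r
# Hypothetical subset of the column names from the question
cn <- c(
  "ABBEYCREST.DEAD...10.10.14...ASK.PRICE",
  "ABBEYCREST.DEAD...10.10.14...BID.PRICE",
  "ABERTIS..IRS....BID.PRICE",
  "ACAMBIS.DEAD...25.09.08...ASK.PRICE",
  "ACAMBIS.DEAD...25.09.08...BID.PRICE"
)

# Strip the trailing ASK.PRICE / BID.PRICE so both columns of a company match
stem <- sub("(ASK|BID)\\.PRICE$", "", cn)

# A stem is paired if it is flagged when scanning from either end
paired <- duplicated(stem) | duplicated(stem, fromLast = TRUE)

unpaired <- cn[!paired]
unpaired
# [1] "ABERTIS..IRS....BID.PRICE"
```

This should give the same result as the rev() version above; it is only a different spelling of the same check.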
Here is another solution, assuming text is the name of the column in the data frame that holds the names:
library(dplyr)
## strip the trailing ASK.PRICE / BID.PRICE so both rows of a company share one key
df$text <- gsub("(ASK|BID)\\.PRICE$", "", df$text)
df %>% group_by(text) %>% filter(n() != 2)
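If the names are the column headers of the data frame itself rather than values in a text column, a similar count can be done directly on names(). The sketch below is an assumption-laden illustration: the toy AskBid data frame and the date column name "Date" are hypothetical stand-ins, since the question does not show the real data.

```r
# Hypothetical stand-in for the real AskBid data frame; "Date" is an assumed name
cols <- c(
  "Date",
  "ABBEYCREST.DEAD...10.10.14...ASK.PRICE",
  "ABBEYCREST.DEAD...10.10.14...BID.PRICE",
  "ABERTIS..IRS....BID.PRICE",
  "ACAMBIS.DEAD...25.09.08...ASK.PRICE",
  "ACAMBIS.DEAD...25.09.08...BID.PRICE"
)
AskBid <- data.frame(matrix(0, nrow = 1, ncol = length(cols)))
names(AskBid) <- cols

# Leave the date column out of the check
price_cols <- setdiff(names(AskBid), "Date")

# Company key: the column name without its ASK.PRICE / BID.PRICE suffix
company <- sub("(ASK|BID)\\.PRICE$", "", price_cols)

# Count columns per company and keep those that do not have exactly two
counts <- table(company)
lonely <- names(counts)[counts != 2]

missing <- price_cols[company %in% lonely]
missing
# [1] "ABERTIS..IRS....BID.PRICE"
```

Counting with table() also catches a company that has more than two columns, which the duplicated() approach would not report.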