
Find the Nth instance of a character in Excel (no VBA)

TL;DR summary: I want a formula that will find the Nth "_" (for any N) in a string and return its index, or find the Nth substring delimited by "_". I have VBA to do this, but it's slow.

Long version: I am working with advertising campaign data. My marketers (fortunately) use a consistent naming scheme for their campaigns. Unfortunately, it's very long.

The campaign names contain exactly one piece of data that I cannot otherwise get from reports.

For reference, campaign names are of the format:

ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM

... and I have a column of about 200K of them (growing by a couple hundred each week).

Edit:
The important part is that the campaign name has multiple parts, with _ as the delimiter between them. In this case I want the 9th part, but I want an option flexible enough that I don't have to add or remove lines to change which part I target.

On other questions I've seen suggestions to use a nested formula like:

=MID(
  Data_OLV[@Campaign],
  FIND("_",Data_OLV[@Campaign],
    FIND("_",Data_OLV[@Campaign],
      FIND("_",Data_OLV[@Campaign],
        FIND("_",Data_OLV[@Campaign],
          FIND("_",Data_OLV[@Campaign],
            FIND("_",Data_OLV[@Campaign],
              FIND("_",Data_OLV[@Campaign],
                FIND("_",Data_OLV[@Campaign])+1)
              +1)
            +1)
          +1)
        +1)
      +1)
    +1)
  +1,
3)

... but that is hard to modify if I need something in a different position.

I have a UDF called StringSplit (see below) that provides the desired results, but it's extremely slow (and it only works if you enable macros, which not all of my audience does).

Is there a better way to do what I'm trying to do?

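    Public Function StringSplit(input_ As String, delimiter_ As String, index_ As Long) As Variant
        ' Returns the index_-th (1-based) piece of input_, split on delimiter_.
        Dim parts() As String
        On Error GoTo ErrHandler

        parts = Split(input_, delimiter_, -1, vbTextCompare)
        StringSplit = parts(index_ - 1)
        Exit Function
    ErrHandler:
        ' Error 9 is "Subscript out of range": index_ is outside the available pieces.
        If Err.Number = 9 Then
            StringSplit = CVErr(xlErrRef)
            Exit Function
        End If
        StringSplit = Err.Description
    End Function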

I think this is the formula you are looking for:

=MID(A2, FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2))+1, FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2+1)) -  FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2))-1)

Here B2 holds the delimiter and C2 holds N, the occurrence number of that delimiter. The formula returns the text between the Nth and (N+1)th occurrence. You can adapt it to your needs by changing B2 and C2.
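As a worked check of this formula, here is a hand trace on the sample campaign name from the question. The values placed in A2, B2 and C2 are assumptions chosen for illustration, and the positions are counted by hand from that sample string.

```
A2: ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM
B2: _
C2: 8

FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2))     returns 52   (8th "_")
FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2+1))   returns 56   (9th "_")
MID(A2, 52+1, 56-52-1)                             returns TYP
```

Note that C2 = 8 yields the 9th part, because the Nth delimiter precedes part N+1. As written, the formula cannot return the first part (C2 = 0 makes SUBSTITUTE return #VALUE!) or the last part (there is no following delimiter for the second FIND to locate).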

If, for example, you want to locate the third instance of ? in cell A1, try:

=FIND(CHAR(1),SUBSTITUTE(A1,"?",CHAR(1),3))
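A small trace may make the mechanism clearer. The sample text in A1 is an assumption for illustration.

```
A1: a?b?c?d

SUBSTITUTE(A1,"?",CHAR(1),3)     replaces only the third "?", giving a?b?c<CHAR(1)>d
FIND(CHAR(1), ...)               returns 6, the position of that third "?"
```

If A1 contains fewer than three "?" characters, SUBSTITUTE changes nothing and FIND returns #VALUE!.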


NOTE:

We assume that CHAR(1) does not appear in the original string.

To get the last instance, use:

=FIND(CHAR(1),SUBSTITUTE(A1,"?",CHAR(1),(LEN(A1)-LEN(SUBSTITUTE(A1,"?","")))))
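The instance number in this last-instance formula is simply a count of the "?" characters: removing them all shortens the string by exactly that count. Traced on an assumed sample value:

```
A1: a?b?c?d

LEN(A1)                                       returns 7
LEN(SUBSTITUTE(A1,"?",""))                    returns 4   (the text abcd)
7 - 4                                         gives 3, so the last "?" is the 3rd one
FIND(CHAR(1),SUBSTITUTE(A1,"?",CHAR(1),3))    returns 6
```

The count works as written for a single-character delimiter; for a longer delimiter the length difference would have to be divided by the length of the delimiter.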

You're saying, if I am correct, that the data you receive is always in the format you posted and that you consistently want to extract the TYP data.

Why not search for TYP in the string, and additionally search for NUM, since that marks the start of the following subdata?

Then you would end up with a formula such as

=TRIM(MID(W20,SEARCH("TYP",W20),SEARCH("NUM",W20)-SEARCH("TYP",W20)))

In this formula, cell W20 holds the entire data string. Naturally you can change this reference, or paste the whole string in its place.
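For illustration only, here is the formula traced on the template string from the question, treated as literal data (an assumption; real campaign names carry actual values in place of TYP and NUM).

```
W20: ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM

SEARCH("TYP",W20)                returns 53
SEARCH("NUM",W20)                returns 57
MID(W20,53,57-53)                returns TYP_
```

The result keeps the trailing underscore, since TRIM only strips spaces. Subtracting 1 from the length argument would drop it; that adjustment is a suggestion, not part of the answer as posted.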

EDIT

Since the OP mentioned the title strings are not consistent:

=TRIM(MID(W20,SEARCH(A1,W20),IF(A2="",LEN(W20),SEARCH(A2,W20)-SEARCH(A1,W20))))

Cell A1 holds the title string of the data to be extracted, in this case TYP.

Cell A2 holds the title string of the next subdata. If it is empty, the formula returns all characters from the position found by the first SEARCH (using cell A1) onward.

As Egan Wolf commented, there is a solution at http://exceljet.net/formula/find-nth-occurrence-of-character:

=MID([@[Campaign]],FIND(CHAR(160),SUBSTITUTE([@[Campaign]],"_",CHAR(160),9))+1,4)

Or, more generally:

=MID(TextToSearch,FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))+1,LengthOfDesiredSection)

LengthOfDesiredSection can, of course, be found with a subsection of the first formula, like so (line breaks added for clarity):

  =MID(TextToSearch,
   FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))+1,
   IFERROR(
    FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber+1))-
    FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))-1,
    LEN(TextToSearch)-
    FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))))

The IFERROR() protects against the case where the Delimiter appears only InstanceNumber times in TextToSearch, that is, when the desired section is the last one.
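A hand trace of the start and length arguments this formula is meant to compute, using the sample campaign name from the question with Delimiter "_". The InstanceNumber values are chosen for illustration.

```
TextToSearch: ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM   (59 characters)

InstanceNumber = 8
  start   = 52 + 1 = 53          (the 8th "_" is at position 52)
  length  = 56 - 52 - 1 = 3      (the 9th "_" is at position 56)
  result  = TYP

InstanceNumber = 9
  start   = 56 + 1 = 57
  length  = IFERROR fallback: 59 - 56 = 3   (there is no 10th "_")
  result  = NUM
```

In both cases the section returned is the one that follows the delimiter counted by InstanceNumber.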

One way to find the nth instance in an underscore-delimited string, and return that sub-string, is with this formula:

=TRIM(MID(SUBSTITUTE(A1,"_",REPT(" ",999)),MAX(1,999*(n-1)),999))

where n is the instance you are looking for.

But, of course, this requires that the elements are present in the same order, and are always present (or replaced by an underscore if they are not).
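To see why the 999-space padding works, consider an assumed short input. Each underscore becomes 999 spaces, so every element lands in its own 999-character window, and TRIM strips the padding:

```
A1: a_b_c

After SUBSTITUTE:  "a" at position 1, "b" at position 1001, "c" at position 2001
n = 1:  MID(..., 1, 999)      covers positions 1 to 999       returns a
n = 2:  MID(..., 999, 999)    covers positions 999 to 1997    returns b
n = 3:  MID(..., 1998, 999)   covers position 1998 onward     returns c
```

In practice n can be a cell reference (for example B1), so the target element changes without editing the formula. The padding assumes the text is comfortably shorter than 999 characters, and TRIM will also collapse any repeated spaces inside the element itself.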

If you are using a version of Excel with the FILTERXML function, you can use this formula:

=INDEX(FILTERXML("<t><s>" & SUBSTITUTE(A1,"_","</s><s>") & "</s></t>","//s"),n)
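The FILTERXML formula works by turning the delimited text into a small XML document and reading back its nodes. On an assumed sample value:

```
A1: a_b_c

"<t><s>" & SUBSTITUTE(A1,"_","</s><s>") & "</s></t>"   builds <t><s>a</s><s>b</s><s>c</s></t>
FILTERXML(..., "//s")                                   returns the three s nodes: a, b, c
INDEX(..., 2)                                           returns b
```

Because the text is parsed as XML, a campaign name containing characters such as & or < would make FILTERXML return an error unless they are escaped first. FILTERXML is also not available in Excel for Mac or Excel for the web.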

I'm not sure which one would be more efficient (faster) on a large database.
