
Find the Nth instance of a character in Excel (no VBA)

TL;DR: I want a formula that finds the Nth "_" (for any N) in a string and returns its index, or that returns the Nth substring delimited by "_". I have VBA that does this, but it's slow.

Long version: I am working with advertising campaign data. My marketers (fortunately) use a consistent naming scheme for their campaigns. Unfortunately, it's very long.

The campaign names contain exactly 1 piece of data that I cannot otherwise get from reports.

For reference, campaign names are of the format:

ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM

... and I have a column of about 200K of them (growing by a couple hundred each week).

Edit:
The important part is that the campaign name has multiple parts, with _ as the delimiter between them. In this case I want the 9th part, but I want an option flexible enough that I don't have to add or remove lines to change which part I target.

I've seen other questions suggest a nested formula like:

=MID(
  Data_OLV[@Campaign],
  FIND("_",Data_OLV[@Campaign],
    FIND("_",Data_OLV[@Campaign],
      FIND("_",Data_OLV[@Campaign],
        FIND("_",Data_OLV[@Campaign],
          FIND("_",Data_OLV[@Campaign],
            FIND("_",Data_OLV[@Campaign],
              FIND("_",Data_OLV[@Campaign],
                FIND("_",Data_OLV[@Campaign])+1)
              +1)
            +1)
          +1)
        +1)
      +1)
    +1)
  +1,
3)

... but that is hard to modify if I need something in a different position.
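Each level of nesting above is one more FIND that starts one character after the previous match. The part you target is therefore fixed by how deeply the formula is nested. A minimal Python sketch of the same chain, written as a loop, makes that explicit. `find_excel` and `nth_delimiter_position` are hypothetical helper names that model Excel's 1-based FIND, with ValueError standing in for #VALUE!:

```python
def find_excel(find_text: str, within_text: str, start_num: int = 1) -> int:
    """Model of Excel's FIND: 1-based position; ValueError stands in for #VALUE!."""
    pos = within_text.find(find_text, start_num - 1)
    if pos < 0:
        raise ValueError("#VALUE!")
    return pos + 1


def nth_delimiter_position(text: str, delimiter: str, n: int) -> int:
    """Same as nesting FIND n times, each call starting just after the previous hit."""
    pos = 0
    for _ in range(n):
        pos = find_excel(delimiter, text, pos + 1)
    return pos


campaign = "ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM"
# The 9th part starts one character after the 8th underscore, as in the nested formula.
start = nth_delimiter_position(campaign, "_", 8) + 1
ninth_part = campaign[start - 1:start - 1 + 3]  # MID(campaign, start, 3)
print(ninth_part)  # TYP
```

In the loop, N is just a repeat count, whereas the nested formula hard-codes it as nesting depth.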

I have a UDF called StringSplit (see below) that provides the desired results, but it's extremely slow (and only works if you enable macros, which not all of my audience does).

Is there a better way to do what I'm trying to do?

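    ' Returns the index_-th part (1-based) of input_ split on delimiter_.
    ' Returns #REF! if index_ is out of range; any other error returns its description.
    Public Function StringSplit(input_ As String, delimiter_ As String, index_ As Long) As Variant
        Dim parts() As String
        On Error GoTo ErrHandler

        parts = Split(input_, delimiter_, -1, vbTextCompare)
        StringSplit = parts(index_ - 1)   ' Split returns a 0-based array
        Exit Function
    ErrHandler:
        If Err.Number = 9 Then            ' 9 = "Subscript out of range"
            StringSplit = CVErr(xlErrRef)
        Else
            StringSplit = Err.Description
        End If
    End Function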

I think this is the formula you are looking for:

=MID(A2, FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2))+1, FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2+1)) -  FIND(CHAR(1), SUBSTITUTE(A2, B2, CHAR(1), C2))-1)

[Screenshot of an example worksheet using this formula]

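Here A2 holds the text, B2 holds the delimiter, and C2 is N, the occurrence of the delimiter to start after. The formula returns the text between the Nth and (N+1)th delimiters, so to get the 9th part of a campaign name, set C2 to 8. Change B2 and C2 as needed.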
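To check what the formula returns, here is a small Python model of it. `substitute_nth` is a hypothetical helper that mimics SUBSTITUTE with its optional instance_num argument, and "\x01" stands in for CHAR(1). Like the formula, the model assumes a single-character delimiter:

```python
MARKER = "\x01"  # plays the role of CHAR(1); assumed not to occur in the text


def substitute_nth(text: str, old: str, new: str, instance: int) -> str:
    """Model of SUBSTITUTE(text, old, new, instance): replace only the instance-th match."""
    if instance < 1:
        raise ValueError("#VALUE!")
    start = 0
    for _ in range(instance):
        pos = text.find(old, start)
        if pos < 0:
            return text  # fewer matches than instance: text comes back unchanged
        start = pos + len(old)
    return text[:pos] + new + text[pos + len(old):]


def between_nth_and_next(text: str, delimiter: str, n: int) -> str:
    """Model of the MID formula with B2 = delimiter and C2 = n."""
    first = substitute_nth(text, delimiter, MARKER, n).find(MARKER)       # 0-based index of Nth delimiter
    second = substitute_nth(text, delimiter, MARKER, n + 1).find(MARKER)  # 0-based index of (N+1)th
    if first < 0 or second < 0:
        raise ValueError("#VALUE!")  # FIND fails when that delimiter does not exist
    return text[first + 1:second]


campaign = "ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM"
print(between_nth_and_next(campaign, "_", 8))  # TYP
```

Asking for the last part (C2 = 9 for this name) raises the stand-in for #VALUE!, because there is no (N+1)th delimiter to find.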

If, for example, you want to locate the third instance of ? in cell A1, try:

=FIND(CHAR(1),SUBSTITUTE(A1,"?",CHAR(1),3))


NOTE:

We assume that CHAR(1) does not appear in the original string.
To get the last instance, use:

=FIND(CHAR(1),SUBSTITUTE(A1,"?",CHAR(1),(LEN(A1)-LEN(SUBSTITUTE(A1,"?","")))))
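Both formulas can be modelled in a few lines of Python, again only as a sketch. `substitute_nth`, `find_nth` and `find_last` are hypothetical helper names; "\x01" plays CHAR(1), and the searched character is assumed to be a single character, as "?" is:

```python
MARKER = "\x01"  # plays the role of CHAR(1); must not already occur in the text


def substitute_nth(text: str, old: str, new: str, instance: int) -> str:
    """Model of SUBSTITUTE(text, old, new, instance)."""
    if instance < 1:
        raise ValueError("#VALUE!")
    start = 0
    for _ in range(instance):
        pos = text.find(old, start)
        if pos < 0:
            return text
        start = pos + len(old)
    return text[:pos] + new + text[pos + len(old):]


def find_nth(text: str, char: str, n: int) -> int:
    """FIND(CHAR(1), SUBSTITUTE(text, char, CHAR(1), n)) as a 1-based position."""
    pos = substitute_nth(text, char, MARKER, n).find(MARKER)
    if pos < 0:
        raise ValueError("#VALUE!")
    return pos + 1


def find_last(text: str, char: str) -> int:
    """Count matches as LEN(text) - LEN(SUBSTITUTE(text, char, "")), then find that one."""
    count = len(text) - len(text.replace(char, ""))
    return find_nth(text, char, count)


print(find_nth("a?b?c?d?e", "?", 3))  # 6
print(find_last("a?b?c?d?e", "?"))    # 8
```

If the text already contained "\x01", `find` would stop at that character instead of the substituted one, which is why the answer assumes CHAR(1) is absent. With no "?" in the text, the last-instance formula returns #VALUE! (the model raises ValueError).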

You're saying, if I am correct, that the data you receive is always in the format you posted and that you consistently want to extract the TYP data.

Why not search the string for TYP, and also for NUM, since NUM marks the start of the next piece of data?

Then, you would end up with a formula such as

=TRIM(MID(W20,SEARCH("TYP",W20),SEARCH("NUM",W20)-SEARCH("TYP",W20)))

In this formula, cell W20 holds the entire data string. You can change this reference, or paste the whole string in its place instead.

EDIT

Since the OP mentioned that the title strings are not consistent:

=TRIM(MID(W20,SEARCH(A1,W20),IF(A2="",LEN(W20),SEARCH(A2,W20)-SEARCH(A1,W20))))

Cell A1 holds the title string of the data to extract (in this case, TYP).

Cell A2 holds the title string of the next subdata. If A2 is empty, the formula returns everything from the match of A1 to the end of the string.
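A Python model of this formula makes its behaviour on the sample name easy to check. It is a sketch: `between_labels` and `excel_trim` are hypothetical names, and SEARCH's wildcard support is not modelled. Because TRIM removes only spaces, the underscore between TYP and NUM survives when the parts are delimited by "_":

```python
def excel_trim(s: str) -> str:
    """Excel's TRIM: drop leading/trailing spaces and collapse inner runs of spaces."""
    return " ".join(part for part in s.split(" ") if part)


def between_labels(text: str, label: str, next_label: str = "") -> str:
    """Model of TRIM(MID(W20,SEARCH(A1,W20),IF(A2="",LEN(W20),SEARCH(A2,W20)-SEARCH(A1,W20))))."""
    lower = text.lower()  # SEARCH is case-insensitive
    start = lower.find(label.lower())
    if start < 0:
        raise ValueError("#VALUE!")
    if next_label == "":
        length = len(text)  # MID simply stops at the end of the text
    else:
        end = lower.find(next_label.lower())
        if end < 0:
            raise ValueError("#VALUE!")
        length = end - start
        if length < 0:
            raise ValueError("#VALUE!")  # MID rejects a negative num_chars
    return excel_trim(text[start:start + length])


campaign = "ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM"
print(between_labels(campaign, "TYP", "NUM"))  # TYP_
print(between_labels(campaign, "TYP"))         # TYP_NUM
```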

As Egan Wolf commented, there is a solution at http://exceljet.net/formula/find-nth-occurrence-of-character

=MID([@[Campaign]],FIND(CHAR(160),SUBSTITUTE([@[Campaign]],"_",CHAR(160),9))+1,4)

Or, more generally:

=MID(TextToSearch,FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))+1,LengthOfDesiredSection)

LengthOfDesiredSection can, of course, be found by reusing parts of the same formula, like so (line breaks added for clarity):

  =MID(TextToSearch,
    FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))+1,
    IFERROR(
      FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber+1))-
      FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))-1,
      LEN(TextToSearch)-
      FIND(CHAR(160),SUBSTITUTE(TextToSearch,Delimiter,CHAR(160),InstanceNumber))))

The IFERROR() protects against the case where the Delimiter appears only InstanceNumber times in TextToSearch, i.e. when the requested section is the last one.
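The role of the IFERROR can be seen in a Python model of the general formula. This is a sketch under the same assumptions as the formula (a single-character delimiter, and CHAR(160) absent from the text). `substitute_nth` and `section_after` are hypothetical helper names:

```python
MARKER = "\xa0"  # CHAR(160), a non-breaking space; assumed not to occur in the text


def substitute_nth(text: str, old: str, new: str, instance: int) -> str:
    """Model of SUBSTITUTE(text, old, new, instance)."""
    if instance < 1:
        raise ValueError("#VALUE!")
    start = 0
    for _ in range(instance):
        pos = text.find(old, start)
        if pos < 0:
            return text
        start = pos + len(old)
    return text[:pos] + new + text[pos + len(old):]


def section_after(text: str, delimiter: str, instance: int) -> str:
    """Text after the instance-th delimiter, up to the next delimiter or the end."""
    first = substitute_nth(text, delimiter, MARKER, instance).find(MARKER)
    if first < 0:
        raise ValueError("#VALUE!")  # fewer than `instance` delimiters: MID's start fails
    nxt = substitute_nth(text, delimiter, MARKER, instance + 1).find(MARKER)
    if nxt < 0:
        nxt = len(text)  # IFERROR branch: take LEN(TextToSearch) - FIND(...) characters
    return text[first + 1:nxt]


campaign = "ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM"
print(section_after(campaign, "_", 8))  # TYP
print(section_after(campaign, "_", 9))  # NUM
```

Without the fallback branch, the second call would fail the same way the plain MID formula does on the last part.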

One way to find the nth underscore-delimited substring and return it is with this formula:

=TRIM(MID(SUBSTITUTE(A1,"_",REPT(" ",999)),MAX(1,999*(n-1)),999))

where n is the instance you are looking for.

But, of course, this requires that the elements are present in the same order and are always present (or, if one is missing, that its underscore is still there as an empty placeholder).
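The trick is that every underscore becomes 999 spaces. The nth field then lands inside the 999-character window starting at position 999*(n-1), with only padding around it for TRIM to strip. A Python model with hypothetical helper names shows this. By my reading of the arithmetic, it holds as long as the text is shorter than 999 characters, which the sample name easily is:

```python
def excel_trim(s: str) -> str:
    """Excel's TRIM: strip spaces at both ends and collapse inner runs of spaces to one."""
    return " ".join(part for part in s.split(" ") if part)


def nth_field_padded(text: str, n: int, width: int = 999) -> str:
    """Model of TRIM(MID(SUBSTITUTE(A1,"_",REPT(" ",999)),MAX(1,999*(n-1)),999))."""
    padded = text.replace("_", " " * width)       # every delimiter becomes `width` spaces
    start = max(1, width * (n - 1))               # 1-based start position, as MID expects
    window = padded[start - 1:start - 1 + width]  # MID(padded, start, width)
    return excel_trim(window)                     # only the nth field survives the trim


campaign = "ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM"
print(nth_field_padded(campaign, 9))  # TYP
```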

If you are using a version of Excel with the FILTERXML function, you can use this formula:

=INDEX(FILTERXML("<t><s>" & SUBSTITUTE(A1,"_","</s><s>") & "</s></t>","//s"),n)

I'm not sure which of these would be more efficient (faster) on a large data set.
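The FILTERXML formula turns the string into a tiny XML document with one <s> element per field, then INDEX picks the nth element. A Python sketch using the standard library's xml.etree.ElementTree shows the idea. ElementTree is only a stand-in for FILTERXML, not the same engine, and `nth_field_xml` is a hypothetical name. The sketch also exposes a caveat: a field containing & or < makes the generated XML malformed, which presumably makes FILTERXML return an error as well:

```python
import xml.etree.ElementTree as ET


def nth_field_xml(text: str, n: int, delimiter: str = "_") -> str:
    """Model of INDEX(FILTERXML("<t><s>" & SUBSTITUTE(A1,"_","</s><s>") & "</s></t>","//s"), n)."""
    xml = "<t><s>" + text.replace(delimiter, "</s><s>") + "</s></t>"
    nodes = ET.fromstring(xml).findall(".//s")  # ElementTree's limited XPath for //s
    if not 1 <= n <= len(nodes):
        raise IndexError("#REF!")               # INDEX outside the returned list
    return nodes[n - 1].text or ""              # an empty <s></s> has text None


campaign = "ADV_CO_BG_Product_UniqueID_XX_mm.dd.yyyy_mm.dd.yyyy_TYP_NUM"
print(nth_field_xml(campaign, 9))  # TYP
```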
