简体   繁体   中英

Index multiple columns and Match distinct values, returning list of unique values across columns

I've searched far and wide for a solution to my problem... over several long weeks now. I've come up a partially working solutions, which I'll include at the bottom for those who might know how to modify/extend them to resolve the problem.

Here's what I'm trying to accomplish (the following descriptions are in reference to this screen capture http://imgur.com/oTkbjrw ):

在此处输入图片说明

1) I am looking to match values from Column G with Column A ($G2 with $A$2:$A$10).

2) If the values match, then extract unique values (excluding blanks) from adjacent Columns B, C, and D, and return them across columns (H through M).

So, as shown in the 'desired results 1' table (G1:M10), if G2 (Vancouver) is found in A2:A10, then extract indexed values from columns B,C, and D where Vancouver is a match (ie, Blue, ,Green,Green,Red,Yellow,Teal,Green, , , ,Pink,Pink,Pink, ,) and return only the unique values (excluding blanks) across columns H through L (blue,Green,Red,Yellow,Teal,Pink).

I've been looking for a formula solution that can be popped into H2 and then dragged horizontally to M2, to reveal the results. I'm not married to this solution, however. An alternative that I've also considered is to return the unique values in comma delimited format rather than across columns(see 'desired Results 2'). If this is easier to accomplish, I'm all for it.

Note: I will be running this formula on 10,000+ rows. A lean/efficient solution is desirable if possible.

PARTIAL SOLUTIONS I'VE PUT TOGETHER:

1) {Array formula 1}, results can be seen here :

=IFERROR(INDEX($B$2:$B$10, SMALL(IF(COUNTIF($G2,$A$2:$A$10), MATCH(ROW($B$2:$B$10), ROW($B$2:$B$10)), ""), COLUMN(A1))),"")

This formula is only able to index column B, but it successfully matches and returns values across columns. Unfortunately, it doesn't extract unique values and returns blank cells.

2) Re: Justin's comment about a likely VBA solution, figure I should pop up one of the VBA solutions I came across.

Function UNIQUE_PH(Lookupvalue As String, LookupRange As Range, ColumnNumber As Integer)
Dim i As Long
Dim Result As String
For i = 1 To LookupRange.Columns(1).Cells.Count
  If LookupRange.Cells(i, 1) = Lookupvalue Then
    For J = 1 To i - 1
    If LookupRange.Cells(J, 1) = Lookupvalue Then
      If LookupRange.Cells(J, ColumnNumber) = LookupRange.Cells(i, ColumnNumber) Then
        GoTo Skip
      End If
    End If
    Next J
    Result = Result & " " & LookupRange.Cells(i, ColumnNumber) & ","
Skip:
  End If
Next i
UNIQUE_PH = Left(Result, Len(Result) - 1)
End Function

'It takes 3 arguments as inputs:

'1. Lookupvalue – A string that we need to look-up in a range of cells.
'2. LookupRange  – An array of cells from where we need to fetch the data 
'3. ColumnNumber – It is the column number of the table/array from which matching value is to be returned (e.g. 2 for second column).

This basically does what the above formula accomplishes, with the exception that it successfully identifies unique values. It presents the results in one cell as comma delimited (similar to 'desired Results 2'). This isn't able to search through multiple columns and doesn't remove blank cells. I also tried to run it on 10,000 rows of data and it was very slow. I know very little about VBA, so I'm not sure what could be contributing to the lag.

Sample Data solution

'in G2
=A2
'in H2 as an array formula with CSE
=IFERROR(INDEX($B$2:$B$16, MATCH(0, IF($A$2:$A$16=$G2, IF(SIGN(LEN($B$2:$B$16)), COUNTIF($G2:G2, $B$2:$B$16), 1), 1), 0), 1),
 IFERROR(INDEX($C$2:$C$16, MATCH(0, IF($A$2:$A$16=$G2, IF(SIGN(LEN($C$2:$C$16)), COUNTIF($G2:G2, $C$2:$C$16), 1), 1), 0), 1),
 IFERROR(INDEX($D$2:$D$16, MATCH(0, IF($A$2:$A$16=$G2, IF(SIGN(LEN($D$2:$D$16)), COUNTIF($G2:G2, $D$2:$D$16), 1), 1), 0), 1),
 TEXT(,))))
'in G19
=G2
'in H19
=TEXTJOIN(",", TRUE, H2:N2)

Fill H2 right then G2:N2 down as appropriate. Fill G19:H19 down to collate the values above. See footnote ¹ if you get a #NAME! error on the TEXTJOIN function .

在此处输入图片说明

10K Rows of Data solution

Array formulas chew up calculations exponentially the referenced ranges increase in size. The only way to do this as an array formula is to only reference the rows with Vancouver when you are creating a unique list for Vancouver and only reference the rows with Seattle when you are creating a unique list for Seattle . In other words, don't reference rows 2:10000 for Vancouver when Vancouver's data is in rows 2:6.

Note: you will have to sort your data with column A as the primary key. This operation requires it. Unsorted data (even grouped) will not allow the second MATCH function to locate the terminating row.

The rows of data in column B containing Vancouver in column A can be referenced with this.

INDEX(B:B, MATCH("vancouver", A:A, 0)):INDEX(B:B, MATCH("vancouver", A:A))

Now all you have to do is replace all occurrences of $B$2:$B$16 with the above. Adjust the formula and make the replacements for $C$2:$C$16 and $D$2:$D$16 as well. The bonus is that you can get rid of the check to see IF($A$2:$A$16=$G2, ... since by definition, you are only referencing rows with the appropriate city in column A. Make sure you reference G2 and do not hardcode in Vancouver .

'in H2 as an array formula with CSE
=IFERROR(INDEX(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A))), 1), 0)),
 IFERROR(INDEX(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A))), 1), 0)),
 IFERROR(INDEX(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A))), 1), 0)),
 TEXT(,))))

You should be able to live with the calculation time on that array formula even through 10K rows.

在此处输入图片说明

I'm pretty sure that this is as far as native worksheet functions can go. Further improvements would be using variant memory arrays.

Addendum²

There is one further optimization. A worksheet IF only processes the part of the formula that is true. If you look at the names of the cities and only process the array portion of the formula when the city names change, copying directly from above when they are the same you should be able to limit the calculations further.

'in H2 as an array formula with CSE
=IF($G2=$G1, H1, 
    IFERROR(INDEX(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A))), 1), 0)),
    IFERROR(INDEX(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A))), 1), 0)),
    IFERROR(INDEX(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A))), 1), 0)),
    TEXT(,)))))

Addendum³

There was a problem with single city entries and this amendment takes care of that.

You may receive a Circular Reference warning with this. Technically, the warning is true but you wil never actually get a circular reference due to the nested IF structure. In other words, the circular reference is only valid when an IF is false. Excel reports this because it does not examine the conditions; only the possibility of a circular reference which technically is true but can never actually occur.

'in H2 as an array formula with CSE
=IF($G2=$G1, H1, IF(COUNTIF($A:$A, $G2)=1,
    IFERROR(INDEX(INDEX($B:$D, MATCH($G2, $A:$A, 0), 0), MATCH(0, IF(INDEX($B:$D, MATCH($G2, $A:$A, 0), 0)<>"", COUNTIF($G2:G2, INDEX($B:$D, MATCH($G2, $A:$A, 0), 0)), 1), 0)), TEXT(,)),
    IFERROR(INDEX(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($B:$B, MATCH($G2, $A:$A, 0)):INDEX($B:$B, MATCH($G2, $A:$A))), 1), 0)),
    IFERROR(INDEX(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($C:$C, MATCH($G2, $A:$A, 0)):INDEX($C:$C, MATCH($G2, $A:$A))), 1), 0)),
    IFERROR(INDEX(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)), MATCH(0, IF(SIGN(LEN(INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A)))), COUNTIF($G2:G2, INDEX($D:$D, MATCH($G2, $A:$A, 0)):INDEX($D:$D, MATCH($G2, $A:$A))), 1), 0)),
    TEXT(,))))))

¹ If your Excel version does not support the TEXTJOIN function , search this site for [excel][textjoin] to find alternatives in ragged-length, delimited string concatenation.

Try this UDF

Function UniquePh(Lookupvalue As String, LookupRange As Range, ValueRng As Range) As String
Dim dict As Object
Dim lkpArr() As Variant
Dim ValArr() As Variant
Set LookupRange = Intersect(LookupRange, LookupRange.Parent.UsedRange)
Set ValueRng = Intersect(ValueRng, ValueRng.Parent.UsedRange)
If LookupRange.Rows.Count <> ValueRng.Rows.Count Or LookupRange.Columns.Count > 1 Then Exit Function

Set dict = CreateObject("Scripting.Dictionary")
lkpArr = LookupRange.Value
ValArr = ValueRng.Value

For i = LBound(lkpArr, 1) To UBound(lkpArr, 1)
    If lkpArr(i, 1) = Lookupvalue Then
        For j = LBound(ValArr, 2) To UBound(ValArr, 2)
            If ValArr(i, j) <> "" Then
                On Error Resume Next
                    dict.Add ValArr(i, j), ValArr(i, j)
                On Error GoTo 0
            End If
        Next j
    End If
Next i

For Each itm In dict
    UniquePh = UniquePh & itm & ", "
Next itm
If Len(UniquePh) > 0 Then
    UniquePh = Left(UniquePh, Len(UniquePh) - 2)
Else: UniquePh = ""
End If

End Function

Put it in a modules attached to the workbook. Then call it from the sheet. Put this formula in H2:

=UniquePh(G2,$A$2:$A$10,$B$2:$D$10)

在此处输入图片说明

It should run a lot faster.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM