Find values occurring in multiple columns in Excel

I have sets of gene probes that are upregulated under different chemical stresses. Each column contains all of the upregulated gene probes for one stress condition. I have 12 columns. How do I get a list of the gene probes that appear in all 12 columns?

I've been able to find similarities between two columns using the formula

 =IF(ISERROR(MATCH(A2,$C$2:$C$21473,0)),"",A2)

but I can't work out how to adapt it to cover all 12 columns.

G.Ac  G.As  G.At  G.Ac.At  G.As.Ac  G.As.At  G.Cd  G.Cu  G.Ni  G.Cd.Cu  G.Cd.Ni  G.Ni.Cu
GENE:JGI_V11_3346220103  GENE:JGI_V11_2653050203  GENE:JGI_V11_3299790103  GENE:JGI_V11_359040103  GENE:JGI_V11_2228010103  GENE:JGI_V11_2662750203  GENE:JGI_V11_1926920303  GENE:JGI_V11_3134270303  GENE:JGI_V11_3119540303  GENE:JGI_V11_3134270203  GENE:JGI_V11_1926920303  GENE:JGI_V11_3134270303
GENE:JGI_V11_3164760203  GENE:JGI_V11_565470303  GENE:JGI_V11_2296170203  GENE:JGI_V11_2045300203  GENE:JGI_V11_2421620203  GENE:JGI_V11_2228010303  GENE:JGI_V11_2196580303  GENE:JGI_V11_3134270203  GENE:JGI_V11_3119540203  GENE:JGI_V11_1926920103  GENE:JGI_V11_1926920103  GENE:JGI_V11_1014720202
GENE:JGI_V11_478830203  GENE:JGI_V11_3168730303  GENE:JGI_V11_3311070202  GENE:JGI_V11_3216620102  GENE:JGI_V11_2653050303  GENE:JGI_V11_3300140202  GENE:JGI_V11_2653050303  GENE:JGI_V11_1159220202  GENE:JGI_V11_2024180303  GENE:JGI_V11_1926920303  GENE:JGI_V11_2196580303  GENE:JGI_V11_1159220202
GENE:JGI_V11_3164760303  GENE:JGI_V11_2228010203  GENE:JGI_V11_2341670203  GENE:JGI_V11_1938910303  GENE:JGI_V11_3026230203  GENE:JGI_V11_2449230203  GENE:JGI_V11_3134270303  GENE:JGI_V11_2235750203  GENE:JGI_V11_1981410203  GENE:JGI_V11_3251310202  GENE:JGI_V11_977750103  GENE:JGI_V11_954070203
GENE:JGI_V11_2267320203  GENE:JGI_V11_2268000303  GENE:JGI_V11_2226270101  GENE:JGI_V11_3003640303  GENE:JGI_V11_223520203  GENE:JGI_V11_2662750103  GENE:JGI_V11_2228010103  GENE:JGI_V11_3251310202  GENE:JGI_V11_3198630203  GENE:JGI_V11_3134270303  GENE:JGI_V11_1926920203  GENE:JGI_V11_287750103
GENE:JGI_V11_465160203  GENE:JGI_V11_2268000203  GENE:JGI_V11_2473230303  GENE:JGI_V11_3192220102  GENE:JGI_V11_3026230303  GENE:JGI_V11_3039310303  GENE:JGI_V11_1926920103  GENE:JGI_V11_1159220102  GENE:JGI_V11_3052790202  GENE:JGI_V11_3075830303  GENE:JGI_V11_2196580203  GENE:JGI_V11_3134280203
GENE:JGI_V11_3142970303  GENE:JGI_V11_503720303  GENE:JGI_V11_2236410103  GENE:JGI_V11_3042230103  GENE:JGI_V11_2228010203  GENE:JGI_V11_3028210101  GENE:JGI_V11_2105710303  GENE:JGI_V11_1926920303  GENE:JGI_V11_2131620103  GENE:JGI_V11_1002840203  GENE:JGI_V11_2088480203  GENE:JGI_V11_3196120102

Here are the first 8 rows of the 12 columns. There are 21473 rows in total.

Thanks

You could use an array formula like this to count how many columns a particular gene probe occurs in:

=SUM(--(MMULT(TRANSPOSE(ROW(A$2:L$10000)^0),N(A$2:L$10000=A2))>0))

This is a standard way of getting column totals for a 2D array - in this case an array of true/false values corresponding to instances of an array element being equal/unequal to A2.

This is a rather brute-force approach: it needs ~120K multiplications for each row (12 columns × ~10K rows). If you copy the formula down for ~10K rows, there is a delay of ~100 seconds on my computer while Excel works out the results.

The formula must be entered as an array formula using Ctrl+Shift+Enter.
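To make the column-total step of the formula easier to follow, here is a minimal Python sketch of the same logic outside Excel. The function name `columns_containing` and the small `grid` are made up for illustration, and the nested lists only stand in for the worksheet range; this is not a description of how Excel evaluates the formula internally.

```python
def columns_containing(grid, target):
    """Count how many columns of `grid` contain `target` at least once.

    `grid` is a list of rows (each row a list of cell values), standing in
    for the range A2:L10000 in the formula. All rows are assumed to have
    the same number of cells.
    """
    if not grid:
        return 0
    n_cols = len(grid[0])
    # Counterpart of N(range = A2): a 0/1 matrix marking matching cells.
    matches = [[1 if cell == target else 0 for cell in row] for row in grid]
    # Counterpart of MMULT(row of ones, matches): one total per column.
    col_totals = [sum(row[c] for row in matches) for c in range(n_cols)]
    # Counterpart of SUM(--(totals > 0)): columns with at least one match.
    return sum(1 for total in col_totals if total > 0)


# Hypothetical 3-column example (the real sheet has 12 columns).
grid = [
    ["A", "C", "B"],
    ["C", "D", "C"],
    ["B", "C", "E"],
]
print(columns_containing(grid, "C"))  # 3
print(columns_containing(grid, "B"))  # 2
print(columns_containing(grid, "D"))  # 1
```

A value that appears in every column yields a count equal to the number of columns, which is 12 for the sheet in the question.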

[Screenshot of the dummy data]

In this dummy data, C is the only value that occurs in all 12 columns.
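If the recalculation delay mentioned above is a problem and the data can be exported from Excel, one alternative is to intersect the columns as sets, which reads each cell roughly once instead of rescanning the whole range for every row. This is a different technique from the array formula, sketched here under assumptions: `common_to_all_columns` is a hypothetical helper, the probe names are invented, and blank cells are assumed to arrive as empty strings or `None`.

```python
def common_to_all_columns(columns):
    """Return the values present in every column, sorted for stable output.

    `columns` is a list of columns, each a list of cell values. Empty
    strings and None are treated as blank cells and ignored.
    """
    if not columns:
        return []
    # One set of non-blank values per column.
    sets = [{v for v in col if v not in ("", None)} for col in columns]
    # Keep only the values that occur in every one of those sets.
    common = set.intersection(*sets)
    return sorted(common)


# Hypothetical 3-column example with columns of unequal length.
columns = [
    ["probe1", "probe2", "probe3"],
    ["probe3", "probe1", ""],
    ["probe4", "probe3", "probe1"],
]
print(common_to_all_columns(columns))  # ['probe1', 'probe3']
```

Duplicates within a column do not affect the result, because each column is reduced to a set before the intersection is taken.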
