
How should I conduct concordance analysis between categorical and continuous variables?

I am currently having difficulty choosing a statistical test to validate concordance between two measures that use two different measurement styles. Below is how my variables are structured. I will use a fake example of my data to help demonstrate my problem.

Measure one: 1 nominal variable with 8 categories (primary car selection), e.g., "What is your primary choice of car make?" Responses: 1 = Ford, 2 = Holden, 3 = Toyota, 4 = Mitsubishi, 5 = Mazda, 6 = Hyundai, 7 = Subaru, 8 = Volkswagen. The participant chose one category as their primary rating.

Measure two: 8 continuous variables, one for each of the 8 categories from Measure one, e.g., "Please rate the likelihood that you would purchase a ____: 1) Ford." The participant rated their endorsement of each item on a scale from 1 (Not at all) to 5 (Extremely likely), across all 8 variables.

My hypothesis predicts that these two measurement styles will agree with each other, i.e., if someone selects Ford as their primary choice of car, they will also rate purchasing a Ford as extremely likely, and higher than the other car makes.

What statistical tests should I consider for this concordance analysis? So far I have considered using a weighted Cohen's kappa, but I am not convinced that it fits my example.

Cheers,

Jacob.

P.S. Excuse my car selection; I am from Australia and selected the most common car makes in my area.

There is a lot you could do with such data in principle.

One thing is not clear to me from your description: do you have both measures from the same people? That is, do you know "person A would buy a Ford, and these are his ratings for all makes", or are the two data sets independent, so that you only know "x% of people would buy a Ford, and the overall ratings for the makes are ..."? The latter is much less interesting, and I believe only the former is worth discussing. For the latter case, Cohen's kappa is maybe the best you can do.

But if you have all the information for each person:

Even such relatively simple data have many aspects, and you cannot reduce them to one number without losing most of the information. I would start by making a table or 2D plot, with the rating 1...5 for one of the makes (e.g. Subaru) on the x-axis and the probability of choosing each of the 8 makes on the y-axis. I would find it interesting to see which cars are the top choices for people rating "make A" with just 1, and to compare this with people rating "make B" with just 1. And how strongly does this change if you do the same for ratings of 5?
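As a rough sketch of such a table, assuming the data sit in one row per respondent, something like the following pandas snippet could be a starting point. The column names ("choice", "rate_Subaru"), the helper function and the toy values are all made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical toy data, one row per respondent (NOT real study data).
# "choice" is the primary make; "rate_Subaru" is the 1-5 rating given to Subaru.
df = pd.DataFrame({
    "choice":      ["Ford", "Subaru", "Subaru", "Toyota", "Ford", "Subaru"],
    "rate_Subaru": [1, 5, 5, 1, 2, 4],
})

def choice_share_by_rating(data, rating_col, choice_col="choice"):
    """Rows: rating value; columns: chosen make; cells: share within that rating."""
    # normalize="index" makes every row sum to 1, i.e. each row is
    # the distribution of primary choices among people who gave that rating.
    return pd.crosstab(data[rating_col], data[choice_col], normalize="index")

table = choice_share_by_rating(df, "rate_Subaru")
print(table.round(2))
```

Each row of the table is the distribution of primary choices among respondents who gave Subaru that rating, so the rows for rating 1 and rating 5 can be compared directly. Calling `table.plot(kind="bar", stacked=True)` would give the corresponding plot, assuming matplotlib is installed.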

One particularly interesting outcome of this study would also be the probability that people rating "make A" with 5 actually choose "make A", and how this compares across all the makes. There may be differences between the makes, e.g. the buyers of some makes may be driven more by "reason" and others by "fashion". I believe "fashion" leads to much higher correlation (thus higher probability) compared to "reason"...
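This conditional probability can be estimated per make with a few lines of pandas. The sketch below again assumes one row per respondent and uses invented column names ("choice", "rate_<make>"), a shortened list of three makes and toy values; it illustrates the idea rather than a definitive analysis.

```python
import pandas as pd

# Hypothetical, shortened list of makes and toy ratings (NOT real study data).
makes = ["Ford", "Toyota", "Subaru"]
df = pd.DataFrame({
    "choice":      ["Ford", "Subaru", "Subaru", "Toyota", "Ford", "Toyota"],
    "rate_Ford":   [5, 2, 1, 5, 5, 3],
    "rate_Toyota": [3, 1, 2, 5, 2, 4],
    "rate_Subaru": [1, 5, 5, 2, 1, 5],
})

def p_choice_given_top_rating(data, makes, top=5):
    """For each make, estimate P(primary choice == make | rating of make == top)."""
    out = {}
    for make in makes:
        # Respondents who gave this make the top rating.
        top_raters = data[data[f"rate_{make}"] == top]
        if len(top_raters) == 0:
            out[make] = float("nan")  # nobody gave the top rating
        else:
            # Fraction of those respondents whose primary choice is this make.
            out[make] = (top_raters["choice"] == make).mean()
    return pd.Series(out, name="p_choose_given_top")

result = p_choice_given_top_rating(df, makes)
print(result.round(2))
```

With real data, each estimate should be read together with the number of respondents behind it, since a make that few people rated 5 will give a very noisy proportion.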

