
Most appropriate analysis method - Clustering?

Original 2018-03-15 09:25:44 0 1 merge/ statistics/ cluster-analysis

I have two large data frames with similar variables, representing two separate surveys. Some rows (participants) in one data frame correspond to rows in the other, and I would like to link them.

Both data frames have an index, but it indicates the locality of the survey (i.e. region), not individual IDs. Merging on it is not possible because, in most cases, different participants share identical index values.

Since merging on the index alone is not possible, I wish to compare similar (binary) variables from both data frames, in addition to the index values common to both, to find the most likely match for each row. I can then (with some margin of error) match the rows whose values for these variables are most similar and merge them.

What do you think would be the appropriate method for doing this? Clustering?

Best, James

1 answer

That obviously is not clustering. You don't want large groups of records.

What you want to do is an approximate JOIN.
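One way to read "approximate JOIN" for this data is sketched below. It is only an illustration under stated assumptions, not a definitive implementation: rows are compared only within the same region (the shared index), similarity is the number of differing binary variables (Hamming distance), and pairs are assigned greedily, closest first. The column names, the `max_mismatches` threshold, and the helper `approximate_join` are all hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_join(left, right, key, features, max_mismatches=1):
    """Pair each left row with a similar, unused right row from the same block.

    Returns a sorted list of (left_index, right_index, distance) tuples.
    """
    # Blocking: only rows that share the same key (e.g. region) are compared.
    blocks = defaultdict(list)
    for j, row in enumerate(right):
        blocks[row[key]].append(j)

    # Score every candidate pair inside a block and keep the close ones.
    candidates = []
    for i, lrow in enumerate(left):
        lvec = [lrow[f] for f in features]
        for j in blocks.get(lrow[key], []):
            d = hamming(lvec, [right[j][f] for f in features])
            if d <= max_mismatches:
                candidates.append((d, i, j))

    # Greedy one-to-one assignment: the closest pairs are accepted first.
    candidates.sort()
    used_left, used_right, matches = set(), set(), []
    for d, i, j in candidates:
        if i not in used_left and j not in used_right:
            used_left.add(i)
            used_right.add(j)
            matches.append((i, j, d))
    return sorted(matches)


# Hypothetical toy data: "region" plays the role of the shared index.
survey_a = [
    {"region": "N", "smoker": 1, "employed": 0, "urban": 1},
    {"region": "N", "smoker": 0, "employed": 1, "urban": 1},
    {"region": "S", "smoker": 1, "employed": 1, "urban": 0},
]
survey_b = [
    {"region": "N", "smoker": 0, "employed": 1, "urban": 1},
    {"region": "N", "smoker": 1, "employed": 0, "urban": 0},
    {"region": "S", "smoker": 0, "employed": 0, "urban": 1},
]

matches = approximate_join(
    survey_a, survey_b, key="region",
    features=["smoker", "employed", "urban"], max_mismatches=1,
)
print(matches)  # [(0, 1, 1), (1, 0, 0)]
```

Here the third row of `survey_a` stays unmatched because its only candidate in region "S" differs in all three variables. With only a few binary variables, ties are likely, so the threshold and tie handling deserve care. The greedy step is a simplification; an optimal one-to-one assignment per region (for example with `scipy.optimize.linear_sum_assignment`) would be an alternative.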


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 0 2018-03-16 07:18:50
