
Correlations/Data Mining in Microsoft Excel 2003

I have an Excel spreadsheet where each column is a certain variable. At the end of my columns I have a special last column called "Type" which can be A, B, C, or D.

Each row is a data point with different variables that ends up in a certain "Type" bucket (A/B/C/D) recorded in the last column.

I need a way to examine all entries of a certain type (say, "C" or "C"|"D") and find out which of the variable(s) is a good predictor of this last column, and which are better predictors than others.

Some variables are numbers, others are fixed strings (from a set of strings), so it's not just a number/number correlation.

Is Excel 2003 a good tool for that, or are there better statistical programs that make this easier? Do I create a Pivot/Histogram for each category, or is there a better way to run these queries? Thanks

You can do some of the filtering in Microsoft Excel, especially cleaning the data (that is, converting the values in each column to a single type, either string or numeric). Excel also offers some basic data mining. However, for the kind of problem you have, a tool I recommend is WEKA. With it you can run associative classification (i.e., class association rule mining) over all data instances (rows) and thereby determine which instances fall into A/B/C/D. Your special "Type" column will be the class attribute.
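If you want a quick feel for "which variable predicts Type best" before setting up a dedicated tool, one common score is information gain: how much knowing a column reduces the uncertainty about the class. The sketch below is only an illustration in plain Python, not WEKA's implementation. The column names (`size`, `color`), the sample rows, the helper names, and the equal-width binning of numeric columns into 3 bins are all assumptions made for the example. The optional `positive` argument collapses the class to "in this set or not", which matches the "C" or "C"|"D" style of question.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def bin_numeric(values, bins=3):
    """Equal-width binning so a numeric column can be treated as categories."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def information_gain(column, labels):
    """Entropy of the labels minus the weighted entropy within each column value."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_predictors(rows, target="Type", positive=None, bins=3):
    """Return (column, gain) pairs, best predictor first.

    If `positive` is given (e.g. {"C", "D"}), the class is reduced to
    "row's Type is in that set" versus "it is not".
    """
    labels = [r[target] for r in rows]
    if positive is not None:
        labels = [label in positive for label in labels]
    scores = {}
    for name in rows[0]:
        if name == target:
            continue
        column = [r[name] for r in rows]
        # Numeric columns are binned; string columns are used as-is.
        if all(isinstance(v, (int, float)) for v in column):
            column = bin_numeric(column, bins)
        scores[name] = information_gain(column, labels)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up rows: "color" determines Type exactly, "size" carries no information.
rows = [
    {"size": 1.0, "color": "red",   "Type": "A"},
    {"size": 9.0, "color": "red",   "Type": "A"},
    {"size": 1.0, "color": "blue",  "Type": "B"},
    {"size": 9.0, "color": "blue",  "Type": "B"},
    {"size": 1.0, "color": "green", "Type": "C"},
    {"size": 9.0, "color": "green", "Type": "C"},
    {"size": 1.0, "color": "black", "Type": "D"},
    {"size": 9.0, "color": "black", "Type": "D"},
]

ranked = rank_predictors(rows)
ranked_c_or_d = rank_predictors(rows, positive={"C", "D"})
for name, gain in ranked:
    print(name, round(gain, 3))
```

To try this on your own sheet, you could save it from Excel as CSV and load the rows with `csv.DictReader`, converting the numeric columns to `float` first. A higher gain means the column separates the types better; the bin count is a judgment call and changes the scores for numeric columns.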


 