
Text Mining - What is the best way to mine descriptive Excel sheet data?

I have university placement data pulled from databases into an Excel sheet. I need to text-mine the job descriptions offered by companies, which is a free-text field present in every row, and then come up with an analysis of the profiles in demand. Here is a snapshot of the data: [image]

Could anyone help me kick-start this activity?

Thanks, Saurabh

I am not a data expert, but I have some data mining experience. I would try the following steps for starters:

  1. Excel is not a good fit for this kind of analysis. Find a tool dedicated to data mining, e.g. RStudio (an IDE for R). R has many useful out-of-the-box algorithms for data mining.

  2. Cleanse the data, e.g. convert all text to lower case, remove stop words, remove punctuation, and remove extra white space.

  3. Tokenize the data, e.g. into one-word tokens such as "finance" and "bachelor".

  4. Decide how you will determine whether a certain profile is in demand. If by profile you mean how often certain tokens (e.g. "finance", "bachelor") appear in the data compared with others, then simply create a frequency matrix. R also lets you visualise this as a word cloud.
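To make steps 2 to 4 concrete, here is a minimal sketch in plain Python (standard library only) rather than R. It is only meant to show the shape of the pipeline. The stop-word list, the function names and the sample descriptions are all made up for illustration; in practice the descriptions would come from the job description column of your sheet, and you would want a much fuller stop-word list.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "for", "with", "to", "or", "is", "are"}


def clean(text):
    """Lower-case the text, replace punctuation with spaces, collapse white space."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into one-word tokens, dropping stop words."""
    return [word for word in clean(text).split() if word not in STOP_WORDS]


def token_frequencies(descriptions):
    """Count how often each token occurs across all descriptions."""
    counts = Counter()
    for description in descriptions:
        counts.update(tokenize(description))
    return counts


# Made-up job descriptions standing in for the spreadsheet column.
descriptions = [
    "Bachelor in Finance required; strong Excel skills.",
    "Finance analyst with a Bachelor degree.",
    "Software engineer, Bachelor of Engineering.",
]

# Print the most frequent tokens with their counts.
for token, count in token_frequencies(descriptions).most_common(3):
    print(token, count)
```

The resulting counts are the simplest form of the frequency matrix from step 4: one row per token, one total per row. Keeping a separate count per description instead of a single total would give you a term-by-document matrix.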

This is to start you off :). I am sure there is much more to be suggested in this matter.
