
Is my path of learning data mining correct?

Someone has just told my boss what data mining can do for a company, such as recommendation and predictive modelling. Basically we are a website company. I am going on leave for 6 months, so my boss said I could learn some DM techniques; when I come back we can visit small shops or small companies and provide them with predictive data using data mining algorithms.

The shops will only have SQL files or CSV files of their customers, perhaps a little more.

Right now I only know MySQL and have no idea what data mining is or whether it works the way I describe above. I mean, if someone has a database of customers and their shopping, is it possible for me to apply a data mining technique to it? I mean:

(raw mysql or sql data) or (csv files) ----data mining--> (some useful result)
  • 1) Is the above system correct, or am I wrong?
  • 2) Would the shops or businesses want that, or am I missing something?
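
As a rough illustration of the pipeline sketched above, the following minimal Python example reads a small CSV of purchases and derives "customers who bought X also bought Y" counts. The column names (`customer_id`, `item`), the sample rows and the helper names `load_baskets` and `also_bought` are all made up for illustration; real shop data will look different and would need cleaning first.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical sample data standing in for a shop's CSV export.
SAMPLE_CSV = """customer_id,item
1,bread
1,butter
2,bread
2,butter
2,jam
3,bread
3,jam
4,butter
"""

def load_baskets(csv_text):
    """Group purchased items by customer: {customer_id: {item, ...}}."""
    baskets = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        baskets[row["customer_id"]].add(row["item"])
    return baskets

def also_bought(baskets):
    """For each item, count how often every other item shares a basket with it."""
    counts = defaultdict(Counter)
    for items in baskets.values():
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    return counts

baskets = load_baskets(SAMPLE_CSV)
co_counts = also_bought(baskets)

# Items that were bought together with butter, most frequent first.
print(co_counts["butter"].most_common())
```

Simple co-occurrence counting like this is only one of many possible "useful results"; it is shown here because it needs nothing beyond the CSV itself.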

My plan for learning this is in the following order. I am thinking of first getting some SQL Server 2008 certs, because in my area most companies use Microsoft, so maybe I need to know SQL:

1)MCTS: SQL Server 2008, Implementation and Maintenance
2)MCTS: SQL Server 2008, Database Development
3)MCTS: SQL Server 2008, Business Intelligence Development and Maintenance

(Or should I go for Oracle and Oracle data warehousing? I want to learn some database properly first.)

4)Data Mining with Microsoft SQL Server 2008 (2009)     
5)Python for dummies    
6)Programming Collective Intelligence: Building Smart Web 2.0 Applications

Is my flow correct, or can I achieve my result in a better way? The reason I am doing the certs is to get some understanding of SQL, and in case I don't get that job after 6 months, I can get into a new job related to data mining or BI, or at least SQL Server.

Please help me.

OK, this is not a simple yes/no answer. You are doing some things right. This way you will get to know the SQL Server Data Mining tool set, and you will understand which algorithm to use where (for example, how Naive Bayes differs from a decision tree).

Once you know this stuff, the second thing is getting to know your data and how to build the flat tables that will serve as input. This is the most important part, because this is the data you will use to train your models. You don't need to know the internal mathematics behind the ANN (artificial neural network) algorithm and so on; you just need to know how to use it. There are data mining add-ins for Excel (2007 onwards) which you can use to play around.
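
To make the idea of a flat input table concrete, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL or SQL Server. The tables `customers` and `orders`, their columns and the sample rows are hypothetical. The point is only that many order rows are collapsed into one row per customer, which is the shape a mining tool would typically be trained on.

```python
import sqlite3

# In-memory database with made-up sample data (assumed schema, not a real shop's).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL, category TEXT);
INSERT INTO customers VALUES (1, 'Leeds'), (2, 'York'), (3, 'Hull');
INSERT INTO orders VALUES
    (1, 20.0, 'food'), (1, 35.0, 'toys'),
    (2, 10.0, 'food'), (2, 15.0, 'food'), (2, 5.0, 'toys');
""")

# One output row per customer: the "flat" table.
# LEFT JOIN keeps customers who have never ordered (their aggregates become 0).
FLATTEN_SQL = """
SELECT c.id,
       c.city,
       COUNT(o.customer_id)                                   AS order_count,
       COALESCE(SUM(o.amount), 0)                             AS total_spent,
       SUM(CASE WHEN o.category = 'food' THEN 1 ELSE 0 END)   AS food_orders
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.city
ORDER BY c.id
"""

flat_rows = conn.execute(FLATTEN_SQL).fetchall()
for row in flat_rows:
    print(row)
```

Which aggregate columns are worth building depends entirely on the question being asked of the data; the three used here are just placeholders.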

There are some data mining videos on http://channel9.msdn.com by Rafal Lukawiecki. They are good for getting some idea of how to begin.

After this it is a matter of practice: the more you play with new data, build new models and analyze the results, the better you will become.

Let me know if you need more info on PPTs, samples, etc.

Uh, to do data mining effectively, you need to know a lot of math. Your path is like "I want to be a surgeon, so I'll learn how to cut with a scalpel". Yes, knowing some SQL is probably necessary (it just depends on how your data is organized), but it is FAR from sufficient.

Seems like you are doing it all wrong.

The most important thing is to learn data mining, AI and predictive science topics, all that hardcore math and CS stuff, not database technology, which is important but not very closely related to the data mining field.

There is one book I would recommend, and I think it is tailored to your needs: Programming Collective Intelligence.


From what you have written, what you describe is close to data mining, not data scraping.

First of all, the answer by Ngu Soon Hui is diverting you in a completely wrong direction.
What he advised is called data scraping, not data mining.
You had better understand the difference between data mining and data scraping (aka website/web scraping, screen scraping, or data harvesting):

"(raw mysql or sql data) or (csv files) ----data mining--> (some useful result)"

Just forget about MySQL completely and do not lose your time on it, because there is absolutely no support for data mining in MySQL, only for data scraping. You might be interested in the latter, though, so you had better know the difference.

"1)MCTS: SQL Server 2008, Implementation and Maintenance 2)MCTS: SQL Server 2008, Database Development 3)MCTS: SQL Server 2008, Business Intelligence Development and Maintenance"

Why do you need 1) and 2)? Even 3) is only about 20% data mining.

"5)Python for dummies 6)Programming Collective Intelligence: Building Smart Web 2.0 Applications"

Why do you need Python?

6) is not data mining. It is called data scraping, and it is again a path in a completely wrong direction from DM.
