
Sequential Pattern - Data Mining

I am new to data mining, so I apologize if the answer to this question is obvious. I know there are quite a few data mining algorithms out there, such as sequential pattern mining or the Apriori algorithm. I have a database of approximately 20,000 students. Would the following code I have implemented be considered data mining, specifically sequential pattern mining, or do I have to use one of the existing data mining algorithms?

String x = "SELECT STUDENTS.ROW, STUDENTS.MAJOR, STUDENTS.NAME " +
    "CASE WHEN prior_row.NAME IS NOT NULL " +
    "AND EXISTS(SELECT 'x' FROM STUDENTS prior_row " +
    "WHERE STUDENTS.MAJOR = prior_row.MAJOR " +
    "AND STUDENTS.ROW > prior_row.ROW + 1 " +
    "SELECT STUDENTS.MAJOR, STUDENTS.ROW, STUDENTS.NAME WHERE " +
    "MAJOR < (SELECT MAJOR FROM STUDENTS WHERE MAJOR = 'MATH' " +
    "AND WHERE MAJOR > (SELECT MAJOR FROM STUDENTS WHERE MAJOR = 'SCIENCE' THEN 1 ELSE NULL END Flagged_Values";

st.executeQuery(x);

String y = "SELECT STUDENTS.ROW, STUDENTS.MAJOR, STUDENTS.NAME " +
    "CASE WHEN previous.NAME IS NOT NULL " +
    "AND EXISTS(SELECT 'y' FROM STUDENTS previous " +
    "WHERE STUDENTS.MAJOR = previous.MAJOR " +
    "AND STUDENTS.ROW > previous.ROW + 1 " +
    "SELECT STUDENTS.MAJOR, STUDENTS.ROW, STUDENTS.NAME WHERE " +
    "MAJOR < (SELECT THE_OUTCOME FROM STUDENTINFO WHERE MAJOR = 'Math' " +
    "AND WHERE MAJOR > (SELECT MAJOR FROM STUDENTS WHERE MAJOR = 'SCIENCE' " +
    "AND WHERE MAJOR > (SELECT MAJOR FROM STUDENTS WHERE MAJOR = 'Engineering' " +
    "THEN 1 ELSE NULL END Flag ";

st.executeQuery(y);

What you are writing are SQL SELECT statements: projection, selection, and aggregation.

Have you read the Wikipedia article on data mining?

The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). This usually involves using database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither the data collection, data preparation, nor result interpretation and reporting are part of the data mining step, but do belong to the overall KDD process as additional steps.

The term "data mining" is often misused for any kind of data collection or selection, but these tasks should be called "data collection" and "database query" rather than reaching for random buzzwords. Data mining is the intersection of statistics, AI, machine learning, and databases. If these components are missing (and apart from databases, I don't see them in your query), it should be called, e.g., "databases", "machine learning", or "statistics".
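To make the contrast with a hand-written query concrete, here is a minimal sketch of what discovering sequential patterns could look like. It is not GSP, PrefixSpan, or any other established algorithm, just brute-force counting of ordered pairs, and the class name, method name, and sample data (an ordered list of majors per student) are all assumptions made up for illustration. The point is that nobody tells the code which pattern to look for; it only gets a minimum support threshold.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: finds frequent length-2 sequential patterns ("A is later followed by B").
class SequentialPatternSketch {

    // Returns every ordered pair "A -> B" that occurs in at least minSupport sequences,
    // mapped to the number of sequences that contain it.
    static Map<String, Integer> frequentPairs(List<List<String>> sequences, int minSupport) {
        Map<String, Integer> support = new TreeMap<>();
        for (List<String> sequence : sequences) {
            // Collect each pair at most once per sequence, so support counts sequences, not occurrences.
            Set<String> seenInThisSequence = new HashSet<>();
            for (int i = 0; i < sequence.size(); i++) {
                for (int j = i + 1; j < sequence.size(); j++) {
                    seenInThisSequence.add(sequence.get(i) + " -> " + sequence.get(j));
                }
            }
            for (String pair : seenInThisSequence) {
                support.merge(pair, 1, Integer::sum);
            }
        }
        // Keep only the pairs that meet the support threshold.
        support.values().removeIf(count -> count < minSupport);
        return support;
    }

    public static void main(String[] args) {
        // Made-up data: the ordered list of majors each student has declared.
        List<List<String>> majorHistories = List.of(
            List.of("MATH", "SCIENCE", "ENGINEERING"),
            List.of("MATH", "ENGINEERING"),
            List.of("SCIENCE", "MATH", "ENGINEERING"),
            List.of("SCIENCE", "ENGINEERING"));
        // Prints {MATH -> ENGINEERING=3, SCIENCE -> ENGINEERING=3}
        System.out.println(frequentPairs(majorHistories, 3));
    }
}
```

A real sequential pattern miner would also handle longer patterns and prune candidates instead of enumerating every pair, but even this toy version extracts patterns that were not written into the query beforehand.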

In general (and keep in mind that this is inherently opinion-based), data mining refers to the process of taking data that is in a relatively unusable format and converting it into a more usable one.

For instance, if I have a huge .txt dump of unstructured text, extract the relevant portions (according to some formal definition of relevant), and place them into a .bson store or something similar, that would be data mining, regardless of exactly how I do the extraction.
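As a rough illustration of that kind of extraction, the sketch below pulls structured records out of free text with a regular expression. Everything specific in it is an assumption for the sake of the example: the definition of "relevant" (phrases of the form "<Name> majors in <Major>"), the class and method names, the sample text, and the use of plain in-memory maps where a real pipeline would write to a document store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turns unstructured text into a list of structured records.
class TextExtractionSketch {

    // Assumed definition of "relevant": a phrase of the form "<Name> majors in <Major>".
    private static final Pattern RELEVANT =
        Pattern.compile("([A-Z][a-z]+) majors in ([A-Z][a-z]+)");

    // Scans the raw text and returns one record (name, major) per matching phrase.
    static List<Map<String, String>> extract(String rawText) {
        List<Map<String, String>> records = new ArrayList<>();
        Matcher matcher = RELEVANT.matcher(rawText);
        while (matcher.find()) {
            Map<String, String> record = new LinkedHashMap<>();
            record.put("name", matcher.group(1));
            record.put("major", matcher.group(2));
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up stand-in for a large unstructured .txt dump.
        String dump = "Notes from orientation: Alice majors in Math, lunch was late, "
            + "and Bob majors in Science according to the registrar.";
        // Prints [{name=Alice, major=Math}, {name=Bob, major=Science}]
        System.out.println(extract(dump));
    }
}
```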

However, since your data is already in a SQL database, I wouldn't consider this data mining. I would consider it SQL development, though again, this is largely opinion-based. A SQL database is already a highly useful way of storing data, so accessing that data doesn't introduce a level of functionality that wasn't already present.

tl;dr: I wouldn't say this counts as data mining, but it's a gray area.

In the field of data mining, running SQL queries is not considered data mining.
