
How to Predict Employee Task End_Date through Machine Learning

How can I predict the following, and which algorithm is best suited?

An employee has Work Activity Start_Date and End_Date columns. The sheet has a few other columns, such as Work_Complexity (High or Low) and the number of sub-tasks for each activity.

How can I predict the Work Activity End_Date for a given Start_Date? Which ML algorithm should be used?

Can this be considered a realistic use case?

Thanks!

Yes, this is a realistic use case.

If you have labelled data, meaning a sheet where the start and end dates are already known for existing tasks, and you want to predict the end date for a new task, you can use linear regression with multiple variables (multiple linear regression). For more information on linear regression with multiple variables, see: https://www.investopedia.com/terms/m/mlr.asp

Don't get too caught up in the theory, though. In simple terms, linear regression models the relationship between variables (columns). Linear regression with one variable means you predict the end date using only one variable (column), i.e. the start date in your case. If you want to predict the end date using more than one variable (column), i.e. start date, task complexity, number of sub-tasks, etc., you need linear regression with multiple variables. To illustrate, I will use a house price prediction model.
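Written as formulas, the two cases differ only in how many input columns feed the prediction. The notation below is generic and my own (not taken from the linked article): y-hat is the predicted value (house price, or task duration), x_1 to x_p are the input columns, and b_0 to b_p are coefficients learned by ordinary least squares, which is the method scikit-learn's LinearRegression fits.

```latex
% One variable (e.g. x = Area)
\hat{y} = b_0 + b_1 x

% Multiple variables (e.g. x_1 = Area, x_2 = Bedrooms, x_3 = Age)
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p

% The coefficients minimise the squared error over the n rows with known y
\min_{b_0, \dots, b_p} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

For the task sheet, x_1, x_2, ... would be columns such as Work_Complexity (converted to a number, for example High = 1 and Low = 0) and the number of sub-tasks.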

Below is an implementation of linear regression with one variable in Python, where we predict the house price using only one variable:

import pandas as pd  # for loading the dataset from a CSV file
import numpy as np   # for arrays
from sklearn import linear_model  # machine learning library; provides LinearRegression

df = pd.read_csv('/content/MLPractical2 - Sheet1.csv')  # path of the file you uploaded
df  # in a notebook, this line displays the table

Output: the file I uploaded contains the following data:

Area | Price
2600 | 555000
3000 | 565000
3200 | 610000
3600 | 680000
4000 | 725000

Let's predict the price of a house with an area of 3601:

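reg = linear_model.LinearRegression()
reg.fit(df[['Area']], df.Price)  # fit Price as a linear function of Area
reg.predict(pd.DataFrame({'Area': [3601]}))  # same column name as in fit(), avoiding sklearn's feature-name warning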

Output: array([669653.42465753])

Here we predict the price from only one variable (column), i.e. Area.

As you can see in the uploaded file, a house with an area of 3600 costs 680000, and the model predicts 669653.42465753 for an area of 3601, which is close. The prediction is not exactly 680000 because the fitted straight line does not pass through every data point; it minimises the total squared error across all five rows.
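To check this result without uploading a CSV, the sketch below types the five rows from the table directly into a DataFrame (an assumption for convenience; the original reads them from the uploaded file). It also computes the one-variable least-squares slope and intercept by hand with NumPy, to show that the prediction is simply intercept + slope × area.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# The five rows from the table above, typed in instead of read from CSV
df = pd.DataFrame({
    "Area": [2600, 3000, 3200, 3600, 4000],
    "Price": [555000, 565000, 610000, 680000, 725000],
})

reg = LinearRegression()
reg.fit(df[["Area"]], df["Price"])

# Predict with a DataFrame so the feature name matches the one used in fit()
pred = reg.predict(pd.DataFrame({"Area": [3601]}))
print(pred)                          # → [669653.42465753]
print(reg.coef_[0], reg.intercept_)  # slope and intercept of the fitted line

# Closed-form least squares for one variable: slope = Sxy / Sxx
x = df["Area"].to_numpy(dtype=float)
y = df["Price"].to_numpy(dtype=float)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print(intercept + slope * 3601)      # matches reg.predict above
```

The slope works out to roughly 132.88 per unit of area, which is why one extra unit of area barely moves the prediction.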

Now let's look at an implementation of linear regression with multiple variables in Python, where we use several variables to predict the house price:

import pandas as pd                  #same as above
import numpy as np
from sklearn import linear_model
df = pd.read_csv('/content/ML_Sheet_2.csv')
df

Output: the file I uploaded in this case contains the following data:

Area | Bedroooms | Age | Price
2600 | 3.0 | 20 | 550000
3000 | 4.0 | 15 | 565000
3200 | 3.0 | 18 | 610000
3600 | 3.0 | 30 | 595000
4000 | 5.0 | 8 | 760000

Let's predict the price of a house with an area of 3500, 3 bedrooms, and an age of 10 years:

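reg = linear_model.LinearRegression()
reg.fit(df[['Area', 'Bedroooms', 'Age']], df.Price)  # fit Price on all three columns
reg.predict(pd.DataFrame([[3500, 3, 10]], columns=['Area', 'Bedroooms', 'Age']))  # column names must match fit()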

Output: array([717775])

Here we predict the house price from three variables: area, number of bedrooms, and age of the house.

As you can see in the uploaded file, a house with an area of 3200, 3 bedrooms, and an age of 18 years costs 610000. For an area of 3500 (more than 3200), 3 bedrooms, and an age of 10 years, the model predicts 717775. That higher price makes sense: the house we are predicting for is larger than 3200 and younger than 18 years, and newer houses cost more.
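The same check for the three-variable model, again building the DataFrame from the table instead of reading ML_Sheet_2.csv (an assumption for convenience). Printing the coefficients shows what the model learned for each column. With only five rows they can be counter-intuitive: the bedrooms coefficient comes out negative here, so this demonstrates the mechanics rather than a realistic price model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The five rows from the multi-variable table above; "Bedroooms" keeps the
# column name used in the original CSV and code
df = pd.DataFrame({
    "Area": [2600, 3000, 3200, 3600, 4000],
    "Bedroooms": [3.0, 4.0, 3.0, 3.0, 5.0],
    "Age": [20, 15, 18, 30, 8],
    "Price": [550000, 565000, 610000, 595000, 760000],
})
features = ["Area", "Bedroooms", "Age"]

reg = LinearRegression()
reg.fit(df[features], df["Price"])

# One coefficient per column: change in predicted price per unit change
# in that column, holding the other two fixed
for name, coef in zip(features, reg.coef_):
    print(f"{name}: {coef:.2f}")
print(f"intercept: {reg.intercept_:.2f}")

new_house = pd.DataFrame([[3500, 3, 10]], columns=features)
pred = reg.predict(new_house)
print(f"predicted price: {pred[0]:.2f}")

# Expected output:
# Area: 137.25
# Bedroooms: -26025.00
# Age: -6825.00
# intercept: 383725.00
# predicted price: 717775.00
```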

Similarly, you can prepare an Excel sheet of your existing data, save it in .csv format, and proceed as I did. I am using Google Colab to write my code, and I recommend you use it too:

https://colab.research.google.com/notebooks/intro.ipynb#recent=true
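Applied to the original question, one option (my assumption, not the only approach) is to predict each task's duration in days rather than the End_Date itself, then add the predicted duration to the Start_Date. The sketch uses made-up rows and assumes the column names Start_Date, End_Date, Work_Complexity, and Sub_Tasks; Sub_Tasks is a hypothetical name for the number of sub-tasks. Work_Complexity is encoded as High = 1, Low = 0 because linear regression needs numeric inputs.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical sample rows standing in for your sheet; with a real file you
# could use pd.read_csv(path, parse_dates=["Start_Date", "End_Date"])
tasks = pd.DataFrame({
    "Start_Date": pd.to_datetime(["2021-01-04", "2021-01-11", "2021-02-01",
                                  "2021-02-15", "2021-03-01", "2021-03-08"]),
    "End_Date": pd.to_datetime(["2021-01-09", "2021-01-23", "2021-02-12",
                                "2021-02-24", "2021-03-09", "2021-03-23"]),
    "Work_Complexity": ["Low", "High", "Low", "High", "Low", "High"],
    "Sub_Tasks": [1, 2, 3, 1, 2, 3],  # hypothetical column name
})

# Target: task duration in whole days (End_Date - Start_Date)
tasks["Duration_Days"] = (tasks["End_Date"] - tasks["Start_Date"]).dt.days
# Encode the categorical column as a number: High -> 1, Low -> 0
tasks["Is_High"] = (tasks["Work_Complexity"] == "High").astype(int)

features = ["Is_High", "Sub_Tasks"]
model = LinearRegression()
model.fit(tasks[features], tasks["Duration_Days"])


def predict_end_date(start_date, complexity, sub_tasks):
    """Hypothetical helper: Start_Date plus predicted duration, rounded to whole days."""
    row = pd.DataFrame({"Is_High": [int(complexity == "High")],
                        "Sub_Tasks": [sub_tasks]})
    days = model.predict(row)[0]
    return pd.Timestamp(start_date) + pd.Timedelta(days=round(days))


print(predict_end_date("2021-04-05", "High", 2))  # → 2021-04-17 00:00:00
```

Rounding to whole days is a design choice. If your sheet also records times, you could skip the rounding and add the fractional prediction with pd.Timedelta(days=days).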

Hope this helps!

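Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.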

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM