
How to predict an employee task End_Date with machine learning

How can I predict the following, and which algorithm is best suited?

Each employee work activity has a Start_Date and an End_Date (columns). The sheet has a few other columns, such as Work_Complexity (High or Low) and the number of sub-tasks for each activity.

How can I predict the End_Date of a work activity for a given Start_Date? Which ML algorithm should be used?

Can this be considered a realistic use case?

thanks!!!

Yes, this is a realistic use case.

If you have labelled data, meaning a sheet where the start date and end date are known for existing tasks, and you want to predict the end date for a new task, you can use Linear Regression with multiple variables. For more information on Linear Regression with multiple variables, see this link: https://www.investopedia.com/terms/m/mlr.asp

Don't get too caught up in the theory. In simple terms, Linear Regression is an approach to modelling the relationship between variables (columns). Linear Regression with one variable means you are trying to predict the end date using only one variable (column), i.e. the start date in your case. If you want to predict the end date using more than one variable (column), i.e. start date, task complexity, number of sub-tasks and so on, you have to use Linear Regression with multiple variables. To illustrate both, I am using a house price prediction model as the example.

Below is an implementation of Linear Regression with one variable in Python, where we predict the house price using only one variable:

import pandas as pd                # loads the dataset into a DataFrame
import numpy as np                 # array support (not used directly below)
from sklearn import linear_model   # provides LinearRegression

df = pd.read_csv('/content/MLPractical2 - Sheet1.csv')  # replace with the path to your own file
df

Output: the file I uploaded contains the following data:

Area || Price

2600 || 555000

3000 || 565000

3200 || 610000

3600 || 680000

4000 || 725000

Let's predict the price of a house with an area of 3601:

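reg = linear_model.LinearRegression()
reg.fit(df[['Area']], df.Price)               # features must be 2-D, hence the double brackets
reg.predict(pd.DataFrame({'Area': [3601]}))   # same column name as in training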

Output: array([669653.42465753])

We are predicting the price on the basis of only one variable (column), i.e. Area.

As you can see in the file I uploaded, the price of the house with area 3600 is 680000, and the price our model predicts for area 3601 is 669653.42465753, which is reasonably close.
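To see where that number comes from, the fitted line can be inspected directly. The sketch below rebuilds the same five rows inline (an assumption, so that it runs without the CSV file) and recomputes the prediction by hand from the model's `coef_` and `intercept_` attributes:

```python
import pandas as pd
from sklearn import linear_model

# Same five rows as the table above, built inline instead of read from the CSV
df = pd.DataFrame({
    "Area": [2600, 3000, 3200, 3600, 4000],
    "Price": [555000, 565000, 610000, 680000, 725000],
})

reg = linear_model.LinearRegression()
reg.fit(df[["Area"]], df["Price"])

slope = reg.coef_[0]        # price change per extra unit of area
intercept = reg.intercept_  # price the line gives at area 0

# A one-variable linear model is simply: price = slope * area + intercept
manual_prediction = slope * 3601 + intercept
model_prediction = reg.predict(pd.DataFrame({"Area": [3601]}))[0]

print(slope, intercept)
print(manual_prediction, model_prediction)
```

Both printed predictions should agree, which shows that `predict` is doing nothing more than evaluating the straight line.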

Now let's look at an implementation of Linear Regression with multiple variables in Python, where we use several variables to predict the house price:

import pandas as pd                # same imports as above
import numpy as np
from sklearn import linear_model

df = pd.read_csv('/content/ML_Sheet_2.csv')  # replace with the path to your own file
df

Output: the file I uploaded in this case contains the following data:

Area || Bedroooms || Age || Price

2600 || 3.0 || 20 || 550000

3000 || 4.0 || 15 || 565000

3200 || 3.0 || 18 || 610000

3600 || 3.0 || 30 || 595000

4000 || 5.0 || 8 || 760000

Let's predict the price of a house with an area of 3500 and 3 bedrooms that is 10 years old:

reg = linear_model.LinearRegression()
reg.fit(df[['Area', 'Bedroooms', 'Age']], df.Price)   # column names as spelled in the file
reg.predict([[3500, 3, 10]])                          # values in the same order: Area, Bedroooms, Age

Output: array([717775])

We are predicting the house price on the basis of three variables, i.e. Area, number of bedrooms and age of the house.

As you can see in the file I uploaded, the price of the house with area 3200, 3 bedrooms and an age of 18 years is 610000. The price our model predicts for area 3500 (more than 3200), 3 bedrooms and an age of 10 years is 717775. That is plausible, because we are predicting for a house with a larger area and a lower age (a newer house costs more).
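One way to check how well such a model follows the data is to compare its predictions with the known prices of the rows it was trained on. The sketch below rebuilds the table inline (an assumption, so that it runs without the CSV file) and keeps the column name `Bedroooms` exactly as it is spelled in the file. With only five rows this says nothing about accuracy on new houses; it only shows how closely the fitted model matches the existing ones.

```python
import pandas as pd
from sklearn import linear_model

# Same five rows as the table above, built inline instead of read from the CSV
df = pd.DataFrame({
    "Area": [2600, 3000, 3200, 3600, 4000],
    "Bedroooms": [3.0, 4.0, 3.0, 3.0, 5.0],
    "Age": [20, 15, 18, 30, 8],
    "Price": [550000, 565000, 610000, 595000, 760000],
})
features = ["Area", "Bedroooms", "Age"]

reg = linear_model.LinearRegression()
reg.fit(df[features], df["Price"])

# Prediction for each training row and its difference from the real price
df["Predicted"] = reg.predict(df[features])
df["Error"] = df["Price"] - df["Predicted"]
print(df[["Price", "Predicted", "Error"]])

# R^2 on the training rows: 1.0 would mean a perfect fit
r_squared = reg.score(df[features], df["Price"])
print(r_squared)

# The same new house as in the answer: area 3500, 3 bedrooms, 10 years old
new_house = pd.DataFrame([[3500, 3, 10]], columns=features)
new_prediction = reg.predict(new_house)[0]
print(new_prediction)
```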

Similarly, you can prepare an Excel sheet of your existing data, save it in .csv format and proceed as I did. I am using Google Colab to write my code, and I recommend you use the same:

https://colab.research.google.com/notebooks/intro.ipynb#recent=true
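Applied to the original question, one practical detail is that a regression model needs numeric inputs and outputs, so it is usually easier to predict the task duration in days and add it to Start_Date than to predict End_Date directly. The sketch below is only an illustration: the rows are made up, and the column name `Sub_Tasks` and the 1/0 encoding of `Work_Complexity` are assumptions, since the question does not show the actual sheet.

```python
import pandas as pd
from sklearn import linear_model

# Made-up example rows; replace with pd.read_csv(...) on your own sheet
tasks = pd.DataFrame({
    "Start_Date": ["2020-01-06", "2020-01-13", "2020-02-03",
                   "2020-02-10", "2020-03-02", "2020-03-09"],
    "End_Date": ["2020-01-09", "2020-01-24", "2020-02-05",
                 "2020-02-19", "2020-03-06", "2020-03-23"],
    "Work_Complexity": ["Low", "High", "Low", "High", "Low", "High"],
    "Sub_Tasks": [2, 6, 1, 5, 3, 8],  # hypothetical column name
})
tasks["Start_Date"] = pd.to_datetime(tasks["Start_Date"])
tasks["End_Date"] = pd.to_datetime(tasks["End_Date"])

# Target: how many calendar days each task took
tasks["Duration_Days"] = (tasks["End_Date"] - tasks["Start_Date"]).dt.days

# Encode the text column as a number: High -> 1, Low -> 0
tasks["Is_High_Complexity"] = (tasks["Work_Complexity"] == "High").astype(int)

features = ["Is_High_Complexity", "Sub_Tasks"]
reg = linear_model.LinearRegression()
reg.fit(tasks[features], tasks["Duration_Days"])

# New task: high complexity, 4 sub-tasks, starting on 2020-04-06
new_start = pd.Timestamp("2020-04-06")
new_task = pd.DataFrame([[1, 4]], columns=features)
predicted_days = float(reg.predict(new_task)[0])

# Predicted End_Date = Start_Date + predicted duration (rounded to whole days)
predicted_end = new_start + pd.Timedelta(days=round(predicted_days))
print(predicted_days, predicted_end.date())
```

This counts calendar days and ignores weekends and holidays; if those matter for your data, the duration would need to be measured in working days instead.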

Hope this helps you!

