Survival Analysis for Telecom Churn using R

I am working on a telecom churn problem, and here is my dataset:

http://www.sgi.com/tech/mlc/db/churn.data

Names: http://www.sgi.com/tech/mlc/db/churn.names

I'm new to survival analysis. Given the training data, my idea is to build a survival model that estimates the survival time and also predicts churn/non-churn on the test data from the independent variables. Could anyone help me with code, or give pointers on how to approach this problem?

To be precise, my training data contains each customer's call usage details, plan details, account tenure, etc., and whether or not the customer churned.

Using general classification models, I can predict churn or non-churn on the test data. Now, using survival analysis, I want to predict the survival tenure for the test data.

Thanks, Maddy

If you're still interested (or for the benefit of those coming later), I've written a few guides specifically on conducting survival analysis on customer churn data using R. They cover a range of analytical techniques, all with sample data and R code.

Basic survival analysis: http://daynebatten.com/2015/02/customer-churn-survival-analysis/

Basic Cox regression: http://daynebatten.com/2015/02/customer-churn-cox-regression/

Time-dependent covariates in Cox regression: http://daynebatten.com/2015/12/survival-analysis-customer-churn-time-varying-covariates/

Time-dependent coefficients in Cox regression: http://daynebatten.com/2016/01/customer-churn-time-dependent-coefficients/

Restricted mean survival time (quantify the impact of churn in dollar terms): http://daynebatten.com/2015/03/customer-churn-restricted-mean-survival-time/

Pseudo-observations (quantify the dollar gain/loss associated with the churn effects of variables): http://daynebatten.com/2015/03/customer-churn-pseudo-observations/

Please forgive the goofy images.
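The basic survival analysis and restricted-mean guides both build on the Kaplan-Meier estimator. A minimal sketch with the survival package, on simulated data (the two plans, their churn rates and the 365-day window are all hypothetical, not taken from the guides or the SGI dataset), fits one retention curve per plan, runs a log-rank test, and computes the restricted mean survival time, i.e. the area under each curve up to day 365. That quantity is in "days retained", which is what makes a dollar interpretation possible (days retained times revenue per day).

```r
library(survival)

set.seed(42)
# Hypothetical churn data: two plan types with different churn rates
n <- 400
plan <- rep(c("basic", "premium"), each = n / 2)
churn_time <- rexp(n, rate = ifelse(plan == "basic", 1 / 150, 1 / 300))
tenure  <- pmin(churn_time, 365)   # follow-up ends after 365 days
churned <- churn_time <= 365       # FALSE = still a customer, so right-censored
plans_df <- data.frame(plan, tenure, churned)

# Kaplan-Meier retention curve for each plan
km <- survfit(Surv(tenure, churned) ~ plan, data = plans_df)
plot(km, col = c("blue", "red"), xlab = "Days", ylab = "Proportion retained")
legend("bottomleft", legend = names(km$strata), col = c("blue", "red"), lty = 1)

# Log-rank test for a difference between the two curves
print(survdiff(Surv(tenure, churned) ~ plan, data = plans_df))

# Restricted mean survival time: expected days retained within the first 365
rmst <- summary(km, rmean = 365)$table[, "rmean"]
print(rmst)
```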

Here is some code to get you started:

First, read the data:

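# Column names come from the .names file (the part before each ":")
nm <- read.csv("http://www.sgi.com/tech/mlc/db/churn.names",
               skip=4, colClasses=c("character", "NULL"), header=FALSE, sep=":")[[1]]
dat <- read.csv("http://www.sgi.com/tech/mlc/db/churn.data", header=FALSE, col.names=c(nm, "Churn"))
# Churn is read as text ("True."/"False."); turn it into a logical event indicator
dat$Churn <- grepl("True", dat$Churn)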
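To see why later code can refer to columns such as `account.length`, here is the same parsing trick applied to three inline lines. They are typed in for illustration and assume `churn.names` uses `name: type.` entries. `sep = ":"` splits each line in two, the `"NULL"` column class drops the type, and `read.csv()` (with its default `check.names = TRUE`) passes the column names through `make.names()`, which replaces spaces with dots.

```r
# Three lines in the assumed "name: type." format of churn.names
names_txt <- c("state: discrete.",
               "account length: continuous.",
               "area code: continuous.")

# Keep the first field (the name); colClasses "NULL" skips the second field
nm_demo <- read.csv(text = names_txt, header = FALSE, sep = ":",
                    colClasses = c("character", "NULL"))[[1]]
print(nm_demo)

# The syntactic names read.csv() will actually use for the data columns
print(make.names(nm_demo))
```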

Use Surv() to set up a survival object for modeling:

library(survival)

# account.length is the tenure; Churn is the event (non-churners are right-censored)
s <- with(dat, Surv(account.length, as.numeric(Churn)))
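Here tenure plays the role of survival time and churn is the event. Customers who have not churned are right-censored: we only know that they stayed at least that long. A tiny example with hypothetical values (not from the dataset) shows how `Surv()` records this.

```r
library(survival)

# Hypothetical tenures (e.g. in days) and churn flags for four customers
tenure  <- c(120, 45, 200, 88)
churned <- c(TRUE, TRUE, FALSE, FALSE)  # FALSE = still active, so censored

s_demo <- Surv(tenure, churned)
print(s_demo)  # censored times are printed with a trailing "+"

# Internally a Surv object is a two-column matrix: time and status (1 = event)
m <- unclass(s_demo)
print(m[, "status"])
```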

Fit a Cox proportional hazards model and plot the result:

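# dat[, -4] drops column 4 (phone number), an identifier rather than a predictor
model <- coxph(s ~ total.day.charge + number.customer.service.calls, data=dat[, -4])
summary(model)
# survfit() on a coxph fit gives the curve for a customer with mean covariate values
plot(survfit(model))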

[Image: plot of survfit(model)]
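The question asks for predicted tenure on test data. One way to get that from a Cox model, sketched below on simulated data, is to pass the test rows as `newdata` to `survfit()` and read off each customer's predicted median survival time. The data-generating process, the variable names `day_charge`/`cs_calls`, the 70/30 split and the 240-day window are all hypothetical. The sketch assumes survival >= 3.0 for `concordance()`.

```r
library(survival)

set.seed(1)
# Simulated stand-in for the churn data (hypothetical, not the SGI dataset)
n <- 1000
sim <- data.frame(day_charge = runif(n, 10, 60), cs_calls = rpois(n, 1.5))
# Exponential churn times whose hazard rises with both covariates
rate <- 0.002 * exp(0.02 * sim$day_charge + 0.3 * sim$cs_calls)
churn_time <- rexp(n, rate)
sim$tenure  <- pmin(churn_time, 240)  # observation window ends at day 240
sim$churned <- churn_time <= 240      # FALSE = censored at day 240

# 70/30 train/test split
train_idx <- sample(n, 0.7 * n)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

cox_sim <- coxph(Surv(tenure, churned) ~ day_charge + cs_calls, data = train)

# One predicted survival curve per test customer
sf <- survfit(cox_sim, newdata = test)

# Predicted median tenure; NA when a curve never drops below 0.5
med <- quantile(sf, probs = 0.5, conf.int = FALSE)
print(head(med))

# Relative risk score per test customer (higher = expected to churn sooner)
test$risk <- predict(cox_sim, newdata = test, type = "risk")

# Concordance between risk score and observed test outcomes;
# reverse = TRUE because a higher risk should mean a shorter tenure
cidx <- concordance(Surv(tenure, churned) ~ risk, data = test, reverse = TRUE)
print(cidx$concordance)
```

The baseline hazard is estimated only up to the last observed event, so low-risk customers whose curve stays above 0.5 get NA rather than an extrapolated median. A restricted mean up to the end of follow-up is an alternative summary in that case.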

Add a stratum (a separate baseline hazard for customers with at most 3 vs. more than 3 customer service calls):

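model <- coxph(s ~ total.day.charge + strata(number.customer.service.calls <= 3), data=dat[, -4])
summary(model)
fit <- survfit(model)
plot(fit, col=c("blue", "red"))
# label the curves with the stratum names that survfit() assigns
legend("bottomleft", legend=names(fit$strata), col=c("blue", "red"), lty=1)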

[Image: stratified survival curves from survfit(model)]
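Stratifying, as above, is a common remedy when a covariate violates the proportional-hazards assumption. `cox.zph()` gives a quick check based on scaled Schoenfeld residuals. The sketch below uses simulated data with hypothetical variables and a hypothetical cut-off of 3 calls, and refits with a stratum the same way the model above does.

```r
library(survival)

set.seed(7)
# Hypothetical churn data (simulated under proportional hazards)
n <- 500
sim <- data.frame(day_charge = runif(n, 10, 60), cs_calls = rpois(n, 1.5))
rate <- 0.003 * exp(0.02 * sim$day_charge + 0.3 * sim$cs_calls)
churn_time <- rexp(n, rate)
sim$tenure  <- pmin(churn_time, 240)
sim$churned <- churn_time <= 240

cox_check <- coxph(Surv(tenure, churned) ~ day_charge + cs_calls, data = sim)

# Small p-values suggest a covariate's effect changes over time
zph <- cox.zph(cox_check)
print(zph)
plot(zph[2])  # residuals for cs_calls against time; a flat trend supports PH

# If a covariate fails the check, stratifying on it (or adding a tt() term)
# are common next steps
cox_strat <- coxph(Surv(tenure, churned) ~ day_charge + strata(cs_calls > 3),
                   data = sim)
print(cox_strat)
```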

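Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.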
