Replace categorical values in CSV file with binary values
I have a clinical data set and I have to replace
In addition I need to assign the ages
Here is my code.
import csv
import pandas as pd
with open('combined_file', 'rb') as f, open('newFile', 'wb') as out:
    reader = csv.reader(f)
    writer = csv.writer(out)
    for row in reader:
        #print "AABB"
        if 'DECEASED' in row[1]:
            if row[10]>365:
                row[1]=1
                writer.writerow(row)
            elif row[10]<365:
                row[1]=0
                writer.writerow(row)
        if 'LIVING' in row[1]:
            if row[11]>365:
                row[1]=1
                writer.writerow(row)
Sample input
sample id , status , age ,gender ,date ,days_to_last_followup
0 , Deceased , 42 , M , 326 , 149
1 , Deceased , 56 , F , 500 , 30
2 , living , 43 ,M , 25 , 150
Sample output
sample id , status , age ,gender,date ,days_to_last_followup
0 , 0 , 1 , M ,326 , 149
1 , 1 , 2 , F ,500 , 30
2 , 0 , 1 ,M , 25 , 150
I'm not sure what your question is, based on this post. Either way, the logical structure would have an issue if both 'Deceased' and 'Living' were in row[1]. I'd suggest you create some test cases to look for bad data, since ETL processes routinely have to deal with unexpected data formats/fields.
I'm also not sure why you are importing the pandas library. You don't seem to be calling it anywhere in the code you posted.
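As a rough illustration of the bad-data check suggested above, here is a minimal sketch in Python 3 (the posted code looks like Python 2). It assumes the status is in the second column and that only "deceased" and "living" are valid, compared case-insensitively as in the sample input. The function name `find_bad_status_rows` and the inline sample rows are made up for this example.

```python
import csv
import io

# Assumption: these are the only valid status values (case-insensitive).
VALID_STATUSES = {"deceased", "living"}

def find_bad_status_rows(lines):
    """Return (line_number, raw_status) for rows with an unrecognised status."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    bad = []
    for line_number, row in enumerate(reader, start=2):
        # A row that is too short has no status at all; treat it as bad.
        raw_status = row[1].strip() if len(row) > 1 else ""
        if raw_status.lower() not in VALID_STATUSES:
            bad.append((line_number, raw_status))
    return bad

# Hypothetical test data: one clean row and two rows with bad status values.
sample = io.StringIO(
    "sample id,status,age,gender,date,days_to_last_followup\n"
    "0,Deceased,42,M,326,149\n"
    "1,Deceased living,56,F,500,30\n"
    "2,unknown,43,M,25,150\n"
)
bad_rows = find_bad_status_rows(sample)
print(bad_rows)
```

Running a check like this before the conversion step makes it easier to decide what to do with unexpected rows instead of silently dropping or duplicating them.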
Your code is a good starting point. A few things that the code does not cover:
- What happens when both DECEASED and LIVING are in row[1]? Your code will write two rows. To fix this, change the if 'LIVING' to elif 'LIVING'.
- You need an else case to catch what happens when neither DECEASED nor LIVING is in row[1].
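The two points above could be combined roughly as follows. This is only a sketch in Python 3 that mirrors the branch structure of the posted code; the helper names `recode_status` and `make_row` are hypothetical, the column indexes 10 and 11 and the 365 threshold are copied from the question, and upper-casing the status is an added assumption (the sample input uses "Deceased" and "living"). Because `csv.reader` yields strings, the day counts are converted with `int()` before comparing.

```python
def recode_status(row, deceased_col=10, living_col=11, threshold=365):
    """Return 1 or 0 for the status in row[1], or None if the row is skipped."""
    status = row[1].upper()  # assumption: match the status case-insensitively
    if "DECEASED" in status:
        days = int(row[deceased_col])  # csv fields are strings, so convert
        if days > threshold:
            return 1
        if days < threshold:
            return 0
        return None  # exactly at the threshold: the posted code writes nothing
    elif "LIVING" in status:
        # elif guarantees a row is handled at most once, even if both words appear
        return 1 if int(row[living_col]) > threshold else None
    else:
        # Neither status found: fail loudly rather than dropping the row silently
        raise ValueError("unexpected status: %r" % row[1])

def make_row(status, col10, col11):
    """Build a hypothetical 12-column row for demonstration purposes."""
    row = [""] * 12
    row[1], row[10], row[11] = status, str(col10), str(col11)
    return row

codes = [
    recode_status(make_row("DECEASED", 500, 0)),
    recode_status(make_row("Deceased", 100, 0)),
    recode_status(make_row("living", 0, 400)),
    recode_status(make_row("DECEASED LIVING", 500, 400)),
]
print(codes)
```

Whether an unexpected status should raise an error, be logged, or be written through unchanged depends on how the downstream analysis treats missing values, so the `else` branch here is just one option.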