Replace categorical values in a CSV file with binary values

Question

I have a clinical data set in which I have to replace

the 1st column values 'DECEASED' with 1 if the value of 'Date' > 365, otherwise with 0 (zero),
the value 'LIVING' with 1 if 'Day_to_follow_up' > 365.

In addition, I need to assign the ages to bins:

0-25 to bin 0,
25-50 to bin 1,
50-75 to bin 2,
above 75 to bin 4.

Here is my code.

import csv
import pandas as pd
with open('combined_file', 'rb') as f,open('newFile', 'wb') as out:
    reader = csv.reader(f)


    writer = csv.writer(out)
    for row in reader:
        #print "AABB"
        if 'DECEASED' in row[1]:
            if row[10]>365:
                row[1]=1
                writer.writerow(row)
            elif row[10]<365:
                row[1]=0
                writer.writerow(row)
        if 'LIVING' in row[1]:
            if row[11]>365:
                row[1]=1
                writer.writerow(row)

Sample input

sample id , status , age ,gender ,date ,days_to_last_followup
0     ,    Deceased , 42 , M  ,   326 ,    149
1     ,    Deceased , 56 , F  ,   500 ,    30
2     ,    living   , 43 ,M   ,   25  ,    150

Sample output

sample id , status , age ,gender,date ,days_to_last_followup
0     ,       0    , 1 ,  M    ,326 ,    149
1     ,       1    , 2 , F     ,500 ,    30
2     ,       0    , 1 ,M   ,   25  ,    150

Answer 1

I'm not sure what your question is, based on this post. Either way, the logical structure would have an issue if both 'Deceased' and 'Living' were in row[1]. I'd suggest you create some test cases to look for bad data, since ETL processes routinely have to deal with unexpected data formats/fields.
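One way to look for bad data before recoding is to scan the status column for unexpected values. The following is only a minimal Python 3 sketch: it assumes the status sits in the second column and that 'Deceased' and 'Living' (compared case-insensitively) are the only valid values. The helper name `find_bad_status_rows` and the inline sample rows are hypothetical, made up for illustration.

```python
import csv
import io

# Assumption: these are the only two valid status values.
VALID_STATUSES = {"DECEASED", "LIVING"}


def find_bad_status_rows(lines, status_col=1):
    """Return (line_number, row) pairs whose status is missing or unexpected.

    Hypothetical helper; `lines` is any iterable of CSV text lines.
    """
    bad = []
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    for line_number, row in enumerate(reader, start=2):
        too_short = len(row) <= status_col
        if too_short or row[status_col].strip().upper() not in VALID_STATUSES:
            bad.append((line_number, row))
    return bad


# Made-up sample: line 3 has both words in the status, line 4 has none.
sample = io.StringIO(
    "sample id,status,age,gender,date,days_to_last_followup\n"
    "0,Deceased,42,M,326,149\n"
    "1,Deceased and Living,56,F,500,30\n"
    "2,,43,M,25,150\n"
)
bad_rows = find_bad_status_rows(sample)
for line_number, row in bad_rows:
    print(line_number, row)
```

Rows reported this way can be fixed or excluded before the recoding step runs.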

I'm also not sure why you are importing the pandas library. You don't seem to be calling it anywhere in the code you posted.

Answer 2

Your code is a good starting point. A few things that the code does not cover:

What happens when 'DECEASED' and 'LIVING' are both in row[1]? Your code will write two rows. To fix this, change the if 'LIVING' to elif 'LIVING'.
You need an else case to catch what happens when neither DECEASED nor LIVING is in row[1].
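The two points above can be sketched as a single if/elif/else chain. This is a minimal Python 3 sketch, not the asker's code. The function name `recode_status`, the case-insensitive exact match (instead of the substring test), returning 0 for a 'LIVING' row at or below the threshold (the question does not say, but the sample output shows 0), and returning `None` for an unrecognized status are all assumptions made for illustration. Values read by `csv.reader` are strings, so the sketch converts them with `int()` before comparing against 365.

```python
def recode_status(status, date, days_to_follow_up, threshold=365):
    """Return 1 or 0 for a recognized status, or None if it is unknown.

    Hypothetical helper: the name, signature, and None-for-unknown
    behaviour are assumptions, not part of the original code.
    """
    status = status.strip().upper()
    if status == "DECEASED":
        return 1 if int(date) > threshold else 0
    elif status == "LIVING":
        # Assumption: 0 when the follow-up is not above the threshold.
        return 1 if int(days_to_follow_up) > threshold else 0
    else:
        # Neither DECEASED nor LIVING: let the caller decide what to do.
        return None


# (status, date, days_to_last_followup) taken from the sample input,
# plus one made-up row with an unrecognized status.
rows = [
    ("Deceased", "326", "149"),
    ("Deceased", "500", "30"),
    ("living", "25", "150"),
    ("unknown", "10", "10"),
]
results = [recode_status(*row) for row in rows]
print(results)
```

Using elif also avoids evaluating `'LIVING' in row[1]` after row[1] has already been overwritten with an integer, which would raise a TypeError.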
