pandas str.split with .tolist() produced a float?

Question

I'm having a hard time debugging my code, which worked fine when I tested it on a small subset of the data. I could double-check the types to be sure, but the error message is already informative enough: the list I made ended up being a float. But how?

The last three lines that ran:

diagnoses = all_treatments['DIAGNOS'].str.split(' ').tolist()
all_treatments = all_treatments.drop(['DIAGNOS','INDATUMA','date'], axis=1)
all_treatments['tobacco'] = tobacco(diagnoses)

The error:

Traceback (most recent call last):
 File "treatments2_noiopro.py", line 97, in <module>
   all_treatments['tobacco'] = tobacco(diagnoses)
 File "treatments2_noiopro.py", line 13, in tobacco
   for codes in codes_column]
TypeError: 'float' object is not iterable

FWIW, the function itself is:

def tobacco(codes_column):
    return [any('C30' <= code < 'C40' or 
                'F17' <= code < 'F18'
                for code in codes) if codes else False
            for codes in codes_column]
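A likely cause, assuming the full data set contains at least one missing (NaN) value in DIAGNOS that the test subset did not: pandas string methods such as str.split return NaN, which is a float, for missing entries. NaN is also truthy, so `if codes else False` does not skip it, and `for code in codes` then raises the TypeError. The sketch below reproduces this on hypothetical data; the Series `diag`, its values and the `bad_rows` check are made up for illustration.

```python
import numpy as np
import pandas as pd


def tobacco(codes_column):
    # Same logic as the function in the question
    return [any('C30' <= code < 'C40' or 'F17' <= code < 'F18'
                for code in codes) if codes else False
            for codes in codes_column]


# Hypothetical data: the middle row has a missing DIAGNOS value (NaN)
diag = pd.Series(['C35 C50', np.nan, 'F17'], dtype=object)
diagnoses = diag.str.split(' ').tolist()

print(type(diagnoses[1]))  # <class 'float'>: str.split leaves NaN as NaN
print(bool(diagnoses[1]))  # True: NaN is truthy, so `if codes` does not skip it

# Positions whose "list" of codes is not actually a list
bad_rows = [i for i, codes in enumerate(diagnoses) if not isinstance(codes, list)]
print(bad_rows)  # [1]

try:
    tobacco(diagnoses)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)  # 'float' object is not iterable
```

Checking `bad_rows` (or `all_treatments['DIAGNOS'].isnull().sum()`) on the real data would confirm or rule out this explanation.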

I am using pandas 0.16.2 np19py26_0, iopro 1.7.1 np19py27_p0, and python 2.7.10 0 under Linux.

Answer 1

You can use str.split on the Series and apply a function to the result:

import pandas as pd

def tobacco(codes):
    return any(['C30' <= code < 'C40' or 'F17' <= code < 'F18' for code in codes])

data = [('C35 C50'), ('C36'), ('C37'), ('C50 C51'), ('F1 F2'), ('F17'), ('F3 F17'), ('')]
df = pd.DataFrame(data=data, columns=['DIAGNOS'])

df

    DIAGNOS
0   C35 C50
1   C36
2   C37
3   C50 C51
4   F1 F2
5   F17
6   F3 F17
7

df.DIAGNOS.str.split(' ').apply(tobacco)

0     True
1     True
2     True
3    False
4    False
5     True
6     True
7    False
dtype: bool
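The `tobacco` function above would still fail on a missing value, because NaN reaches `for code in codes` just as in the question. A minimal sketch, assuming an empty diagnosis is an acceptable stand-in for a missing one: fill NaN with an empty string before splitting. The sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def tobacco(codes):
    # Per-row check from this answer: True if any code is in C30-C39 or F17
    return any(['C30' <= code < 'C40' or 'F17' <= code < 'F18' for code in codes])


# Hypothetical data that includes a missing value (NaN)
df = pd.DataFrame({'DIAGNOS': pd.Series(['C35 C50', 'F1 F2', np.nan, 'F3 F17'], dtype=object)})

# fillna('') turns NaN into '', which splits to [''] and evaluates to False
result = df.DIAGNOS.fillna('').str.split(' ').apply(tobacco)
print(result.tolist())  # [True, False, False, True]
```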

Edit:

It seems that using str.contains is significantly faster than both of the methods above.

tobacco_codes = '|'.join(["C{}".format(i) for i in range(30, 40)] + ["F17"])

data = [('C35 C50'), ('C36'), ('C37'), ('C50 C51'), ('F1 F2'), ('F17'), ('F3 F17'), ('C3')]
df = pd.DataFrame(data=data, columns=['DIAGNOS'])

df.DIAGNOS.str.contains(tobacco_codes)
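By default str.contains also returns NaN for missing entries; its `na` parameter sets the value to use instead. A sketch on hypothetical data that includes a NaN row, reusing the same pattern:

```python
import numpy as np
import pandas as pd

# Same pattern as above: C30|C31|...|C39|F17
tobacco_codes = '|'.join(["C{}".format(i) for i in range(30, 40)] + ["F17"])

# Hypothetical data with a missing value (NaN)
df = pd.DataFrame({'DIAGNOS': pd.Series(['C35 C50', 'C3', np.nan, 'F3 F17'], dtype=object)})

# na=False makes missing rows come out as False instead of NaN
result = df.DIAGNOS.str.contains(tobacco_codes, na=False)
print(result.tolist())  # [True, False, False, True]
```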

Answer 2

I guess diagnoses is a generator, and since you drop something in line 2 of your code, this changes the generator. I can't test anything right now, but let me know if it works when you comment out line 2 of your code.
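One way to check this guess without rerunning the full script, assuming a small hypothetical stand-in for `all_treatments`: inspect the type of `diagnoses` and compare it before and after the `drop`. The `snapshot` variable is introduced only for the comparison.

```python
import pandas as pd

# Hypothetical miniature of the question's DataFrame
all_treatments = pd.DataFrame({
    'DIAGNOS': pd.Series(['C35 C50', 'F17'], dtype=object),
    'INDATUMA': ['2015-01-01', '2015-02-01'],
    'date': ['2015-01-01', '2015-02-01'],
})

diagnoses = all_treatments['DIAGNOS'].str.split(' ').tolist()
snapshot = [list(codes) for codes in diagnoses]  # independent copy for comparison

all_treatments = all_treatments.drop(['DIAGNOS', 'INDATUMA', 'date'], axis=1)

print(type(diagnoses))         # <class 'list'>
print(diagnoses == snapshot)   # True if the drop left diagnoses untouched
```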

2 answers

Answer 1: score 1, 2015-07-29 19:26:45

Answer 2: score -1, 2015-07-29 19:19:23
