TypeError: object of type 'float' has no len() when getting the length of DataFrame values

Question

test.csv looks like this:

device_id,upload_time
12345678901,2020-06-01 07:40:20+00:00
123456,2020-06-01 07:40:40+00:00
123456,2020-06-01 07:41:00+00:00
123456,2020-06-01 07:41:02+00:00
123456,2020-06-01 07:41:04+00:00
123456,2020-06-01 07:41:08+00:00
12345678901,2020-06-01 07:41:10+00:00
12345678901,2020-06-01 07:41:18+00:00
12345678901,2020-06-01 07:41:20+00:00
,2020-06-01 07:41:24+00:00
,2020-06-01 07:41:40+00:00
12345678901,2020-06-01 07:42:00+00:00
12345678901,2020-06-01 07:42:20+00:00
12345678901,2020-06-01 07:42:22+00:00
12345678901,2020-06-01 07:42:24+00:00
12345678901,2020-06-01 07:42:26+00:00
12345678901,2020-06-01 07:42:28+00:00
12345678901,2020-06-01 07:42:40+00:00
1234,2020-06-01 07:43:00+00:00
1234,2020-06-01 07:43:12+00:00

In the DataFrame, device_id can be converted to int or str, either is fine. I use this code to get the new DataFrames.

import pandas as pd

df = pd.read_csv(r'test.csv', encoding='utf-8', parse_dates=[1])
df = df[pd.notnull(df['device_id'])] #Delete rows where device_id is null.
a = df[df['device_id'].map(len)!=11] #Get data whose device_id length is not 11.
b = df[df['device_id'].map(len)==11] #Get data whose device_id length is 11.

But the error message is:

TypeError: object of type 'float' has no len()

What is wrong?

Answer 1

The code below should help.

Converting the float values to strings lets you count the number of digits.

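import pandas as pd
df = pd.read_csv(r'test.csv', encoding='utf-8', parse_dates=[1])

# Remove the null (NaN) rows; any one of these three lines does it.
df = df.dropna()
# or: df = df[df['device_id'].isnull()==False]
# or: df = df[df['device_id'].isna()==False]

# The column is still float64 here, so convert to int first;
# otherwise astype(str) yields strings such as '12345678901.0'.
df['device_id'] = df['device_id'].astype('int64')

a = df[df['device_id'].astype(str).map(len)!=11]
b = df[df['device_id'].astype(str).map(len)==11]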

Another way:

a = df[df['device_id'].astype(str).str.len()!=11]
b = df[df['device_id'].astype(str).str.len()==11]

Another way:

a = df[df['device_id'].astype(str).apply(len)!=11]
b = df[df['device_id'].astype(str).apply(len)==11]
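
The ways of removing missing rows listed in this answer are not quite interchangeable: dropna() with no arguments drops a row if any column is missing, while the device_id masks only look at that one column. A minimal sketch of the difference, using a small made-up frame (the values are illustrative, not the full test.csv):

```python
import numpy as np
import pandas as pd

# Illustrative data: row 1 lacks device_id, row 2 lacks upload_time.
df = pd.DataFrame({
    'device_id': [12345678901, np.nan, 1234],
    'upload_time': ['2020-06-01 07:40:20', '2020-06-01 07:41:24', None],
})

# dropna() with no arguments drops a row if ANY column is missing.
all_cols = df.dropna()

# Restricting the check to device_id keeps the row whose only gap is upload_time.
id_only = df.dropna(subset=['device_id'])

# The boolean-mask form selects the same rows as the subset form.
masked = df[df['device_id'].notna()]

print(len(all_cols), len(id_only), len(masked))
```

For the test.csv in the question only device_id has gaps, so all variants should keep the same rows there.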

Answer 2

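For the input file you gave, the device_id column is treated as float dtype even though every value present is an integer. This is most likely because the column contains empty fields: pandas reads them as NaN, which is a float, so the whole column becomes float64. Because of this, you run into a problem when computing the length: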

Example:

len('12345') 
#will give you len = 5, which is the correct length

However,

len('12345.0') 
#will give you len = 7, which is wrong since it considers the decimal point too
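
To see where the ".0" comes from, the following sketch reads an abbreviated stand-in for test.csv (three rows, one with an empty device_id) and inspects the column. The inline CSV and the variable names are assumptions made for illustration.

```python
import io
import pandas as pd

# Abbreviated stand-in for test.csv: the second row has no device_id.
csv_text = """device_id,upload_time
12345678901,2020-06-01 07:40:20+00:00
,2020-06-01 07:41:24+00:00
1234,2020-06-01 07:43:00+00:00
"""

df = pd.read_csv(io.StringIO(csv_text))

# The empty field is read as NaN, which makes the whole column float64.
dtype_after_read = str(df['device_id'].dtype)

# Dropping the NaN rows does not turn the column back into integers.
kept = df.dropna(subset=['device_id'])
dtype_after_drop = str(kept['device_id'].dtype)

# Converting the floats straight to str keeps the trailing ".0".
as_text = kept['device_id'].astype(str).tolist()

print(dtype_after_read, dtype_after_drop, as_text)
```

Each remaining value is still a float, which is why map(len) raises the TypeError in the question and why the string lengths are off by two.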

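So it is best to convert the column to int, and then run the length check on the str version of the int column, as in the code at the end of this answer.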

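References:

The argument to len may be a sequence (string, tuple, or list) or a mapping (dictionary). https://docs.python.org/2/library/functions.html#len
Before calling len, you should verify that the argument is one of these types. You can call isinstance() to check it. See how to use it: https://docs.python.org/2/library/functions.html#isinstance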
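
The isinstance() check described above could look roughly like this. safe_len is a hypothetical helper name, not something from the original answer or from pandas, and the accepted types simply follow the list quoted from the Python documentation.

```python
def safe_len(value):
    """Return len(value) for str, list, tuple, or dict values; otherwise None."""
    if isinstance(value, (str, list, tuple, dict)):
        return len(value)
    # Floats (including NaN) and ints have no length, so signal that with None.
    return None


length_of_text = safe_len('12345')
length_of_float = safe_len(12345.0)
print(length_of_text, length_of_float)
```

This avoids the TypeError but does not by itself give a digit count for a float, so the conversion to int and then str is still needed.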

So try this:

import pandas as pd

df = pd.read_csv(r'sample.csv', parse_dates=[1])
df = df[pd.notnull(df['device_id'])] #Delete rows where device_id is null.

#Convert to int (int64 explicitly: plain int can be 32-bit on some platforms, and these ids would not fit)
df['device_id'] = df['device_id'].astype(float).astype('int64')

#len function cannot be computed on an int column directly. You should convert to str and then compute len
a = df[df['device_id'].astype(str).map(len)!=11]
b = df[df['device_id'].astype(str).map(len)==11]
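
As an alternative to converting float to int, one could read device_id as text from the start, so that the digits are never turned into floats. This is a sketch that is not taken from either answer; the inline CSV is an abbreviated stand-in for test.csv, and dtype and parse_dates are standard read_csv parameters.

```python
import io
import pandas as pd

# Abbreviated stand-in for test.csv, including one row without a device_id.
csv_text = """device_id,upload_time
12345678901,2020-06-01 07:40:20+00:00
123456,2020-06-01 07:40:40+00:00
,2020-06-01 07:41:24+00:00
12345678901,2020-06-01 07:42:00+00:00
1234,2020-06-01 07:43:00+00:00
"""

# Read device_id as strings so no float conversion ever happens.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'device_id': str},
    parse_dates=['upload_time'],
)

# Empty fields are still read as missing values, so drop them first.
df = df.dropna(subset=['device_id'])

# Compute the length once and reuse it for both selections.
id_len = df['device_id'].str.len()
a = df[id_len != 11]  # device_id length is not 11
b = df[id_len == 11]  # device_id length is 11

print(a['device_id'].tolist(), b['device_id'].tolist())
```

Reading as text also preserves any leading zeros, which an int conversion would lose; whether that matters depends on how the ids are issued.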


Solution 1
0 2020-10-19 06:34:43

Solution 2
0 Accepted 2020-10-19 06:53:55
