AttributeError: 'float' object has no attribute 'split' when making a generator object

print([x["keywords"].split(",") for i,x in df.iterrows()  if not isinstance(x["keywords"], (int, float))])

print([x["tags"].split(",") for i,x in df.iterrows()  if not isinstance(x["tags"], (int, float))])

print([x["rating"].split(",") for i,x in df.iterrows()  if not isinstance(x["rating"], (int, float))])

print([x["rank"].split(",") for i,x in df.iterrows()  if not isinstance(x["rank"], (int, float))])

I want to combine these four statements into a single statement, but when I concatenate them I get this error:

AttributeError: 'float' object has no attribute 'split'

features = [(x["entity_id"], x["tags"].split(","),x["rating"],
           x["rank"],x["keywords"].split(",") )
           for (index, x) in df.iterrows() if not isinstance(x, (int, float))]

pd.DataFrame.iterrows yields (index, row) tuples, where each row is a pd.Series object. Hence isinstance(x, (int, float)) isn't doing what you want it to: a pd.Series is never an instance of int or float, so the condition is always true and no row is filtered out. With this method you would need to check the individual values contained within the pd.Series object.

This is possible, but I strongly advise against it. In fact, I advise you to avoid iterrows altogether, as it loses all vectorised functionality, which is one of the main benefits of Pandas.
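To make the point about iterrows concrete, here is a minimal sketch. The two-row frame and the variable names (row_checks, per_value, vectorised) are made up for illustration and are not from the question. It shows that the row-level check never fires, that a per-value check does guard the split, and that the vectorised string accessor avoids the loop entirely.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; NaN marks a missing "tags" value.
df = pd.DataFrame({"entity_id": ["a", "b"],
                   "tags": ["Tag1,Tag2", np.nan]})

# Each row from iterrows is a pd.Series, so this check is always False
# and a filter of the form `not isinstance(row, ...)` rejects nothing.
row_checks = [isinstance(row, (int, float)) for _, row in df.iterrows()]

# Checking the individual value is what actually guards .split().
per_value = [row["tags"].split(",")
             for _, row in df.iterrows()
             if isinstance(row["tags"], str)]

# The vectorised accessor leaves missing values as NaN instead of raising.
vectorised = df["tags"].str.split(",")
```

The per-value version works but still loops row by row in Python, which is why the vectorised form is usually preferable.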

Here is a solution using pd.DataFrame.mask and NumPy arrays:

import numpy as np
import pandas as pd

df = pd.DataFrame({'entity_id': ['SomeId', 3124123, 'SomeOtherId', 314324],
                   'tags': ['Tag1,Tag2', None, 'Tag4', 'Tag5,Tag6,Tag7'],
                   'rating': [5.0, 'SomeRating', 'SomeOtherRating', np.nan],
                   'rank': ['SomeRank', 2, np.nan, 4],
                   'keywords': ['key1', 'key2,key3', 'key4', 'key5']})

# Replace every value that is numeric or already null with None,
# leaving only the non-numeric strings.
df2 = df.mask(df.apply(pd.to_numeric, errors='coerce').notnull() | df.isnull(), None)

# Split the comma-separated string columns into lists.
for col in ['tags', 'keywords']:
    df2[col] = df2[col].str.split(',')

# Collect each row as a list; filter(None, ...) drops falsy entries such as None.
col_order = ['entity_id', 'tags', 'rating', 'rank', 'keywords']
res = [list(filter(None, x)) for x in df2[col_order].values.tolist()]

Result

print(res)

[['SomeId', ['Tag1', 'Tag2'], 'SomeRank', ['key1']],
 ['SomeRating', ['key2', 'key3']],
 ['SomeOtherId', ['Tag4'], 'SomeOtherRating', ['key4']],
 [['Tag5', 'Tag6', 'Tag7'], ['key5']]]
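The condition passed to mask in the solution can be inspected on its own. The sketch below uses a smaller, made-up frame and the illustrative names is_numeric and to_blank, which do not appear in the answer. One thing it makes visible is that pd.to_numeric also parses numeric-looking strings such as "4", so those would be blanked out as well.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing numbers, strings and missing values.
df = pd.DataFrame({"rating": [5.0, "SomeRating", np.nan],
                   "rank": ["SomeRank", 2, "4"]})

# to_numeric with errors="coerce" turns anything non-numeric into NaN,
# so notnull() is True exactly where the value parses as a number.
is_numeric = df.apply(pd.to_numeric, errors="coerce").notnull()

# Values that are numeric or already missing are the ones to blank out.
to_blank = is_numeric | df.isnull()
```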

As a comment, this is pretty messy. It's good practice to decide on a consistent structure rather than this kind of mixed data type structure and filtering based on type.
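As one possible reading of "consistent structure", the sketch below gives every column a single type up front, so nothing needs filtering by type later. The frame, the name clean, and the choice of an empty list and NaN as the "missing" markers are assumptions made for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical input with the same kind of mixed values as in the question.
df = pd.DataFrame({"tags": ["Tag1,Tag2", None, "Tag4"],
                   "rating": [5.0, "SomeRating", np.nan]})

clean = pd.DataFrame({
    # Always a list of strings; missing tags become an empty list.
    "tags": [t.split(",") if isinstance(t, str) else [] for t in df["tags"]],
    # Always a float; non-numeric ratings become NaN.
    "rating": pd.to_numeric(df["rating"], errors="coerce"),
})
```

With columns shaped like this, downstream code can rely on tags always being a list and rating always being numeric.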
