Pandas column of list: How to set the dtype of items

I have a dataframe with multiple columns that contain lists, and the lists differ in length from row to row:

tweetid tweet_date    user_mentions       hashtags
00112   11-02-2014    []                  []
00113   11-02-2014    [00113]             [obama, trump]
00114   30-07-2015    [00114, 00115]      [hillary, trump, sanders]
00115   30-07-2015    []                  []

The dataframe is a concat of three different dataframes, and I'm not sure whether the items in the lists all have the same dtype. For example, in the user_mentions column, the data sometimes looks like this:

[00114, 00115]

But sometimes it looks like this:

['00114','00115'] 

How can I set the dtype for the items in the lists?

Pandas DataFrames are not really designed to hold lists as row/column values, which is why you are running into difficulty. You could do:

Python 3.x:

df['user_mentions'].apply(lambda x: list(map(int, x)))

Python 2.x:

df['user_mentions'].apply(lambda x: map(int, x))

In Python 3, map returns a lazy map object, so you have to convert it to a list explicitly. In Python 2, map already returns a list, so no explicit conversion is needed.

In the lambda above, x is the list stored in a row, and you are mapping its values to int.
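A minimal sketch of the int conversion on made-up data (the sample frame below is an assumption modeled on the question's table, not the asker's real data). Note that apply returns a new Series, so the result is assigned back to the column, and that converting to int drops any leading zeros.

```python
import pandas as pd

# Hypothetical sample data: lists whose items are a mix of int and str.
df = pd.DataFrame({
    "tweetid": ["00112", "00113", "00114"],
    "user_mentions": [[], ["00113"], [114, "00115"]],
})

# Convert every item of every list to int; empty lists stay empty.
# The result is assigned back, because apply does not modify df in place.
df["user_mentions"] = df["user_mentions"].apply(
    lambda items: [int(v) for v in items]
)

print(df["user_mentions"].tolist())  # [[], [113], [114, 115]]
```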

If your objective is to convert all user_mentions to str, the following might help:

df['user_mentions'].map(lambda x: ['00' + str(y) if isinstance(y, int) else y for y in x])

I would also look into unnesting the lists. As mentioned, pandas is not really designed to hold lists as values.
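The hard-coded '00' prefix in the snippet above only produces the right result when every ID is missing exactly two leading zeros. A possible alternative, sketched here under the assumption that IDs are always five characters wide (as in the question's sample table), is to pad with str.zfill. The ID_WIDTH constant and the to_padded_str helper are illustrative names, not part of the original answer.

```python
import pandas as pd

ID_WIDTH = 5  # assumption: IDs are five characters wide, as in the sample table

# Hypothetical sample data with mixed int and str items.
df = pd.DataFrame({"user_mentions": [[], [113], [114, "00115"]]})

def to_padded_str(items):
    """Return the list with every item converted to a zero-padded string."""
    return [str(v).zfill(ID_WIDTH) for v in items]

df["user_mentions"] = df["user_mentions"].map(to_padded_str)

print(df["user_mentions"].tolist())  # [[], ['00113'], ['00114', '00115']]
```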

This should work. In my case, the strings are in the first column:

df[0].apply(lambda x: [str(y) for y in x])
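Since the question mentions several list columns, the same idea can be applied in a loop and then checked. This is only a sketch: the column names are taken from the question's table, the data is invented, and str() on an int does not restore leading zeros.

```python
import pandas as pd

# Hypothetical sample data with two list columns.
df = pd.DataFrame({
    "user_mentions": [[], ["00113"], [114, 115]],
    "hashtags": [[], ["obama", "trump"], ["hillary"]],
})

list_columns = ["user_mentions", "hashtags"]  # assumed column names

# Convert the items of every list column to str and assign the result back.
for col in list_columns:
    df[col] = df[col].apply(lambda items: [str(v) for v in items])

# Verify: collect the item types that remain in each column.
remaining = {
    col: {type(v).__name__ for items in df[col] for v in items}
    for col in list_columns
}
print(remaining)
```

If any column still reports more than one type name, some items were not converted.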
