
Converting lists in a pandas DataFrame into columns

city        state   neighborhoods       categories
Dravosburg  PA      [asas,dfd]          ['Nightlife']
Dravosburg  PA      [adad]              ['Auto_Repair','Automotive']

I have the dataframe above, and I want to convert each element of the lists into its own column, for example:

city        state asas dfd adad Nightlife Auto_Repair Automotive 
Dravosburg  PA    1     1   0   1         1           0    

I am using the following code to do this:

def list2columns(df):
    """
    to convert list in the columns 
    of a dataframe
    """
    columns=['categories','neighborhoods']
    for col in columns:    
        for i in range(len(df)):
            for element in eval(df.loc[i,"categories"]):
                if len(element)!=0:
                    if element not in df.columns:
                        df.loc[:,element]=0
                    else:
                        df.loc[i,element]=1

  1. How can I do this more efficiently?
  2. Why do I still get the warning below even though I am already using df.loc?

     SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

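Use this instead:

def list2columns(df):
    """
    to convert list in the columns 
    of a dataframe
    """
    # work on an explicit copy so the assignments below do not touch a slice of another frame
    df = df.copy()
    columns=['categories','neighborhoods']
    for col in columns:    
        for i in range(len(df)):
            # parse the string in the current column (the original always read "categories")
            for element in eval(df.loc[i,col]):
                if len(element)!=0:
                    if element not in df.columns:
                        df.loc[:,element]=0
                    # mark the row even when the column was only just created
                    df.loc[i,element]=1
    return df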

Since you're using eval(), I assume each column holds a string representation of a list rather than a list itself. Also, unlike your example above, I'm assuming there are quotes around the items in the lists in your neighborhoods column ( df.loc[0, 'neighborhoods'] == "['asas','dfd']" ), because otherwise your eval() would fail.
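To make the quoting assumption concrete, here is a small sketch of how the two spellings parse. It uses ast.literal_eval from the standard library as a safer stand-in for eval() (that substitution is mine, not part of the original code), and the variable names are only illustrative.

```python
import ast

quoted = "['asas','dfd']"    # items quoted: a valid Python list literal
unquoted = "[asas,dfd]"      # items unquoted: bare names, not string literals

# The quoted form parses into a real list of strings.
parsed = ast.literal_eval(quoted)

# The unquoted form is rejected, because asas and dfd are names rather than literals.
try:
    ast.literal_eval(unquoted)
    unquoted_ok = True
except (ValueError, SyntaxError):
    unquoted_ok = False
```

With plain eval() the unquoted form fails as well, with a NameError, unless variables called asas and dfd happen to exist.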

If this is all correct, you could try something like this:

def list2columns(df):
    """
    to convert list in the columns of a dataframe
    """
    columns = ['categories','neighborhoods']
    new_cols = set()      # set of all new columns added
    for col in columns:
        for i in range(len(df[col])):
            # get the list of columns to set (row selected by position)
            set_cols = eval(df[col].iloc[i])
            # set the values of these columns to 1 in the current row
            # (if this causes new columns to be added, other rows will get nans)
            # .iloc cannot create columns, so use .loc with the label of row i
            for element in set_cols:
                df.loc[df.index[i], element] = 1
            # remember which new columns have been added
            new_cols.update(set_cols)
    # convert any un-set values in the new columns to 0
    # (assign the result back; fillna(inplace=True) on df[...] only changes a temporary copy)
    new_cols = list(new_cols)
    df[new_cols] = df[new_cols].fillna(value=0).astype(int)
    return df
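On the efficiency question, a vectorised version is also possible. The following is only a sketch under the same assumption as above (every cell is a string holding a quoted list); the function name lists_to_indicator_columns and the sample frame are hypothetical, and ast.literal_eval is used in place of eval(). It relies on Series.explode and pd.get_dummies rather than row-by-row assignment.

```python
import ast
import pandas as pd

def lists_to_indicator_columns(df, list_columns):
    """Return a new frame with one 0/1 column per distinct list element."""
    out = df.drop(columns=list_columns)
    for col in list_columns:
        # parse "['a','b']" strings into real lists (assumes quoted items)
        parsed = df[col].apply(ast.literal_eval)
        # one row per list element; empty lists become NaN and are dropped
        exploded = parsed.explode().dropna()
        # one indicator column per element, collapsed back to one row per original row
        dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()
        # rows whose list was empty get 0 everywhere
        dummies = dummies.reindex(df.index, fill_value=0)
        out = out.join(dummies)
    return out

# sample data modelled on the question, with quoted list items
df = pd.DataFrame({
    "city": ["Dravosburg", "Dravosburg"],
    "state": ["PA", "PA"],
    "neighborhoods": ["['asas','dfd']", "['adad']"],
    "categories": ["['Nightlife']", "['Auto_Repair','Automotive']"],
})
result = lists_to_indicator_columns(df, ["categories", "neighborhoods"])
```

Because it builds and returns a new frame instead of assigning into the one passed in, this approach also avoids the chained-assignment situation that the warning is about.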

I can only speculate on an answer to your second question, about the SettingWithCopy warning.

It's possible (but unlikely) that using df.iloc instead of df.loc will help, since df.iloc is intended to select by row number. In your case, df.loc[i, col] only works because you haven't set an index, so pandas uses the default index, which matches the row number.
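The difference between position and label is easier to see with a frame whose index is not the default. This is a small illustrative sketch; the index values 10 and 20 are made up.

```python
import pandas as pd

# a frame whose index labels do not match the row numbers
df = pd.DataFrame({"city": ["Dravosburg", "Dravosburg"]}, index=[10, 20])

# .iloc selects by position: row number 0 is the first row
first_by_position = df.iloc[0]["city"]

# .loc selects by label: the first row carries the label 10
first_by_label = df.loc[10, "city"]

# there is no row labelled 0, so a .loc lookup with 0 fails
try:
    df.loc[0, "city"]
    label_zero_found = True
except KeyError:
    label_zero_found = False
```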

Another possibility is that the df passed in to your function is already a slice from a larger dataframe, and that is what causes the SettingWithCopy warning.
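A sketch of that situation, with made-up data: subset is taken from a larger frame, and an explicit copy is made before assigning to it. Whether the warning actually appears when you assign to subset directly depends on the pandas version, so treat this as an illustration of the pattern rather than a reproduction.

```python
import pandas as pd

big = pd.DataFrame({
    "city": ["Dravosburg", "Dravosburg", "Elsewhere"],
    "state": ["PA", "PA", "OH"],
})

# a filtered selection like this is what pandas may treat as "a copy of a slice"
subset = big[big["state"] == "PA"]

# taking an explicit copy makes it unambiguous that only the new frame is modified
safe = subset.copy()
safe.loc[:, "Nightlife"] = 0
```

If list2columns receives something like subset, copying it first (either before the call or at the top of the function) should be enough to check whether this is the cause.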

I've also found that using df.loc with mixed indexing modes (logical selectors for rows and column names for columns) produces the SettingWithCopy warning; it's possible that your slice selectors are causing similar problems.

Hopefully the simpler and more direct indexing in the code above will solve any of these problems. But please report back (and provide code to generate df) if you are still seeing that warning.
