
Split one string column into multiple columns in Python

I have the following dataframe:

df = pd.DataFrame({'scene':[{"living":"0.515","kitchen":"0.297"}, {"kitchen":"0.401","study":"0.005"}, {"study":"0.913"}, {}, {"others":"0"}], 'id':[1, 2, 3 ,4, 5]}) 

id        scene
01      {"living":"0.515","kitchen":"0.297"}
02      {"kitchen":"0.401","study":"0.005"}
03      {"study":"0.913"}
04      {}
05      {"others":"0"}

I want to create a new dataframe as shown below. Can someone help me create this using pandas?

id      living     kitchen     study     others
01      0.515       0.297        0         0 
02        0         0.401      0.005       0
03        0           0        0.913       0
04        0           0          0         0 
05        0           0          0         0

A simple solution is to convert your scene column to a list of dictionaries and create a new data frame with the default constructor:

pd.DataFrame(df.scene.tolist()).fillna(0)

Result:

  kitchen living others  study
0   0.297  0.515      0      0
1   0.401      0      0  0.005
2       0      0      0  0.913
3       0      0      0      0
4       0      0      0      0

One of the "default" ways to create a DataFrame is from a list of dictionaries. In this case each dictionary in the list becomes a separate row, and each dict key is used as a column heading.
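
To make the row and column mapping concrete, here is a minimal sketch. The name `records` is illustrative only; its values mirror the scene column from the question. Keys that a dictionary lacks come out as NaN, which is why the answer calls fillna(0). The numeric conversion is an extra step, assumed here because the question stores the values as strings.

```python
import pandas as pd

# `records` is a hypothetical name; the values mirror the question's scene column.
records = [
    {"living": "0.515", "kitchen": "0.297"},
    {"kitchen": "0.401", "study": "0.005"},
    {"study": "0.913"},
    {},
    {"others": "0"},
]

# One row per dictionary, one column per distinct key; absent keys become NaN.
wide = pd.DataFrame(records)

# Convert the string values to numbers, then replace the missing entries with 0.
wide = wide.apply(pd.to_numeric).fillna(0)

print(wide)
```

The empty dictionary still produces a row, so the result keeps the same number of rows as the input list.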

On your data,

df = pd.DataFrame({'scene':[{"living":"0.515","kitchen":"0.297"}, {"kitchen":"0.401","study":"0.005"}, 
                        {"study":"0.913"}, {}, {"others":"0"}], 
               'id':[1, 2, 3 ,4,5], 's': ['a','b','c','d','e']})

df:
    id  s   scene
0   1   a   {'kitchen': '0.297', 'living': '0.515'}
1   2   b   {'kitchen': '0.401', 'study': '0.005'}
2   3   c   {'study': '0.913'}
3   4   d   {}
4   5   e   {'others': '0'}

There are two ways you can go about doing this:

  1. In a single line, where you pass every column name except 'scene' to set_index:

     df = df.set_index(['id', 's'])['scene'].apply(pd.Series).fillna(0).reset_index() 

    which will output:

          id  s   kitchen  living  study  others
      0   1   a   0.297    0.515   0      0
      1   2   b   0.401    0       0.005  0
      2   3   c   0        0       0.913  0
      3   4   d   0        0       0      0
      4   5   e   0        0       0      0

  2. In two lines, where you create the expected result and concat it to the original dataframe:

     df1 = df.scene.apply(pd.Series).fillna(0)
     df = pd.concat([df, df1], axis=1)

    which gives:

          id  s   scene                                     kitchen  living  study  others
      0   1   a   {'kitchen': '0.297', 'living': '0.515'}   0.297    0.515   0      0
      1   2   b   {'kitchen': '0.401', 'study': '0.005'}    0.401    0       0.005  0
      2   3   c   {'study': '0.913'}                        0        0       0.913  0
      3   4   d   {}                                        0        0       0      0
      4   5   e   {'others': '0'}                           0        0       0      0
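
Both variants rely on apply(pd.Series), which constructs one Series per row and can be slow on large frames. As a rough sketch of an alternative, the list-of-dicts constructor can be combined with join so that the other columns do not have to be listed by name. The names `expanded` and `result` are illustrative, and the numeric conversion is an assumption about what the final columns should hold.

```python
import pandas as pd

df = pd.DataFrame({
    'scene': [
        {"living": "0.515", "kitchen": "0.297"},
        {"kitchen": "0.401", "study": "0.005"},
        {"study": "0.913"},
        {},
        {"others": "0"},
    ],
    'id': [1, 2, 3, 4, 5],
    's': ['a', 'b', 'c', 'd', 'e'],
})

# Expand the dictionaries into columns, reusing df's index so the rows line up.
expanded = pd.DataFrame(df['scene'].tolist(), index=df.index)

# Convert the string values to numbers, then fill the gaps with 0.
expanded = expanded.apply(pd.to_numeric).fillna(0)

# Drop the original dict column and attach the expanded ones.
result = df.drop(columns='scene').join(expanded)

print(result)
```

Passing index=df.index matters when the original frame does not have the default 0..n-1 index; without it the join could misalign rows.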

Updated. This one works perfectly. Suggestions for making it more concise are welcome.

import json
import pandas as pd

df = pd.DataFrame({'scene':[{"living":"0.515","kitchen":"0.297"}, {"kitchen":"0.401","study":"0.005"}, {"study":"0.913"}, {}, {"others":"0"}], 'id':[1, 2, 3, 4, 5], 's':['a','b','c','d','e']})

def test(scene, key):
    # Parse JSON text; values that are already dicts are used as-is
    if isinstance(scene, str):
        scene = json.loads(scene)
    # Return the value stored under key, or "" when the key is absent
    return scene.get(key, "")

cols = ['living', 'kitchen', 'study', 'others']
for col in cols:
    # The column is named 'scene' (lowercase) in the dataframe above
    df[col] = df['scene'].map(lambda scene: test(scene, col))

# Replace the empty placeholders with 0, then convert everything to numbers
df[cols] = df[cols].replace({'': 0})
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce', axis=1)

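Here is the perfect one-line solution (it calls json.loads, so it expects the scene column to hold JSON strings). Thanks for all the help: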

df.join(df['scene'].apply(json.loads).apply(pd.Series))
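
Since json.loads only accepts text, the one-liner applies when scene contains JSON strings rather than Python dictionaries. A minimal sketch under that assumption follows; the frame `raw` and the names `out` and `new_cols` are made up for illustration, and the final clean-up line is an addition to match the zeros requested in the question.

```python
import json
import pandas as pd

# Hypothetical input: the same data as in the question, but stored as JSON text.
raw = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'scene': [
        '{"living":"0.515","kitchen":"0.297"}',
        '{"kitchen":"0.401","study":"0.005"}',
        '{"study":"0.913"}',
        '{}',
        '{"others":"0"}',
    ],
})

# The one-liner: parse each string, expand the dicts, attach to the original frame.
out = raw.join(raw['scene'].apply(json.loads).apply(pd.Series))

# Extra step (not part of the one-liner): numeric values, with 0 for missing keys.
new_cols = [c for c in out.columns if c not in raw.columns]
out[new_cols] = out[new_cols].apply(pd.to_numeric).fillna(0)

print(out)
```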

