
How to split one column in pandas based on value and create new columns?

I have a data frame like this:

import pandas as pd

df1 = pd.DataFrame({
    'testName': [4402, 4402, 5555, 6753, 1234, 9876, 3602],
    'endResult': ['WARNING', 'WARNING', 'FAILED', 'FAILED', 'WARNING', 'FAILED', 'WARNING'],
})

I want to achieve this:

df = pd.DataFrame({
    'testName':[4402, 4402 ,5555,6753,1234,9876,3602],
    'WARNING':[4402,4402,0,0,1234,0,3602],
    'FAILED':[0,0,5555,6753,0,9876,0]
})

How do I do it?

Use pivot, like this:

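# pivot takes keyword arguments (positional ones were removed in pandas 2.0)
df = (df1.reset_index()
         .pivot(index='index', columns='endResult', values='testName')
         .fillna(0)
         .astype(int))  # fillna leaves floats, so cast back to integers
print(df)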
endResult  FAILED  WARNING
index                     
0               0     4402
1               0     4402
2            5555        0
3            6753        0
4               0     1234
5            9876        0
6               0     3602
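
The pivoted frame above no longer contains the testName column that the question's target frame has. A minimal sketch of one way to get it back, assuming the default RangeIndex on df1 so that both frames share the same row labels (the names wide and result are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    "testName": [4402, 4402, 5555, 6753, 1234, 9876, 3602],
    "endResult": ["WARNING", "WARNING", "FAILED", "FAILED",
                  "WARNING", "FAILED", "WARNING"],
})

# Same pivot as in the answer: one column per endResult value.
wide = (df1.reset_index()
           .pivot(index="index", columns="endResult", values="testName")
           .fillna(0)
           .astype(int))

# Join on the shared row index, then pick the column order from the question.
result = df1[["testName"]].join(wide)[["testName", "WARNING", "FAILED"]]
print(result)
```

The join aligns on row labels, so it relies on df1 keeping its original index.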

Or, set_index using MultiIndex.from_arrays and unstack on the last level.

idx = pd.MultiIndex.from_arrays([df1.index, df1.endResult])
df = df1.set_index(idx).testName.unstack(fill_value=0)

print(df)
endResult  FAILED  WARNING
0               0     4402
1               0     4402
2            5555        0
3            6753        0
4               0     1234
5            9876        0
6               0     3602
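
Both approaches put the row position into the index before reshaping. A likely reason is that testName 4402 appears twice with the same endResult, and unstack cannot reshape duplicate (row, column) pairs. The sketch below illustrates this by indexing on testName instead of the row position; the variable names are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "testName": [4402, 4402, 5555, 6753, 1234, 9876, 3602],
    "endResult": ["WARNING", "WARNING", "FAILED", "FAILED",
                  "WARNING", "FAILED", "WARNING"],
})

# Index by (testName, endResult) instead of (row position, endResult).
# The pair (4402, 'WARNING') then occurs twice.
idx = pd.MultiIndex.from_arrays([df1["testName"], df1["endResult"]])
s = pd.Series(df1["testName"].to_numpy(), index=idx)

try:
    s.unstack(fill_value=0)
    outcome = "reshaped"
except ValueError as exc:
    # pandas refuses to reshape when index entries are duplicated
    outcome = f"ValueError: {exc}"

print(outcome)
```

Using df1.index as the first level makes every pair unique, which is why the reshape in the answer succeeds.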

Getting rid of the index while
(1) Printing

print(df.to_string(index=False))
FAILED  WARNING
     0     4402
     0     4402
  5555        0
  6753        0
     0     1234
  9876        0
     0     3602

(2) Saving to CSV

df.to_csv('data.csv', index=False)
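
To see what index=False changes without writing a file, to_csv can be called with no path, in which case it returns the CSV text as a string. A small sketch on a three-row frame made up for illustration:

```python
import pandas as pd

# Small illustrative frame with the same column names as the pivoted result.
df = pd.DataFrame({"FAILED": [0, 0, 5555], "WARNING": [4402, 4402, 0]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_without_index = df.to_csv(index=False)
csv_with_index = df.to_csv()

print(csv_without_index)
print(csv_with_index)
```

With the default index=True, every line gains a leading field holding the row label, and the header line starts with an empty field for an unnamed index.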

Here is another way to do it:

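import pandas as pd

df1 = pd.DataFrame({
    'testName': [4402, 4402, 5555, 6753, 1234, 9876, 3602],
    'endResult': ['WARNING', 'WARNING', 'FAILED', 'FAILED', 'WARNING', 'FAILED', 'WARNING'],
})
# Keep only the FAILED rows and copy their testName into a FAILED column
df = df1.where(df1["endResult"] == "FAILED").dropna()
df = df.assign(FAILED=df["testName"]).drop(columns="endResult")
# Same for the WARNING rows
d_f = df1.where(df1["endResult"] == "WARNING").dropna()
d_f = d_f.assign(WARNING=d_f["testName"]).drop(columns="endResult")
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat([df, d_f]).sort_index()
# where() introduced NaN and float dtype, so fill the gaps and cast back to int
df = df.fillna(0).astype(int)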

Solve it using unique() and where():

import pandas as pd
df1 = pd.DataFrame({
               'testName':   [4402, 4402 ,5555,6753,1234,9876,3602],
               'endResult': ['WARNING', 'WARNING', 'FAILED', 'FAILED','WARNING','FAILED','WARNING'],
               })


for msg in df1['endResult'].unique():
    df1[msg] = df1['testName'].where(df1['endResult']==msg,other=0)
df1.drop('endResult',axis=1,inplace=True)

print(df1)

     testName  WARNING  FAILED
0      4402     4402       0
1      4402     4402       0
2      5555        0    5555
3      6753        0    6753
4      1234     1234       0
5      9876        0    9876
6      3602     3602       0
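
The loop above modifies df1 in place and drops endResult from it. If the original frame should stay untouched, a variant of the same unique() and where() idea can build the new columns first and attach them to a copy. This is a sketch, and the names new_cols and result are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "testName": [4402, 4402, 5555, 6753, 1234, 9876, 3602],
    "endResult": ["WARNING", "WARNING", "FAILED", "FAILED",
                  "WARNING", "FAILED", "WARNING"],
})

# One Series per distinct endResult: testName where it matches, 0 elsewhere.
new_cols = {
    msg: df1["testName"].where(df1["endResult"] == msg, other=0)
    for msg in df1["endResult"].unique()
}

# drop() and assign() both return new frames, so df1 itself is not changed.
result = df1.drop(columns="endResult").assign(**new_cols)
print(result)
```

The new columns appear in the order in which unique() first encounters each value.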
