
python - is it possible to concat columns after using pandas get_dummies?

Here is my example df:

         doc_num
doc1 doc2 
 A    B    U123
 A    C    U123
 A    D    U124
 B    C    U126
 B    D    U126

and I have used

pd.get_dummies(df.doc_num).sort_index(level=0)

to make a vector matrix like this:

           U123 U124 U126
doc1 doc2  
 A    B     1    0    0
 A    C     1    0    0
 A    D     0    1    0
 B    C     0    0    1
 B    D     0    0    1

But I would like to concatenate doc1 and doc2 into a new column, doc_3, so that the expected result looks like this:

       U123 U124 U126
doc_3  
 A,B     1    0    0
 A,C     1    0    0
 A,D     0    1    0
 B,C     0    0    1
 B,D     0    0    1

Is it possible? Thank you in advance.
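For anyone who wants to reproduce the question, here is a minimal sketch that rebuilds the example frame. It assumes, based on how the frame is printed, that doc1 and doc2 are the two levels of a MultiIndex and doc_num is the only column. The `dtype=int` argument is an addition, not part of the original call: recent pandas versions return boolean dummy columns by default, which print as True/False rather than 1/0.

```python
import pandas as pd

# Rebuild the example frame: doc1/doc2 are index levels, doc_num is the only column.
df = pd.DataFrame(
    {
        "doc1": ["A", "A", "A", "B", "B"],
        "doc2": ["B", "C", "D", "C", "D"],
        "doc_num": ["U123", "U123", "U124", "U126", "U126"],
    }
).set_index(["doc1", "doc2"])

# dtype=int keeps the 1/0 display on pandas versions whose default dummy dtype is bool.
dummies = pd.get_dummies(df.doc_num, dtype=int).sort_index(level=0)
print(dummies)
```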

In addition to @jezrael's answer: since you want a vector matrix, do the following (this assumes doc1 and doc2 are ordinary columns of df):

df1=pd.get_dummies(df.doc_num)
df1.insert(0, 'doc_3',  df['doc1'] + ',' + df['doc2'])
print(df1.set_index('doc_3'))

Or:

df1=pd.get_dummies(df.doc_num)
df1['doc_3']=df.pop('doc1') + ',' + df.pop('doc2')
print(df1.set_index('doc_3'))

Output in both cases:

       U123  U124  U126
doc_3                  
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1

This gives exactly the desired output.
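The two snippets above look up `df['doc1']` and `df['doc2']` as columns. If doc1 and doc2 are index levels instead, as the question's printout suggests, that lookup raises a KeyError. A possible adaptation is sketched below, assuming the MultiIndex layout. The names `flat` and `result` are only illustrative, and `dtype=int` is added so the output prints as 1/0 on pandas versions that default to bool.

```python
import pandas as pd

# Example frame with doc1/doc2 as MultiIndex levels (assumed layout).
df = pd.DataFrame(
    {"doc_num": ["U123", "U123", "U124", "U126", "U126"]},
    index=pd.MultiIndex.from_tuples(
        [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")],
        names=["doc1", "doc2"],
    ),
)

flat = df.reset_index()  # doc1 and doc2 become ordinary columns
df1 = pd.get_dummies(flat["doc_num"], dtype=int)
df1.insert(0, "doc_3", flat["doc1"] + "," + flat["doc2"])
result = df1.set_index("doc_3")
print(result)
```

Because `flat` and `df1` share the same default integer index, the inserted column lines up row by row with the dummy columns.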

I believe you need to join both levels of the MultiIndex, then set the index name with rename_axis:

df1 = pd.get_dummies(df.doc_num).sort_index(level=0)
df1.index = df1.index.map(','.join)
df1 = df1.rename_axis('doc_3')
print (df1)
       U123  U124  U126
doc_3                  
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1
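Note that `','.join` only works when every level holds strings. If one level holds numbers, a variation like the following sketch may help. The integer level `part` and its values are hypothetical and not from the question, and `dtype=int` is added only to keep the 1/0 display.

```python
import pandas as pd

# Hypothetical frame whose second index level is numeric.
idx = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("B", 1)], names=["doc1", "part"]
)
df = pd.DataFrame({"doc_num": ["U123", "U124", "U126"]}, index=idx)

df1 = pd.get_dummies(df["doc_num"], dtype=int)
# ','.join fails on non-string items, so convert each level value to str first.
df1.index = df1.index.map(lambda levels: ",".join(map(str, levels)))
df1 = df1.rename_axis("doc_3")
print(df1)
```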

And add reset_index if you need doc_3 as a column:

df1 = df1.reset_index()
print (df1)
  doc_3  U123  U124  U126
0   A,B     1     0     0
1   A,C     1     0     0
2   A,D     0     1     0
3   B,C     0     0     1
4   B,D     0     0     1

Or first call reset_index to turn the MultiIndex levels into columns, then use pop to extract those columns if you want them as the index:

df1 = pd.get_dummies(df.doc_num).sort_index(level=0).reset_index()
df1.index =  df1.pop('doc1') + ',' + df1.pop('doc2')
df1 = df1.rename_axis('doc_3')
print (df1)
       U123  U124  U126
doc_3                  
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1

Or use insert to add a new column:

df1 = pd.get_dummies(df.doc_num).sort_index(level=0).reset_index()
df1.insert(0, 'doc_3',  df1.pop('doc1') + ',' + df1.pop('doc2'))

print (df1)
  doc_3  U123  U124  U126
0   A,B     1     0     0
1   A,C     1     0     0
2   A,D     0     1     0
3   B,C     0     0     1
4   B,D     0     0     1

You can try the code below. It combines the two columns into one, with a "," between them.

df['doc_3'] = df['doc1'] + "," + df['doc2']

Then you can drop the first two columns.
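Put together, the combine-then-drop approach could look like the sketch below. It assumes doc1 and doc2 are ordinary columns, as this answer does. The final `get_dummies` step and the name `out` are additions to connect the result back to the question's expected output, and `dtype=int` is there only so the values print as 1/0.

```python
import pandas as pd

# Example frame with doc1/doc2 as ordinary columns (assumed layout).
df = pd.DataFrame(
    {
        "doc1": ["A", "A", "A", "B", "B"],
        "doc2": ["B", "C", "D", "C", "D"],
        "doc_num": ["U123", "U123", "U124", "U126", "U126"],
    }
)

df["doc_3"] = df["doc1"] + "," + df["doc2"]
df = df.drop(columns=["doc1", "doc2"])  # drop the two source columns

# One-hot encode doc_num and key the rows by the combined label.
out = pd.get_dummies(df.set_index("doc_3")["doc_num"], dtype=int)
print(out)
```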
