python - Is it possible to concat columns after using pandas get_dummies?
Here is my example df:
          doc_num
doc1 doc2
A    B       U123
A    C       U123
A    D       U124
B    C       U126
B    D       U126
and I have used
pd.get_dummies(df.doc_num).sort_index(level=0)
to make a vector matrix like this:
           U123  U124  U126
doc1 doc2
A    B        1     0     0
A    C        1     0     0
A    D        0     1     0
B    C        0     0     1
B    D        0     0     1
But I would like to concatenate doc1 and doc2 into a new column, so that the expected result looks like this:
       U123  U124  U126
doc_3
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1
Is it possible? Thank you in advance.
In addition to @jezrael's answer, you want a vector matrix, so do:
# assumes doc1 and doc2 are regular columns; if they are index levels, call df = df.reset_index() first
df1 = pd.get_dummies(df.doc_num)
df1.insert(0, 'doc_3', df['doc1'] + ',' + df['doc2'])
print(df1.set_index('doc_3'))
Or:
df1 = pd.get_dummies(df.doc_num)
# note: pop also removes doc1 and doc2 from df itself
df1['doc_3'] = df.pop('doc1') + ',' + df.pop('doc2')
print(df1.set_index('doc_3'))
Output of both:
       U123  U124  U126
doc_3
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1
Now you really get your desired output.
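The snippets in this answer index df by column name, while the question's printout shows doc1 and doc2 as index levels. Here is a minimal, self-contained sketch that bridges the two. The way df is built is an assumption reconstructed from the question's printout, and the name flat is only illustrative. dtype=int is passed because pandas 2.0 and later return boolean dummies by default, which would print True/False instead of 1/0.

```python
import pandas as pd

# Example frame reconstructed from the question (assumed construction):
# doc1 and doc2 are index levels, doc_num is the only column.
df = pd.DataFrame(
    {
        "doc1": ["A", "A", "A", "B", "B"],
        "doc2": ["B", "C", "D", "C", "D"],
        "doc_num": ["U123", "U123", "U124", "U126", "U126"],
    }
).set_index(["doc1", "doc2"])

# Turn the index levels into regular columns so they can be added as strings.
flat = df.reset_index()

# One indicator column per distinct doc_num; dtype=int keeps the 0/1 display.
df1 = pd.get_dummies(flat["doc_num"], dtype=int)

# Put the combined key in front, then use it as the index.
df1.insert(0, "doc_3", flat["doc1"] + "," + flat["doc2"])
result = df1.set_index("doc_3")
print(result)
```

Working on a reset copy leaves the original df untouched, unlike the pop variant.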
I believe you need to join both levels of the MultiIndex, then set the index name with rename_axis:
df1 = pd.get_dummies(df.doc_num).sort_index(level=0)
df1.index = df1.index.map(','.join)
df1 = df1.rename_axis('doc_3')
print (df1)
       U123  U124  U126
doc_3
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1
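For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. The construction of df is an assumption based on the question's printout. dtype=int is added so that the result prints as 0/1 on pandas 2.0 and later, where get_dummies defaults to bool.

```python
import pandas as pd

# Example frame reconstructed from the question (assumed construction).
df = pd.DataFrame(
    {
        "doc1": ["A", "A", "A", "B", "B"],
        "doc2": ["B", "C", "D", "C", "D"],
        "doc_num": ["U123", "U123", "U124", "U126", "U126"],
    }
).set_index(["doc1", "doc2"])

# get_dummies keeps the MultiIndex of the input Series.
df1 = pd.get_dummies(df["doc_num"], dtype=int).sort_index(level=0)

# Each MultiIndex entry is a tuple such as ("A", "B");
# ",".join turns it into the single string "A,B".
df1.index = df1.index.map(",".join)
df1 = df1.rename_axis("doc_3")
print(df1)
```

Note that ','.join only works when every level holds strings. For numeric levels, something like df1.index.map(lambda t: ','.join(map(str, t))) would be needed instead.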
And add reset_index if you need doc_3 as a column rather than as the index:
df1 = df1.reset_index()
print (df1)
  doc_3  U123  U124  U126
0   A,B     1     0     0
1   A,C     1     0     0
2   A,D     0     1     0
3   B,C     0     0     1
4   B,D     0     0     1
Or first convert the MultiIndex levels to columns with reset_index, then use pop to extract those columns if you want the result as the index:
df1 = pd.get_dummies(df.doc_num).sort_index(level=0).reset_index()
df1.index = df1.pop('doc1') + ',' + df1.pop('doc2')
df1 = df1.rename_axis('doc_3')
print (df1)
       U123  U124  U126
doc_3
A,B       1     0     0
A,C       1     0     0
A,D       0     1     0
B,C       0     0     1
B,D       0     0     1
Or use insert to add it as a new column:
df1 = pd.get_dummies(df.doc_num).sort_index(level=0).reset_index()
df1.insert(0, 'doc_3', df1.pop('doc1') + ',' + df1.pop('doc2'))
print (df1)
  doc_3  U123  U124  U126
0   A,B     1     0     0
1   A,C     1     0     0
2   A,D     0     1     0
3   B,C     0     0     1
4   B,D     0     0     1
You can try the code below. It combines the two columns into one, with "," between them.
df['doc_3'] = df['doc1'] + "," + df['doc2']
Then you can drop the first two columns.
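A short sketch of that last step, assuming doc1 and doc2 are regular columns (if they are index levels, as in the question's printout, reset_index would be needed first). The frame below is rebuilt from the question's data, and the final get_dummies call is added only to show how the combined key leads to the requested matrix.

```python
import pandas as pd

# Example data from the question, here with doc1/doc2 as plain columns (an assumption).
df = pd.DataFrame(
    {
        "doc1": ["A", "A", "A", "B", "B"],
        "doc2": ["B", "C", "D", "C", "D"],
        "doc_num": ["U123", "U123", "U124", "U126", "U126"],
    }
)

# Combine the two columns into one, separated by a comma.
df["doc_3"] = df["doc1"] + "," + df["doc2"]

# Drop the original two columns now that doc_3 carries the same information.
df = df.drop(columns=["doc1", "doc2"])

# Optional: build the indicator matrix indexed by the combined key.
# dtype=int keeps 0/1 output on pandas 2.0+, where the default is bool.
dummies = pd.get_dummies(df.set_index("doc_3")["doc_num"], dtype=int)
print(dummies)
```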