Grouping all rows of a pandas DataFrame (with many columns) with the same value in a given column
I have been searching for hours. I have a DataFrame like this:
      col1  col2  col3  col4
row1  a     p     u     0
row2  b     q     v     1
row3  a     r     w     2
row4  d     s     x     3
row5  b     t     y     4
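To reproduce the example, the frame above can be built as follows. This is a sketch that assumes row1 to row5 are the index labels and that col4 holds integers.

```python
import pandas as pd

# Sample data from the question; row1..row5 are assumed to be the index labels
# and col4 is assumed to hold integers.
df = pd.DataFrame(
    {
        "col1": ["a", "b", "a", "d", "b"],
        "col2": ["p", "q", "r", "s", "t"],
        "col3": ["u", "v", "w", "x", "y"],
        "col4": [0, 1, 2, 3, 4],
    },
    index=["row1", "row2", "row3", "row4", "row5"],
)
print(df)
```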
Now I want to group all these rows by the value of 'col1' so that I get:
      col1  col2  col3  col4
row1  a     p r   u w   0,2
row2  b     q t   v y   1,4
row3  d     s     x     3
I found that df.groupby('col1')['col2'].apply(' '.join)
groups all values in 'col2' by the same value of 'col1'. But I am unable to extend this command so that all columns are grouped together to get the output above.
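A minimal sketch of the single-column version that actually runs, assuming the sample frame from the question with an integer col4: the join method is passed without parentheses, and a numeric column has to be converted to strings before joining.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["a", "b", "a", "d", "b"],
        "col2": ["p", "q", "r", "s", "t"],
        "col3": ["u", "v", "w", "x", "y"],
        "col4": [0, 1, 2, 3, 4],
    },
    index=["row1", "row2", "row3", "row4", "row5"],
)

# Pass the bound method ' '.join itself (no parentheses); apply calls it once
# per group with that group's col2 values.
col2_joined = df.groupby("col1")["col2"].apply(" ".join)
print(col2_joined)

# str.join only accepts strings, so the integer col4 is converted first.
col4_joined = df.groupby("col1")["col4"].apply(lambda s: " ".join(s.astype(str)))
print(col4_joined)
```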
The above DataFrame is just for illustration. The actual DataFrame has around 100 rows and columns, and every cell stores feedback except col1, which stores the name of the item the feedback is about. I want to group all columns by item (col1) and then perform sentiment analysis on the DataFrame.
You can use:
df1 = df.astype(str).groupby('col1').agg(','.join).reset_index()
print(df1)
  col1 col2 col3 col4
0    a  p,r  u,w  0,2
1    b  q,t  v,y  1,4
2    d    s    x    3
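A self-contained version of the snippet above, rebuilding the sample frame (the index labels row1 to row5 and the integer col4 are assumptions), so the result can be checked directly:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["a", "b", "a", "d", "b"],
        "col2": ["p", "q", "r", "s", "t"],
        "col3": ["u", "v", "w", "x", "y"],
        "col4": [0, 1, 2, 3, 4],
    },
    index=["row1", "row2", "row3", "row4", "row5"],
)

# Convert every column to str so ','.join also works on col4, join each
# column's values within every col1 group, then turn col1 back into a column.
df1 = df.astype(str).groupby("col1").agg(",".join).reset_index()
print(df1)
```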
If you also need the original row labels (the label of the first row for each value of col1):
df1 = df.astype(str).groupby('col1', sort=False).agg(','.join).reset_index()
# sort=False keeps groups in order of first appearance, matching drop_duplicates
df1.index = df.drop_duplicates('col1').index
print(df1)
     col1 col2 col3 col4
row1    a  p,r  u,w  0,2
row2    b  q,t  v,y  1,4
row4    d    s    x    3
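Pairing the aggregated rows with drop_duplicates('col1').index only works when both are in the same order. groupby sorts the keys by default, while drop_duplicates keeps rows in order of first appearance. The sketch below uses hypothetical data where those two orders differ, and shows that groupby(..., sort=False) keeps them aligned:

```python
import pandas as pd

# Hypothetical data where the first appearances of col1 are NOT in sorted order.
df = pd.DataFrame(
    {
        "col1": ["b", "a", "b", "d", "a"],
        "col2": ["p", "q", "r", "s", "t"],
        "col4": [0, 1, 2, 3, 4],
    },
    index=["row1", "row2", "row3", "row4", "row5"],
)
first_rows = df.drop_duplicates("col1").index  # row1 (b), row2 (a), row4 (d)

# Default sort=True orders groups a, b, d, so the labels would be paired wrongly.
sorted_groups = df.astype(str).groupby("col1").agg(",".join).reset_index()

# sort=False orders groups b, a, d, matching drop_duplicates.
df1 = df.astype(str).groupby("col1", sort=False).agg(",".join).reset_index()
df1.index = first_rows
print(df1)
```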
Explanation:
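- First convert all columns to strings with astype, because str.join only accepts strings (col4 contains numbers).
- Then groupby col1 and aggregate every remaining column with ','.join via agg.
- If you also need the label of the first row for each value of col1, add drop_duplicates.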
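Since the real cells hold free-text feedback, some of them may be empty (NaN). Depending on the pandas version, astype(str) either turns NaN into the literal string 'nan', which then ends up in the joined text, or leaves it missing, so the join fails. A sketch on hypothetical data that drops missing values inside each group before joining:

```python
import numpy as np
import pandas as pd

# Hypothetical feedback table with missing cells (NaN).
df = pd.DataFrame(
    {
        "col1": ["a", "b", "a"],
        "col2": ["good", np.nan, "bad"],
        "col3": ["fast", "slow", np.nan],
    }
)

# For each col1 group and each column: drop NaN, cast to str, then join.
clean = (
    df.groupby("col1")
    .agg(lambda s: ",".join(s.dropna().astype(str)))
    .reset_index()
)
print(clean)
```

A group whose cells are all missing ends up as an empty string rather than 'nan'.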