
How do I collapse categorical data into a single record in R or Python?

I have a data set structured like this:

ID   Code
1     A
1     B   
1     C
2     A
2     C
3     B
3     C

However, I would like it to look like this:

ID  Codes
1   A B C
2   A C
3   B C

Is there an easy way to do this in R or Python? Thanks!

In R, you can do:

aggregate(Code~ID, df1, paste, collapse=' ')
#    ID  Code
#1  1  A B C
#2  2    A C
#3  3    B C

Or:

library(data.table)
setDT(df1)[, list(Code=paste(Code, collapse= ' ')), ID]

Data:

df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
                      Code = c("A", "B", "C", "A", "C", "B", "C")),
                 .Names = c("ID", "Code"),
                 class = "data.frame", row.names = c(NA, -7L))

Using data.table:

require(data.table)
ans = setDT(df)[, .(Codes = paste(Code, collapse=" ")), by=ID]
#    ID Codes
# 1:  1 A B C
# 2:  2   A C
# 3:  3   B C

ans$Codes # is a character vector

This pastes each group's codes together into a single string, which may not always be the best representation. Alternatively, you can keep them as a list column:

ans = setDT(df)[, .(Codes = list(Code)), by=ID]
#    ID Codes
# 1:  1 A,B,C
# 2:  2   A,C
# 3:  3   B,C

ans$Codes # is a list
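
For comparison, here is a minimal pandas sketch of the same list-column idea. It assumes the sample data is built inline as a DataFrame named `df` (an illustrative choice, not from the answer above). Aggregating with the built-in `list` keeps each group's codes as a Python list instead of one pasted string:

```python
import pandas as pd

# Sample data from the question, built inline (assumed layout: columns ID, Code)
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3, 3],
                   'Code': ['A', 'B', 'C', 'A', 'C', 'B', 'C']})

# Aggregating each group's Code values with `list` gives one list per ID,
# roughly the pandas counterpart of a data.table list column
codes = df.groupby('ID')['Code'].agg(list)

print(codes.to_dict())

# Individual codes stay addressable without splitting a string back apart
print(codes.loc[2][1])
```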

Note that data.table's `by=` returns the groups in the order in which they first appear in the data (this isn't evident from the sample, because ID is already sorted). Use `keyby=` instead if you want the result sorted by ID.
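
pandas makes the same distinction: by default `groupby` sorts the group keys, while `sort=False` keeps groups in order of first appearance. The sketch below uses hypothetical unsorted data (made up so that the difference is visible). It is not part of the original answer:

```python
import pandas as pd

# Hypothetical unsorted input: ID 3 appears first, then 1, then 2
df = pd.DataFrame({'ID': [3, 1, 3, 2, 1],
                   'Code': ['B', 'A', 'C', 'A', 'B']})

# Default behaviour: group keys come back sorted (1, 2, 3)
sorted_groups = df.groupby('ID')['Code'].agg(' '.join)

# sort=False: groups keep their first-appearance order (3, 1, 2);
# within each group, codes keep their original row order
first_seen = df.groupby('ID', sort=False)['Code'].agg(' '.join)

print(list(sorted_groups.index))
print(list(first_seen.index))
```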

In Python with pandas you can do:

import pandas as pd

df = pd.read_clipboard() # from your sample

df
   ID Code
0   1    A
1   1    B
2   1    C
3   2    A
4   2    C
5   3    B
6   3    C

df.groupby('ID').agg({'Code': ' '.join})

     Code
ID       
1   A B C
2     A C
3     B C
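
The result above keeps `ID` as the index and the column name `Code`. If you want a flat table whose columns are named as in the question (`ID`, `Codes`), one variation is to aggregate the single column and reset the index. This sketch builds the sample inline instead of reading it from the clipboard:

```python
import pandas as pd

# Sample data from the question, built inline
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3, 3],
                   'Code': ['A', 'B', 'C', 'A', 'C', 'B', 'C']})

# Join each group's codes with spaces, then move the ID index back into a
# regular column and name the joined column "Codes"
result = df.groupby('ID')['Code'].agg(' '.join).reset_index(name='Codes')

print(result.to_string(index=False))
```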

In pure Python:

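>>> from collections import defaultdict
>>> ids = [1, 1, 1, 2, 2, 3, 3]
>>> codes = ['A', 'B', 'C', 'A', 'C', 'B', 'C']
>>> data = defaultdict(list)  # maps each ID to the list of its codes
>>> for id_, code in zip(ids, codes):
...     data[id_].append(code)
...
>>> dict(data)
{1: ['A', 'B', 'C'], 2: ['A', 'C'], 3: ['B', 'C']}
>>> {k: ' '.join(v) for k, v in data.items()}
{1: 'A B C', 2: 'A C', 3: 'B C'}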
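
Another standard-library option is `itertools.groupby`. It only merges *adjacent* items with equal keys, so the rows must be grouped by ID first. This sketch sorts them to be safe. The sort is stable, so code order within each ID is preserved. The variable names are illustrative:

```python
from itertools import groupby
from operator import itemgetter

ids = [1, 1, 1, 2, 2, 3, 3]
codes = ['A', 'B', 'C', 'A', 'C', 'B', 'C']

# Pair each ID with its code and sort by ID so equal IDs are adjacent
rows = sorted(zip(ids, codes), key=itemgetter(0))

# Collapse each run of equal IDs into one space-separated string
collapsed = {key: ' '.join(code for _, code in group)
             for key, group in groupby(rows, key=itemgetter(0))}

print('ID', 'Codes')
for key, joined in collapsed.items():
    print(key, joined)
```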
