
Pandas: reshaping data from multiple columns into a single column

I have a data set and would like to reshape part of it. The data set always starts with the same first few columns, followed by a variable number of columns that group the data. If a key belongs to a group, that group's column is marked with an x. Each key might belong to multiple groups, or to none, in which case all of its group columns are empty. The data structure looks like this:

Key  Date Added Group1Name Group2Name Group3Name ... GroupXName
1    1/1/2018   x           X
2    1/1/2018               x
3    1/1/2018                          
4    1/1/2018   x 
5    1/1/2018                                         x

I want to reformat it as:

Key  Date Added Group
1    1/1/2018   Group1Name,Group2Name
2    1/1/2018   Group2Name           
3    1/1/2018        
4    1/1/2018   Group1Name
5    1/1/2018   GroupXName

It seems like you haven't tried much, and it's hard to reproduce your data from what you provided, but the idea is to make each group column hold its own column name instead of 'x', and then take the DataFrame from wide to long format:

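import pandas as pd

# List every group column here (the original list is elided).
columns_to_consider = ['Group1Name',  'Group2Name', ... ]
for column in columns_to_consider:
    # Cells are marked with 'x' or 'X', so match both and swap in the column name.
    df[column] = df[column].str.replace(r'^\s*[xX]\s*$', column, regex=True)
reshaped_df = pd.melt(df, id_vars=['Key', 'Date Added'], value_vars=columns_to_consider)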
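
The melt above only produces the long format; it does not yet give one comma-separated Group value per key. A minimal sketch of the remaining steps follows, assuming a small sample frame modelled on the question's table. The sample data and the names `id_cols`, `group_cols`, `long_df`, `marked`, `groups` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's table.
df = pd.DataFrame({
    'Key': [1, 2, 3, 4],
    'Date Added': ['1/1/2018'] * 4,
    'Group1Name': ['x', None, None, 'x'],
    'Group2Name': ['X', 'x', None, None],
})

id_cols = ['Key', 'Date Added']
# Treat every column that is not an id column as a group column.
group_cols = [c for c in df.columns if c not in id_cols]

# Wide to long: one row per (key, group column) pair.
long_df = df.melt(id_vars=id_cols, value_vars=group_cols,
                  var_name='Group', value_name='mark')

# Keep only the marked cells, accepting 'x' or 'X'.
marked = long_df[long_df['mark'].str.lower() == 'x']

# Join the group names for each key into one comma-separated string.
groups = marked.groupby(id_cols)['Group'].agg(','.join).reset_index()

# Left-merge so that keys without any group are kept, with an empty string.
result = df[id_cols].merge(groups, on=id_cols, how='left').fillna({'Group': ''})
print(result)
```

The left merge is there because keys with no marked group drop out when the long frame is filtered; without it, key 3 would be missing from the result.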

Use apply with the axis=1 parameter:

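def group_func(series):
    # Collect the names of the columns whose cell is marked with 'x' (or 'X').
    values = []
    for val, idx in zip(series, series.index.values):
        if str(val).strip().lower() == 'x':  # compare by value; `is` tests identity
            values.append(str(idx))
    return ",".join(values)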

cols_to_agg = ['Group1Name', 'Group2Name', 'Group3Name', 'Group4Name']
df.loc[:,'Group'] = df.loc[:,cols_to_agg].apply(group_func, axis=1)
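
Since the number of group columns varies, it may help to derive the column list from the frame instead of hard-coding it. The sketch below is one possible variant, assuming every column other than Key and Date Added is a group column. The sample data (with Group3Name standing in for the question's GroupXName) and the names `id_cols`, `group_cols` and `mask` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's table.
df = pd.DataFrame({
    'Key': [1, 2, 3, 4, 5],
    'Date Added': ['1/1/2018'] * 5,
    'Group1Name': ['x', None, None, 'x', None],
    'Group2Name': ['X', 'x', None, None, None],
    'Group3Name': [None, None, None, None, 'x'],
})

id_cols = ['Key', 'Date Added']
# Assumption: every column that is not an id column is a group column.
group_cols = [c for c in df.columns if c not in id_cols]

# Boolean mask: True where a cell is marked with 'x' or 'X'.
mask = df[group_cols].apply(lambda s: s.astype(str).str.strip().str.lower() == 'x')

# For each row, join the names of the columns where the mask is True.
df['Group'] = [','.join(mask.columns[row]) for row in mask.to_numpy()]

# Drop the wide group columns to get the requested layout.
result = df[id_cols + ['Group']]
print(result)
```

Building the mask once keeps the marker check in a single place, so a different marker only needs a change on that one line.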


 