Split a dataframe by a specific condition but keep the original dataframe

Question

I have a dataframe "bb" like this:

Response                                Unique   Count
I love it so much!                      246_0    1
This is not bad, but can be better.     246_1    2
Well done, let's do it.                 247_0    1

If Count is larger than 1, I would like to split the string on the comma and make the dataframe "bb" become this (the result I expect):

Response                                Unique
I love it so much!                      246_0    
This is not bad                         246_1_0    
but can be better.                      246_1_1
Well done, let's do it.                 247_0

My code:

import numpy as np
from pandas import DataFrame

bb = DataFrame(bb[bb['Count'] > 1].Response.str.split(',').tolist(), index=bb[bb['Count'] > 1].Unique).stack()
bb = bb.reset_index()[[0, 'Unique']]
bb.columns = ['Response', 'Unique']
bb = bb.replace('', np.nan)
bb = bb.dropna()
print(bb)

But the result is like this:

           Response  Unique
0  This is not bad    246_1
1  but can be better. 246_1

How can I keep the rows of the original dataframe in this case?
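For anyone who wants to run the snippets on this page, the sample frame can be rebuilt roughly as below. This is only a sketch: the values are copied from the table above, and the name `kept` is introduced here for illustration. It also hints at why the attempt above loses rows: the frame is filtered to Count > 1 before anything else happens, so the other rows are never carried along.

```python
import pandas as pd

# Sample data copied from the question; the answers below call this frame `df`.
bb = pd.DataFrame({
    "Response": [
        "I love it so much!",
        "This is not bad, but can be better.",
        "Well done, let's do it.",
    ],
    "Unique": ["246_0", "246_1", "247_0"],
    "Count": [1, 2, 1],
})

# `kept` is a hypothetical name: it holds only the rows with Count > 1.
# Anything built from it alone has already lost the other rows.
kept = bb[bb["Count"] > 1]
print(len(bb), len(kept))
```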

Answer 1

First split only the values that meet the condition into a new helper Series and join it back to the original dataframe. Then add counter values with GroupBy.cumcount, but only for rows whose index values are duplicated, as detected by Index.duplicated:

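# Split only the rows with Count > 1; stack() yields one row per piece
s = df.loc[df.pop('Count') > 1, 'Response'].str.split(',', expand=True).stack()
# Drop the inner index level so the pieces line up with df's index, then join
df1 = df.join(s.reset_index(drop=True, level=1).rename('Response1'))
# Rows that were not split have NaN in Response1; fall back to the original text
df1['Response'] = df1.pop('Response1').fillna(df1['Response'])

# Flag the index labels that now occur more than once (the split rows)
mask = df1.index.duplicated(keep=False)
# Append _0, _1, ... to Unique for those rows only
df1.loc[mask, 'Unique'] += df1[mask].groupby(level=0).cumcount().astype(str).radd('_')
df1 = df1.reset_index(drop=True)
print(df1)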
              Response   Unique
0   I love it so much!    246_0
1      This is not bad  246_1_0
2   but can be better.  246_1_1
3           Well done!    247_0
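To see what the two index-based helpers contribute, here is a small sketch on a hypothetical frame whose index mimics the joined result (label 1 appears twice because that row was split). The frame and the names `counter` and `suffix` are assumptions made for illustration, and the final assignment goes through a plain array rather than the `+=` used in the answer.

```python
import pandas as pd

# Hypothetical frame: index label 1 is repeated, as it would be after the join.
df1 = pd.DataFrame(
    {"Unique": ["246_0", "246_1", "246_1", "247_0"]},
    index=[0, 1, 1, 2],
)

# True for every row whose index label occurs more than once.
mask = df1.index.duplicated(keep=False)
print(mask.tolist())     # [False, True, True, False]

# Running counter within each index label, restarting at 0 for each label.
counter = df1.groupby(level=0).cumcount()
print(counter.tolist())  # [0, 0, 1, 0]

# Suffix only the masked rows; assigning a plain array sidesteps label alignment.
suffix = "_" + counter[mask].astype(str)
df1.loc[mask, "Unique"] = (df1.loc[mask, "Unique"] + suffix).to_numpy()
print(df1["Unique"].tolist())
```

Dropping the mask and suffixing every row gives the behaviour described in the EDIT that follows.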

EDIT: If you need the _0 suffix on all the other values as well, remove the mask:

s = df.loc[df.pop('Count') > 1, 'Response'].str.split(',', expand=True).stack()
df1 = df.join(s.reset_index(drop=True, level=1).rename('Response1'))
df1['Response'] = df1.pop('Response1').fillna(df1['Response'])

df1['Unique'] += df1.groupby(level=0).cumcount().astype(str).radd('_')
df1 = df1.reset_index(drop=True)
print(df1)
              Response   Unique
0   I love it so much!  246_0_0
1      This is not bad  246_1_0
2   but can be better.  246_1_1
3           Well done!  247_0_0

Answer 2

We can solve this problem step by step as follows:

Split the dataframe in two by Count.
Use the explode_str function (defined at the end of this answer) to explode the strings into rows.
Group by the index and use cumcount to build the correct Unique column values.
Finally, concat the two dataframes back together.

df1 = df[df['Count'].ge(2)] # all rows which have a count 2 or higher
df2 = df[df['Count'].eq(1)] # all rows which have count 1

df1 = explode_str(df1, 'Response', ',') # explode the string to rows on comma delimiter

# Create the correct unique column
df1['Unique'] = df1['Unique'] + '_' + df1.groupby(df1.index).cumcount().astype(str)

df = pd.concat([df1, df2]).sort_index().drop('Count', axis=1).reset_index(drop=True)

              Response   Unique
0   I love it so much!    246_0
1      This is not bad  246_1_0
2   but can be better.  246_1_1
3           Well done!    247_0

Function used from the linked answer:

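import numpy as np

def explode_str(df, col, sep):
    s = df[col]
    # Repeat each row position once per piece (number of separators + 1)
    i = np.arange(len(s)).repeat(s.str.count(sep) + 1)
    # Join all strings, split them again, and assign one piece per repeated row
    return df.iloc[i].assign(**{col: sep.join(s).split(sep)})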
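As a possible alternative to the helper above, pandas 0.25 and later ship a built-in DataFrame.explode. The sketch below is not the method either answer used. It rebuilds the question's sample data, and the names `pieces`, `out`, `dup` and `counter` are made up for illustration. It also strips the whitespace left around each piece, which the answers above do not do.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Response": [
        "I love it so much!",
        "This is not bad, but can be better.",
        "Well done, let's do it.",
    ],
    "Unique": ["246_0", "246_1", "247_0"],
    "Count": [1, 2, 1],
})

# Split only where Count > 1; wrap the other responses in one-element lists.
pieces = [
    [p.strip() for p in text.split(",")] if count > 1 else [text]
    for text, count in zip(df["Response"], df["Count"])
]

# explode() emits one row per list element and repeats the original index label.
out = df.assign(Response=pieces).explode("Response")

# Rows that came from a split share an index label; number them 0, 1, ...
dup = out.index.duplicated(keep=False)
counter = out.groupby(level=0).cumcount().astype(str)
out["Unique"] = np.where(dup, out["Unique"] + "_" + counter, out["Unique"])

out = out.drop(columns="Count").reset_index(drop=True)
print(out)
```

Because the split is decided per row from Count, the comma in "Well done, let's do it." is left alone, which matches the expected result in the question.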

2 solutions

Solution 1
3 Accepted 2019-07-08 07:35:12

Solution 2
1 2019-07-08 07:53:28
