How to split pandas DataFrame string entries into separate rows?

Question

I have a pandas DataFrame in which one column of text strings contains newline-separated values. I want to split each field and create a new row per entry.

My DataFrame looks like this:

Col-1   Col-2
A       Notifications
        Returning Value
        Both
B       mine
        Why Not?

The expected output is:

Col-1   Col-2
A       Notifications 
A       Returning Value
A       Both
B       mine
B       Why Not?

Answer 1

First replace() the empty strings '' with np.nan, then forward-fill the gaps (fillna(method='ffill'), or the equivalent ffill() in recent pandas versions):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Col-1': ['A', '', '', 'B', ''],
                   'Col-2': ['Notifications', 'Returning Value', 'Both', 'mine', 'Why Not?']})
df
    Col-1   Col-2
0   A   Notifications
1       Returning Value
2       Both
3   B   mine
4       Why Not?

# ffill() is equivalent to fillna(method='ffill'), which newer pandas versions deprecate
df['Col-1'] = df['Col-1'].replace('', np.nan).ffill()
df
    Col-1   Col-2
0   A   Notifications
1   A   Returning Value
2   A   Both
3   B   mine
4   B   Why Not?
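
If the blank cells might hold stray spaces rather than truly empty strings, the same forward-fill idea can be sketched with a boolean mask instead of replace(). This is only a minimal sketch: the sample frame with whitespace-only cells is made up for illustration, and it assumes every cell in Col-1 is a string.

```python
import pandas as pd

# Hypothetical sample: some "blank" cells hold spaces instead of ''
df = pd.DataFrame({
    'Col-1': ['A', '', '  ', 'B', ' '],
    'Col-2': ['Notifications', 'Returning Value', 'Both', 'mine', 'Why Not?'],
})

# Flag cells that are empty or contain only whitespace
blank = df['Col-1'].str.strip() == ''

# mask() turns the flagged cells into missing values; ffill() copies the last label down
df['Col-1'] = df['Col-1'].mask(blank).ffill()
print(df)
```

Note that a blank cell in the very first row would stay missing, because there is no earlier label to copy down.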

Answer 2

Rebuild the second column as a flattened Series, then combine it with the first column:

df = pd.DataFrame({'Col-1': ['A', 'B'], 'Col-2': ['Notifications\nReturning Value\nBoth', 'mine\nWhy Not?']})

The DataFrame looks like this:

  Col-1                                 Col-2
0     A  Notifications\nReturning Value\nBoth
1     B                        mine\nWhy Not?

The main part:

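# Split each cell on newlines, expand the pieces to columns, then stack them into one long Series
series = pd.DataFrame(df['Col-2'].str.split('\n').tolist()).stack().dropna()
# Drop the inner level so every piece keeps the index of its source row
series.index = series.index.droplevel(1)
series.name = 'Col-2'
# join() repeats Col-1 for each piece; pd.concat(axis=1) cannot align a duplicated index
result = df[['Col-1']].join(series).reset_index(drop=True)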

Result:

  Col-1            Col-2
0     A    Notifications
1     A  Returning Value
2     A             Both
3     B             mine
4     B         Why Not?
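
For comparison, newer pandas versions (0.25 and later) ship DataFrame.explode, which can express the same reshaping more directly. The following is a hedged sketch of that alternative, not the method used in this answer; it reuses the sample frame from above.

```python
import pandas as pd

df = pd.DataFrame({
    'Col-1': ['A', 'B'],
    'Col-2': ['Notifications\nReturning Value\nBoth', 'mine\nWhy Not?'],
})

# Turn each cell into a list of pieces, then give every piece its own row
result = (
    df.assign(**{'Col-2': df['Col-2'].str.split('\n')})
      .explode('Col-2')
      .reset_index(drop=True)  # renumber the rows 0..n-1
)
print(result)
```

explode() repeats the other columns for each list element, so no separate join step is needed.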

Answer 3

If I understand correctly, you want DataFrame.reset_index().

Assuming your data is stored in a variable called df:

df = df.reset_index().set_index('Col-1')

Here is a dummy example, since you did not provide an easy way to create the MultiIndex:

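arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
# Assumed: random values (needs numpy as np), since the original omits how s was built
s = pd.Series(np.random.randn(8), index=index)
print(s)

first  second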
bar    one       0.792900
       two      -0.070508
baz    one      -0.599464
       two       0.334504
foo    one       0.835464
       two       1.614845
qux    one       0.674623
       two       1.907550

Now, if we want the first column to be the index:

s = s.reset_index().set_index('first')
print(s)


      second         0
first                 
bar      one  0.792900
bar      two -0.070508
baz      one -0.599464
baz      two  0.334504
foo      one  0.835464
foo      two  1.614845
qux      one  0.674623
qux      two  1.907550

More info here: Advanced Indexing
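
As a possible shortcut, reset_index() also accepts a level argument, so only the 'second' level is moved into a column and 'first' stays as the index. This is a small sketch with made-up fixed values (instead of random ones) so that the result is easy to check.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['bar', 'baz'], ['one', 'two']], names=['first', 'second']
)
# Fixed values instead of random ones so the result is reproducible
s = pd.Series(np.arange(4.0), index=index)

# Move only the 'second' level into a column; 'first' remains the index
out = s.reset_index(level='second')
print(out)
```

This avoids the round trip of resetting every level and then calling set_index('first').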


3 answers

Answer 1
1 2018-08-20 11:45:40

Answer 2
1 2018-08-20 12:08:49

Answer 3
0 2018-08-20 12:00:19
