How to split a pandas DataFrame string entry into separate rows?
I have a pandas DataFrame in which one column of text strings contains newline-separated values. I want to split each CSV field and create a new row per entry.
My DataFrame looks like this:
Col-1  Col-2
A      Notifications
       Returning Value
       Both
B      mine
       Why Not?
The expected output is:
Col-1  Col-2
A      Notifications
A      Returning Value
A      Both
B      mine
B      Why Not?
First replace() the empty string '' with np.nan, then forward-fill the gaps with ffill(). Older answers write the second step as fillna(method='ffill'), which is deprecated since pandas 2.1:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Col-1': ['A', '', '', 'B', ''],
                   'Col-2': ['Notifications', 'Returning Value', 'Both', 'mine', 'Why Not?']})
df
   Col-1            Col-2
0      A    Notifications
1         Returning Value
2                    Both
3      B             mine
4                Why Not?
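# Blank labels become NaN, then ffill() (the current spelling of fillna(method='ffill')) copies the last label down
df['Col-1'] = df['Col-1'].replace('', np.nan).ffill()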
df
   Col-1            Col-2
0      A    Notifications
1      A  Returning Value
2      A             Both
3      B             mine
4      B         Why Not?
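If you would rather not import NumPy just for np.nan, a minimal sketch of the same forward-fill idea uses Series.mask, which by default replaces values with NaN wherever the condition is True. Like the answer above, it assumes an empty string in Col-1 means "same label as the row above".

```python
import pandas as pd

# Same frame as in the answer: blank strings in Col-1 mark continuation rows.
df = pd.DataFrame({'Col-1': ['A', '', '', 'B', ''],
                   'Col-2': ['Notifications', 'Returning Value', 'Both', 'mine', 'Why Not?']})

# mask() turns the empty strings into NaN; ffill() then copies the last non-missing label down.
df['Col-1'] = df['Col-1'].mask(df['Col-1'] == '').ffill()
print(df)
```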
Rebuild the second column as a flattened series, then combine it with the first column:
import pandas as pd
df = pd.DataFrame({'Col-1': ['A', 'B'], 'Col-2': ['Notifications\nReturning Value\nBoth', 'mine\nWhy Not?']})
df
Representation:
   Col-1                                 Col-2
0      A  Notifications\nReturning Value\nBoth
1      B                        mine\nWhy Not?
Main part:
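# Split each cell on '\n', spread the pieces into columns, then stack them into one long Series
series = pd.DataFrame(df['Col-2'].str.split('\n').tolist()).stack().dropna()
# Drop the column-position level so every piece keeps its original row label
series.index = series.index.droplevel(1)
series.name = 'Col-2'
# join() repeats Col-1 for each piece sharing its label; reset_index renumbers 0..n-1
result = df[['Col-1']].join(series).reset_index(drop=True)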
Result:
   Col-1            Col-2
0      A    Notifications
1      A  Returning Value
2      A             Both
3      B             mine
4      B         Why Not?
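On pandas 1.1 or newer, a shorter route is to split each cell into a list and let DataFrame.explode emit one row per list element. This is a different technique from the stack/droplevel approach above, sketched under the assumption that the separator is always '\n'.

```python
import pandas as pd

# Same input as in the answer: each Col-2 cell holds newline-separated values.
df = pd.DataFrame({'Col-1': ['A', 'B'],
                   'Col-2': ['Notifications\nReturning Value\nBoth', 'mine\nWhy Not?']})

# str.split turns each cell into a list; explode writes one row per list element and
# repeats the other columns. ignore_index=True renumbers the result 0..n-1.
result = (df.assign(**{'Col-2': df['Col-2'].str.split('\n')})
            .explode('Col-2', ignore_index=True))
print(result)
```

The dict unpacking in assign() is only needed because 'Col-2' contains a hyphen and so cannot be written as a keyword argument.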
IIUC, you want DataFrame.reset_index().
Assuming your data is stored in a variable called df:
df = df.reset_index().set_index('Col-1')
A dummy example, since you're not providing an easy way to create the MultiIndex:
import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)
first  second
bar    one       0.792900
       two      -0.070508
baz    one      -0.599464
       two       0.334504
foo    one       0.835464
       two       1.614845
qux    one       0.674623
       two       1.907550
Now, if we want the first column to be the index:
s = s.reset_index().set_index('first')
print(s)
      second         0
first
bar      one  0.792900
bar      two -0.070508
baz      one -0.599464
baz      two  0.334504
foo      one  0.835464
foo      two  1.614845
qux      one  0.674623
qux      two  1.907550
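A related shortcut: reset_index accepts a level argument, so only the second level is moved into a column while first stays as the index. This sketch rebuilds the dummy Series from the answer (the values are random, so the numbers will differ) and should produce the same frame as s.reset_index().set_index('first').

```python
import numpy as np
import pandas as pd

# Rebuild the dummy MultiIndex Series from the answer.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)

# Move only the 'second' level into a column; 'first' remains the index.
out = s.reset_index(level='second')

# The two-step version from the answer, for comparison.
alt = s.reset_index().set_index('first')
print(out)
```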
More info here: Advanced Indexing