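How to split a pandas data frame string entry into separate rows?
I have a pandas data frame in which one column of text strings contains newline-separated values. I want to split each such field and create a new row for each entry.
My data frame looks like this: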
Col-1  Col-2
A      Notifications
       Returning Value
       Both
B      mine
       Why Not?
The expected output is:
Col-1 Col-2
A Notifications
A Returning Value
A Both
B mine
B Why Not?
First replace() the empty strings '' with np.nan, then forward-fill the gaps with fillna(method='ffill') (spelled ffill() in current pandas):
import numpy as np
import pandas as pd
df = pd.DataFrame({'Col-1': ['A', '', '', 'B', ''],
                   'Col-2': ['Notifications', 'Returning Value', 'Both', 'mine', 'Why Not?']})
df
  Col-1            Col-2
0     A    Notifications
1          Returning Value
2                   Both
3     B             mine
4               Why Not?
df['Col-1'] = df['Col-1'].replace('', np.nan).ffill()  # same as fillna(method='ffill'), which pandas 2.1+ deprecates
df
Col-1 Col-2
0 A Notifications
1 A Returning Value
2 A Both
3 B mine
4 B Why Not?
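The same two steps can be wrapped in a small helper. This is only a sketch: the name `fill_down` is hypothetical, and treating whitespace-only cells as blank is an assumption about the data, not something stated in the question.

```python
import pandas as pd


def fill_down(frame, column):
    """Return a copy of frame with blank cells in column filled from the row above."""
    out = frame.copy()
    # Assumption: empty or whitespace-only strings count as "blank"
    blanks = out[column].astype(str).str.strip() == ''
    # where() keeps the non-blank cells and turns the rest into NaN; ffill() copies values down
    out[column] = out[column].where(~blanks).ffill()
    return out


df = pd.DataFrame({'Col-1': ['A', '', '', 'B', ''],
                   'Col-2': ['Notifications', 'Returning Value', 'Both', 'mine', 'Why Not?']})
filled = fill_down(df, 'Col-1')
print(filled)
```

Because the helper works on a copy, the original df keeps its blank cells.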
Rebuild the second column as a flattened Series, then join it back onto the first column:
df = pd.DataFrame({'Col-1': ['A', 'B'], 'Col-2': ['Notifications\nReturning Value\nBoth', 'mine\nWhy Not?']})
df
Its representation:
Col-1 Col-2
0 A Notifications\nReturning Value\nBoth
1 B mine\nWhy Not?
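The main part:
# Split each cell on newlines, spread the pieces into columns, then stack them into one long Series
# (dropna() discards the padding that shorter rows receive)
series = pd.DataFrame(df['Col-2'].str.split('\n').tolist()).stack().dropna()
# Drop the inner level so every piece carries the index of its source row
series.index = series.index.droplevel(1)
series.name = 'Col-2'
# join() aligns on the index and repeats Col-1 for each piece; then renumber the rows
result = df[['Col-1']].join(series).reset_index(drop=True)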
The result:
Col-1 Col-2
0 A Notifications
1 A Returning Value
2 A Both
3 B mine
4 B Why Not?
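On pandas 0.25 or later, a shorter route is available through `DataFrame.explode`. The following is a sketch of that alternative, not the method used in the answer above; it assumes the same two-row frame with newline-separated values in Col-2.

```python
import pandas as pd

df = pd.DataFrame({'Col-1': ['A', 'B'],
                   'Col-2': ['Notifications\nReturning Value\nBoth', 'mine\nWhy Not?']})

# Turn each cell into a list of pieces, then give every list element its own row.
# explode() repeats the other columns (and the index label) for each element.
result = (df.assign(**{'Col-2': df['Col-2'].str.split('\n')})
            .explode('Col-2')
            .reset_index(drop=True))  # renumber rows 0..n-1
print(result)
```

The `assign(**{...})` form is used only because the column name contains a hyphen and cannot be written as a keyword argument.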
What you want is reset_index().
Assuming your data is stored in a variable named df:
df = df.reset_index().set_index('Col-1')
A dummy example, since you did not provide an easy way to create the MultiIndex:
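import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)  # assumed definition of s: random values, so yours will differ
first second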
bar one 0.792900
two -0.070508
baz one -0.599464
two 0.334504
foo one 0.835464
two 1.614845
qux one 0.674623
two 1.907550
Now, if we want the first column to become the index:
s = s.reset_index().set_index('first')
print(s)
second 0
first
bar one 0.792900
bar two -0.070508
baz one -0.599464
baz two 0.334504
foo one 0.835464
foo two 1.614845
qux one 0.674623
qux two 1.907550
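As a side note, `reset_index` also accepts a `level` argument, so the inner level alone can be moved into a column while the outer level stays as the index. The sketch below uses a smaller index, fixed values instead of random ones, and a Series name `value`; all three are illustrative choices, not part of the example above.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index, name='value')

# Two-step form from the answer: flatten everything, then re-index on 'first'
two_step = s.reset_index().set_index('first')

# One-step form: move only the 'second' level out of the index
one_step = s.reset_index(level='second')
print(one_step)
```

Both forms should leave 'first' as the index with 'second' and the values as ordinary columns.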
More information here: Advanced Indexing