Splitting a column into multiple columns with specific names in a pandas DataFrame
I have the following dataframe:
pri sec
TOM AB,CD,EF
JACK XY,YZ
HARRY FG
NICK KY,NY,SD,EF,FR
I need the following output, with the new column names based on how many comma-separated fields exist in the 'sec' column:
pri sec sec0 sec1 sec2 sec3 sec4
TOM AB,CD,EF AB CD EF NaN NaN
JACK XY,YZ XY YZ NaN NaN NaN
HARRY FG FG NaN NaN NaN NaN
NICK KY,NY,SD,EF,FR KY NY SD EF FR
Any suggestions?
Use join + split + add_prefix:
df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec'))
print(df)
pri sec sec0 sec1 sec2 sec3 sec4
0 TOM AB,CD,EF AB CD EF None None
1 JACK XY,YZ XY YZ None None None
2 HARRY FG FG None None None None
3 NICK KY,NY,SD,EF,FR KY NY SD EF FR
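For reference, the one-liner above can be reproduced end to end with the sample data from the question. This is only a minimal sketch: the intermediate names parts and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "pri": ["TOM", "JACK", "HARRY", "NICK"],
    "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
})

# expand=True returns one column per field, labelled 0, 1, 2, ...
# add_prefix turns those labels into sec0, sec1, sec2, ...
parts = df["sec"].str.split(",", expand=True).add_prefix("sec")

# join aligns on the index, so each row keeps its own fields
result = df.join(parts)
print(result.columns.tolist())
```

Assigning the joined frame to a new name keeps df unchanged, so the snippet can be re-run without hitting overlapping column names.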
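If you need NaN instead of None, add fillna:
import numpy as np
# start from the original df (only 'pri' and 'sec'); joining twice would raise an error for overlapping columns
df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec').fillna(np.nan))
print(df)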
pri sec sec0 sec1 sec2 sec3 sec4
0 TOM AB,CD,EF AB CD EF NaN NaN
1 JACK XY,YZ XY YZ NaN NaN NaN
2 HARRY FG FG NaN NaN NaN NaN
3 NICK KY,NY,SD,EF,FR KY NY SD EF FR
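Whichever marker is displayed, pandas treats both None and NaN as missing, so checks such as isna() and notna() give the same answer either way. As a small sketch (the names parts and n_fields are illustrative, not from the original answer), this counts how many fields each row actually has:

```python
import pandas as pd

df = pd.DataFrame({
    "pri": ["TOM", "JACK", "HARRY", "NICK"],
    "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
})

parts = df["sec"].str.split(",", expand=True).add_prefix("sec")

# notna() is False for both None and NaN, so summing it per row
# counts only the fields that are really present
n_fields = parts.notna().sum(axis=1)
print(n_fields.tolist())
```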
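Try the following code (explanations are in the comments). It finds the maximum number of items in the 'sec' column and creates the column names accordingly: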
maxlen = max(list(map(lambda x: len(x.split(",")) ,df.sec))) # find max length in 'sec' column
cols = ["sec"+str(x) for x in range(maxlen)] # create new column names
datalist = list(map(lambda x: x.split(","), df.sec)) # create list from entries in "sec"
newdf = pd.DataFrame(data=datalist, columns=cols) # create dataframe of new columns
newdf = pd.concat([df, newdf], axis=1) # add it to original dataframe
print(newdf)
Output:
pri sec sec0 sec1 sec2 sec3 sec4
0 TOM AB,CD,EF AB CD EF None None
1 JACK XY,YZ XY YZ None None None
2 HARRY FG FG None None None None
3 NICK KY,NY,SD,EF,FR KY NY SD EF FR
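One caveat with this approach: the new frame gets a fresh 0..n-1 index, and pd.concat with axis=1 aligns rows by index, so the result lines up only when df also has the default index. A hedged variant of the same idea, passing index=df.index so it also works on a filtered or re-indexed frame (the index values 10 to 40 below are made up for the demonstration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "pri": ["TOM", "JACK", "HARRY", "NICK"],
        "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
    },
    index=[10, 20, 30, 40],  # deliberately not the default 0..3 index
)

datalist = [s.split(",") for s in df["sec"]]     # one list of fields per row
maxlen = max(len(fields) for fields in datalist)  # widest row decides the column count
cols = [f"sec{i}" for i in range(maxlen)]         # sec0, sec1, ...

# Reuse df's index so concat pairs each row with its own fields
newdf = pd.DataFrame(datalist, columns=cols, index=df.index)
result = pd.concat([df, newdf], axis=1)
print(result.shape)
```

Without index=df.index, the concat here would produce eight rows, four from each frame, with missing values filling the gaps.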