
Pandas Dataframe - Split string into multiple columns

I am new to the Pandas framework. I have searched quite a bit for a solution to my problem but haven't found much help online.

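I have a string column like the one below, and I want to convert it into separate columns. I tried splitting it, but that doesn't give me the output in the form I need.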

*-----------------------------------------------------------------------------*
|  Total Visitor                                                              |
*-----------------------------------------------------------------------------*
|  2x Adult, 1x Adult + Audio Guide                                           |
|  2x Adult, 2x Youth, 1x Children                                            | 
|  5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide |
*-----------------------------------------------------------------------------*
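To try the answers below, the sample column can be rebuilt as a DataFrame. This is a minimal sketch: the values are copied from the table above, while the frame name df is an assumption (the question's own code calls it data, the answers call it df).

```python
import pandas as pd

# Sample data copied from the table above; the name `df` is an assumption
# matching the answers (the question's code uses `data`)
df = pd.DataFrame({
    "Total Visitor": [
        "2x Adult, 1x Adult + Audio Guide",
        "2x Adult, 2x Youth, 1x Children",
        "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
    ]
})
print(df.shape)  # (3, 1)
```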

This is the code I used to split the string, but it doesn't give me the expected output:

df = data["Total Visitor"].str.split(",", n = 1, expand = True)
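A quick sketch of why this attempt falls short, assuming the same three sample rows: n=1 caps the number of splits at one, and even without n the pieces land in columns by their position in the string rather than by visitor type.

```python
import pandas as pd

# Sample rows copied from the question; `data` follows the question's code
data = pd.DataFrame({"Total Visitor": [
    "2x Adult, 1x Adult + Audio Guide",
    "2x Adult, 2x Youth, 1x Children",
    "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
]})

# n=1 allows at most one split, so every row yields exactly two columns
first_only = data["Total Visitor"].str.split(",", n=1, expand=True)
print(first_only.shape)            # (3, 2)
print(repr(first_only.loc[1, 1]))  # ' 2x Youth, 1x Children'

# Without n every comma splits, but column 0 holds "Adult" entries in rows 0-1
# and "Adult + Audio Guide" in row 2: columns follow position, not category
all_parts = data["Total Visitor"].str.split(",", expand=True)
print(all_parts.shape)             # (3, 3)
print(all_parts[0].tolist())       # ['2x Adult', '2x Adult', '5x Adult + Audio Guide']
```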

After splitting the string, my expected output should look like the table below:

*------------------------------------------------------------------------------------------------------------------*
| Adult    | Adult + Audio Guide    | Youth    | Children    | Children + AG             | Senior + AG             |
*------------------------------------------------------------------------------------------------------------------*
| 2x Adult | 1x Adult + Audio Guide | -        | -           | -                         | -                       |
| 2x Adult | -                      | 2x Youth | 1x Children | -                         | -                       |
| -        | 5x Adult + Audio Guide | -        | -           | 1x Children + Audio Guide | 1x Senior + Audio Guide |
*------------------------------------------------------------------------------------------------------------------*

How can I do this? Any help or guidance would be great.

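The idea is to create a list of dictionaries whose keys are the entries with the leading number and x removed (regex ^\d+x\s+, where ^ matches the start of the string, \d+ matches one or more digits, and \s+ matches one or more whitespace characters), then pass that list to the DataFrame constructor: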

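import re
import pandas as pd

# For each row: split into entries, strip the leading "<n>x " to get the key,
# and keep the full entry as the value
L = [dict([(re.sub(r'^\d+x\s+', "", y), y) for y in x.split(', ')]) for x in df['Total Visitor']]

# Categories missing from a row become NaN in the constructor; show them as '-'
df = pd.DataFrame(L).fillna('-')
print(df)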
      Adult     Adult + Audio Guide     Youth     Children  \
0  2x Adult  1x Adult + Audio Guide         -            -   
1  2x Adult                       -  2x Youth  1x Children   
2         -  5x Adult + Audio Guide         -            -   

      Children + Audio Guide     Senior + Audio Guide  
0                          -                        -  
1                          -                        -  
2  1x Children + Audio Guide  1x Senior + Audio Guide  
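The key-building regex can be checked on its own with the standard re module. This is a minimal sketch; the entry "10x Senior + Audio Guide" is hypothetical (not in the question's data) and only shows that \d+ handles multi-digit counts. Writing the pattern as a raw string keeps \d and \s from being read as string escapes.

```python
import re

# Pattern from the answer: ^ anchors at the start, \d+ matches the count,
# "x" is the literal multiplier, \s+ consumes the whitespace that follows
PATTERN = r'^\d+x\s+'

# The last entry is hypothetical, added to show a two-digit count
for entry in ["2x Adult", "1x Adult + Audio Guide", "10x Senior + Audio Guide"]:
    print(re.sub(PATTERN, "", entry))
# Adult
# Adult + Audio Guide
# Senior + Audio Guide
```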

Another, similar idea is to build the dictionary keys (the column names) by splitting each entry on "x ":

L = [dict([(y.split('x ')[1], y) for y in x.split(', ')]) for x in df['Total Visitor']]

df = pd.DataFrame(L).fillna('-')
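A runnable sketch of this second approach on the sample data. It passes a maxsplit of 1 to str.split, which is a defensive tweak not in the original answer: only the first "x " is cut, so a label that itself contains "x " keeps its full text. The label "1x Deluxe x Tour" is hypothetical and used only to show the difference.

```python
import pandas as pd

df = pd.DataFrame({"Total Visitor": [
    "2x Adult, 1x Adult + Audio Guide",
    "2x Adult, 2x Youth, 1x Children",
    "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
]})

# Key = text after the first "x " only (maxsplit=1), value = the full entry
L = [dict((y.split('x ', 1)[1], y) for y in x.split(', ')) for x in df['Total Visitor']]
out = pd.DataFrame(L).fillna('-')
print(out)

# Why maxsplit matters, using a hypothetical label that contains "x " twice
entry = "1x Deluxe x Tour"
print(repr(entry.split('x ')[1]))     # 'Deluxe '
print(repr(entry.split('x ', 1)[1]))  # 'Deluxe x Tour'
```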

Here is one way to do it using pandas methods:

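# Split on commas, stack to one entry per row (dropping empty padding cells), trim spaces
dstack = df['Total Visitor'].str.split(',', expand=True).stack().dropna().str.strip().to_frame()
# Category label = entry without its leading count, e.g. "2x Adult" -> "Adult"
dstack['cols'] = dstack[0].str.extract(r'\d+x\s(.*)', expand=False)
# Add the labels to the index, drop the split-position level, pivot labels into columns
df_out = dstack.set_index('cols', append=True)[0].reset_index(level=1, drop=True).unstack()
print(df_out)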

Output:

cols     Adult     Adult + Audio Guide     Children     Children + Audio Guide     Senior + Audio Guide     Youth
0     2x Adult  1x Adult + Audio Guide          NaN                        NaN                      NaN       NaN
1     2x Adult                     NaN  1x Children                        NaN                      NaN  2x Youth
2          NaN  5x Adult + Audio Guide          NaN  1x Children + Audio Guide  1x Senior + Audio Guide       NaN
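A variant of the same reshape, sketched under the assumption of the sample data above: it swaps stack for Series.explode (available since pandas 0.25) and unstack for DataFrame.pivot. The helper column names index, value and key are arbitrary choices. Splitting on ", " instead of "," makes the extra strip unnecessary for this data, and the gaps are filled with "-" to match the layout the question asked for.

```python
import pandas as pd

df = pd.DataFrame({"Total Visitor": [
    "2x Adult, 1x Adult + Audio Guide",
    "2x Adult, 2x Youth, 1x Children",
    "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
]})

# One row per entry; reset_index keeps the original row number in a column named "index"
long = df['Total Visitor'].str.split(', ').explode().rename('value').reset_index()
# Column label = entry without its leading "<n>x " count
long['key'] = long['value'].str.extract(r'^\d+x\s+(.*)$', expand=False)
# Back to wide form: one row per original row, one column per visitor type
wide = long.pivot(index='index', columns='key', values='value').fillna('-')
print(wide)
```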


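Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.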

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM