Split a column and assign the result to the same DataFrame

Question

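I have a movie recommendation dataset and want to split the genres feature into two genre columns (genre_1, genre_2), then assign them to the same DataFrame. The column holds all of a movie's genres together, separated by "|". If a movie does not have two genres, genre_1 should be assigned to genre_2 as well.

What is the best way to do this?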

     movieId      title                                genres
0       1         Toy Story (1995)                     Adventure|Animation|Children|Comedy|Fantasy
1       2         Jumanji (1995)                       Adventure|Children|Fantasy
2       3         Grumpier Old Men (1995)              Comedy|Romance
3       4         Waiting to Exhale (1995)             Comedy|Drama|Romance
4       5         Father of the Bride Part II (1995)   Comedy

Thanks

Answer 1

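The split function will break the string apart when given '|' as the separator. Pro tip: keeping the genres as a list is much better than keeping them as two variables. You can iterate over a list without having to name each variable, and if some flick counts as more than two genres, two fixed variables will leave you stuck.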
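As a rough sketch of the list-based approach described above, assuming pandas and a small hand-made sample of the question's data (the column names `genre_list` and `n_genres` are made up for illustration):

```python
import pandas as pd

# Small sample modeled on the question's data.
df = pd.DataFrame({
    "movieId": [1, 2, 5],
    "title": [
        "Toy Story (1995)",
        "Jumanji (1995)",
        "Father of the Bride Part II (1995)",
    ],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy",
    ],
})

# One list of genres per movie, however many genres there are.
df["genre_list"] = df["genres"].str.split("|")

# Work with the lists directly instead of naming each genre column.
df["n_genres"] = df["genre_list"].str.len()

# One row per (movie, genre) pair, convenient for counting or grouping.
long_df = df.explode("genre_list")
genre_counts = long_df["genre_list"].value_counts()
```

The exploded frame keeps every genre, so nothing is lost when a movie has more than two.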

Answer 2

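As suggested in the comments, you should provide an example of the desired output; as it stands, your question is not entirely clear.

In any case, you can split the genre list into separate columns with: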

df['genres'].str.split('|',expand=True)

For example:

df['genres']
Out[13]: 
0    Adventure|Animation|Children|Comedy|Fantasy
1                     Adventure|Children|Fantasy
2                                 Comedy|Romance
3                           Comedy|Drama|Romance
4                                         Comedy


df['genres'].str.split('|',expand=True)
Out[14]: 
           0          1         2       3        4
0  Adventure  Animation  Children  Comedy  Fantasy
1  Adventure   Children   Fantasy    None     None
2     Comedy    Romance      None    None     None
3     Comedy      Drama   Romance    None     None
4     Comedy       None      None    None     None

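.str tells pandas to treat the column as strings, after which you can use most of Python's string methods.

expand=True causes each piece of the split to be stored in its own column.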
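To get those split columns back into the same DataFrame, one possibility is to rename them and join on the index. This is only a sketch, assuming the sample data from the question; the `genre_N` column names are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "movieId": [1, 2, 3, 4, 5],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
        "Comedy|Drama|Romance",
        "Comedy",
    ],
})

# expand=True returns a DataFrame whose columns are labeled 0, 1, 2, ...
split = df["genres"].str.split("|", expand=True)

# Give the pieces readable names: genre_1, genre_2, ...
split.columns = [f"genre_{i + 1}" for i in split.columns]

# join aligns on the index, so each row keeps its own genres.
df = df.join(split)
```

Rows with fewer genres than the widest row get missing values in the trailing columns.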

Answer 3

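Thanks for the replies. I solved the problem as follows (with help from a friend).

    # Split into at most three parts; expand=True returns one column per part.
    parts = df['genres'].str.split('|', n=2, expand=True)
    df['genre_1'] = parts[0]
    # Movies with only one genre reuse genre_1 as genre_2.
    df['genre_2'] = parts[1].fillna(parts[0])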
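If the dataset might contain no movie with a second genre at all, a variant that goes through lists avoids depending on a second split column existing. The helper below is a hypothetical sketch (the name `add_two_genres` and its parameters are made up here), not the poster's method:

```python
import pandas as pd


def add_two_genres(df, source="genres", sep="|"):
    """Return a copy of df with genre_1 and genre_2 columns added."""
    out = df.copy()
    # One list of genres per row.
    genre_lists = out[source].str.split(sep)
    out["genre_1"] = genre_lists.str.get(0)
    # .str.get(1) gives a missing value when a row has only one genre,
    # so fall back to genre_1 in that case.
    out["genre_2"] = genre_lists.str.get(1).fillna(out["genre_1"])
    return out


movies = pd.DataFrame(
    {"genres": ["Comedy|Romance", "Comedy", "Adventure|Children|Fantasy"]}
)
result = add_two_genres(movies)

# Every row has a single genre here, yet genre_2 is still filled.
single_only = add_two_genres(pd.DataFrame({"genres": ["Comedy", "Drama"]}))
```

Returning a copy leaves the caller's DataFrame untouched; assign the result back to the same name to update it.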

3 solutions

Solution 1
0 2018-10-14 11:24:19

Solution 2
0 2018-10-14 11:28:04

Solution 3
0 Accepted 2018-11-01 13:19:53
