
Splitting a column and assigning to the same data frame

I have a dataset for movie recommendations and want to split the genres feature into two genre columns (genre_1, genre_2) and assign them to the same DataFrame. The column holds all the genres together, separated by '|'. If a movie has only one genre, genre_1 needs to be copied into genre_2.

What is the best way to do it?

     movieId      title                                genres
0       1         Toy Story (1995)                     Adventure|Animation|Children|Comedy|Fantasy
1       2         Jumanji (1995)                       Adventure|Children|Fantasy
2       3         Grumpier Old Men (1995)              Comedy|Romance
3       4         Waiting to Exhale (1995)             Comedy|Drama|Romance
4       5         Father of the Bride Part II (1995)   Comedy

Thanks

The split function will take that string apart when given '|' as the separator. Pro tip: keeping the genres as a list will work much better than keeping them as two variables; you can iterate over the list instead of naming each variable, and if some flick is counted as more than two genres, you're home free.
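A minimal sketch of the list-based approach, assuming pandas and a few rows copied from the question. The column name genre_list and the variable is_comedy are hypothetical names chosen for illustration. A single-character pattern such as '|' is split on literally, not as a regex.

```python
import pandas as pd

# A few rows copied from the question's data
df = pd.DataFrame({
    "title": ["Toy Story (1995)", "Jumanji (1995)",
              "Father of the Bride Part II (1995)"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy",
               "Adventure|Children|Fantasy",
               "Comedy"],
})

# Plain Python: str.split treats '|' as a literal separator
print("Comedy|Romance".split("|"))  # ['Comedy', 'Romance']

# pandas: keep each movie's genres as one list in a single (hypothetical) column
df["genre_list"] = df["genres"].str.split("|")

# Iterate over however many genres each movie has, with no genre_N columns
for title, genres in zip(df["title"], df["genre_list"]):
    print(title, "->", ", ".join(genres))

# Membership tests work no matter how many genres a movie has
is_comedy = df["genre_list"].apply(lambda gs: "Comedy" in gs)
```

Because the lists vary in length, nothing breaks when a movie has five genres instead of two.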

As suggested in the comments, you should provide an example of the output you're looking for; it's not completely clear from your question.

Anyway, you can split the genre list into separate columns using:

df['genres'].str.split('|',expand=True)

e.g.:

df['genres']
Out[13]: 
0    Adventure|Animation|Children|Comedy|Fantasy
1                     Adventure|Children|Fantasy
2                                 Comedy|Romance
3                           Comedy|Drama|Romance
4                                         Comedy


df['genres'].str.split('|',expand=True)
Out[14]: 
           0          1         2       3        4
0  Adventure  Animation  Children  Comedy  Fantasy
1  Adventure   Children   Fantasy    None     None
2     Comedy    Romance      None    None     None
3     Comedy      Drama   Romance    None     None
4     Comedy       None      None    None     None

.str tells pandas to treat the values in that column as strings, which gives you vectorized, element-wise versions of most Python string methods.
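A short sketch of a few other .str methods on the question's genres column, assuming pandas; the variable names are hypothetical. Note that str.count interprets its pattern as a regular expression, so '|' has to be escaped there.

```python
import pandas as pd

genres = pd.Series([
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Adventure|Children|Fantasy",
    "Comedy|Romance",
    "Comedy|Drama|Romance",
    "Comedy",
])

# Element-wise equivalent of str.lower()
lowered = genres.str.lower()
# regex=False: match "Comedy" as a plain substring
has_comedy = genres.str.contains("Comedy", regex=False)
# Genre count = number of '|' separators + 1 (count takes a regex, so escape '|')
n_genres = genres.str.count(r"\|") + 1
# Slicing through .str slices each string
first_three = genres.str[:3]
```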

expand=True causes each piece of the split to be stored in a separate column; without it, you get a single column holding a list per row.
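A sketch contrasting the two modes and using expand=True to build the genre_1/genre_2 columns the question asks for, assuming pandas. The reindex call is an assumption added so that column 1 exists even if no movie has a second genre; the names as_lists, as_columns and first_two are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "movieId": [1, 2, 3, 4, 5],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
        "Comedy|Drama|Romance",
        "Comedy",
    ],
})

# expand=False (the default): one column holding a Python list per row
as_lists = df["genres"].str.split("|")

# expand=True: one column per split piece, padded with missing values
as_columns = df["genres"].str.split("|", expand=True)

# Keep only the first two pieces; reindex guarantees columns 0 and 1 exist
first_two = as_columns.reindex(columns=[0, 1])
df["genre_1"] = first_two[0]
# Single-genre movies have no second piece, so fall back to genre_1
df["genre_2"] = first_two[1].fillna(first_two[0])
```

With the sample data, Father of the Bride Part II ends up with "Comedy" in both columns.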

Thanks for the replies. I solved this problem the following way (with help from a friend):

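    # Split at most twice and keep only the first two pieces (genre_1, genre_2)
    parts = df['genres'].str.split('|', n=2, expand=True).reindex(columns=[0, 1])
    df['genre_1'] = parts[0]
    # Single-genre movies have no second piece; reuse genre_1 for them
    df['genre_2'] = parts[1].fillna(parts[0])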
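An alternative sketch that skips the temporary third column, assuming pandas: split into lists, then index each list through .str. Indexing past the end of a list yields NaN, which fillna then replaces with genre_1. The sample rows are copied from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Toy Story (1995)", "Grumpier Old Men (1995)",
              "Father of the Bride Part II (1995)"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy",
               "Comedy|Romance",
               "Comedy"],
})

genre_lists = df["genres"].str.split("|")
# .str[i] picks item i of each list; lists that are too short give NaN
df["genre_1"] = genre_lists.str[0]
df["genre_2"] = genre_lists.str[1].fillna(df["genre_1"])
```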

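Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.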

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM