Combining multiple columns from a file into a single column of lists in pandas

I'm new to pandas and need to build a table with it that does exactly what the following code snippet does:

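with open(r'D:/DataScience/ml-100k/u.item') as f:
    for line in f:
        # each record is one line of '|'-separated fields
        fields = line.rstrip('\n').split('|')
        movieId = int(fields[0])
        name = fields[1]
        # genre flags: everything from field 5 onward, converted to ints
        geners = fields[5:25]
        geners = list(map(int, geners))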

My question is how to add a geners column in pandas that holds the same values as: geners = fields[5:25]

It's not clear to me what you intend to accomplish: a single genres column that combines the values from fields[5:25], or a separate genre column for each of those fields?

For the latter, you can use [pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) :

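import pandas as pd

# Assuming the standard ml-100k layout: 24 fields per line, genre flags in 5-23.
# Keep field 0 (id) and field 1 (title); skip the date/URL fields 2-4.
genre_idx = list(range(5, 24))
cols = ['movieId', 'name'] + ['genre_' + str(i) for i in genre_idx]
df = pd.read_csv(r'D:/DataScience/ml-100k/u.item', delimiter='|', header=None,
                 usecols=[0, 1] + genre_idx, names=cols, encoding='latin-1')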

For the former, you could concatenate the genres into, say, a space-separated string, using:

# the genre flags are ints, so cast to str before joining
df['genres'] = df[cols[2:]].astype(str).apply(' '.join, axis=1)
df.drop(cols[2:], axis=1, inplace=True) # drop the separate genre_N columns
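
If the goal is the one in the title, a single column whose cells are Python lists of ints (matching geners = fields[5:25] in the question), a minimal sketch could look like the following. The two sample rows, the shortened set of three genre flags, and the names sample, genre_idx and genre_cols are made up for illustration; with the real file you would pass its path (and probably an encoding) to read_csv instead of the in-memory buffer.

```python
import io

import pandas as pd

# Made-up, shortened stand-in for u.item:
# id | title | release date | video date | URL | 3 genre flags
sample = io.StringIO(
    "1|Toy Story (1995)|01-Jan-1995||http://example.com/1|0|1|1\n"
    "2|GoldenEye (1995)|01-Jan-1995||http://example.com/2|1|0|0\n"
)

genre_idx = [5, 6, 7]  # positions of the genre flags in this sample
genre_cols = ['genre_' + str(i) for i in genre_idx]

df = pd.read_csv(sample, delimiter='|', header=None,
                 usecols=[0, 1] + genre_idx,
                 names=['movieId', 'name'] + genre_cols)

# Collapse the flag columns into one column holding a list of ints per row
df['genres'] = df[genre_cols].values.tolist()
df = df.drop(columns=genre_cols)

print(df)
```

Going through .values.tolist() builds one list per row without a Python-level apply, which tends to be noticeably faster on larger tables.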
