
Convert a .csv file into a specified format with Python in Jupyter

The file looks like this (image not shown):

The first column is user_id, the second is the rating for joke 1, and the remaining columns follow the same pattern. I want to convert the file shown above into the following format:

user_id | joke_id | rating
--------------------------
   1    |   1     | -7.82
   1    |   2     | 8.79

In addition, since normal ratings lie between -10 and +10, a value of 99 means the user did not rate the corresponding joke; those rows should be removed after the conversion.

Your question involves several steps; please avoid combining multiple questions into one. Based on your question, the following steps should help:

  • Read the csv file using pandas
import pandas as pd
raw = pd.read_csv('PATH-TO-FILE')
  • Use melt to unpivot the DataFrame

Because only an image of the data was provided, a sample DataFrame is used here instead.

raw = pd.DataFrame([[1, -7, 8, 99], [2, 4, 0, 6]], columns = ['user_id', 'joke_1', 'joke_2', 'joke_3'])
   user_id  joke_1  joke_2  joke_3
0        1      -7       8      99
1        2       4       0       6
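
If the real file has no header row (an assumption, since only an image was provided), column names matching the sample above can be assigned right after reading. The sketch below simulates the file with an in-memory string; the values in csv_text are made up for illustration, and the joke_N naming is just the convention used in this answer:

```python
import io
import pandas as pd

# Made-up stand-in for the real file: no header row, first column is user_id.
csv_text = "1,-7.82,8.79,99\n2,4.08,-0.29,6.36\n"

# header=None tells pandas that the first line is data, not column names.
raw = pd.read_csv(io.StringIO(csv_text), header=None)

# Name the first column user_id and the rest joke_1, joke_2, ...
raw.columns = ['user_id'] + [f'joke_{i}' for i in range(1, raw.shape[1])]
print(raw)
```

With a real file, io.StringIO(csv_text) would be replaced by the path to the file.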

Unpivot the DataFrame using melt:

df = pd.melt(raw, id_vars=['user_id'], value_vars=['joke_1', 'joke_2', 'joke_3'], var_name='joke', value_name='rating')
   user_id    joke  rating
0        1  joke_1      -7
1        2  joke_1       4
2        1  joke_2       8
3        2  joke_2       0
4        1  joke_3      99
5        2  joke_3       6
  • Filter the DataFrame with a condition

With pandas, you can easily filter a DataFrame with a boolean condition:

df_processed = df[df.rating != 99].reset_index(drop=True)

Note that reset_index() is only used to tidy up the index after filtering; it is not related to your question.
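
The question asks for a numeric joke_id column, whereas the melted frame keeps labels such as joke_1. A minimal sketch of the whole pipeline, assuming the column names follow the joke_N pattern of the sample DataFrame (the real file's headers are not known), could extract the number from the label and order the rows by user:

```python
import pandas as pd

raw = pd.DataFrame(
    [[1, -7, 8, 99], [2, 4, 0, 6]],
    columns=['user_id', 'joke_1', 'joke_2', 'joke_3'],
)

# Unpivot: value_vars is omitted, so every column except user_id is melted.
long_df = raw.melt(id_vars='user_id', var_name='joke', value_name='rating')

# Assumption: labels look like 'joke_<number>'; use the numeric suffix as joke_id.
long_df['joke_id'] = long_df['joke'].str.extract(r'(\d+)$', expand=False).astype(int)

# Drop the 99 placeholder (joke not rated), keep the requested columns, and sort.
result = (
    long_df.loc[long_df['rating'] != 99, ['user_id', 'joke_id', 'rating']]
    .sort_values(['user_id', 'joke_id'])
    .reset_index(drop=True)
)
print(result)
```

Leaving out value_vars avoids listing every joke column by hand, which may be convenient if the real file has many of them.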

Hope this helps.
