The file looks like:
The first column is user_id, the second is the rating for joke 1, and the remaining columns follow the same pattern. I want to convert the file shown above into a format like this:
user_id | joke_id | rating
--------------------------
1 | 1 | -7.82
1 | 2 | 8.79
In addition, normal ratings are between -10 and +10, so the value 99 means the user did not rate the corresponding joke. Those rows should be removed after the conversion.
Your question involves several steps; please avoid mixing several questions into one. Based on your question, the following steps should help:
Read the csv file using pandas
import pandas as pd
raw = pd.read_csv('PATH-TO-FILE')
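If the file has no header row (an assumption, since only an image was shown), `read_csv` would treat the first data row as column names. A minimal sketch of one way to handle that, using made-up sample values and the hypothetical column names `joke_1` ... `joke_N`:

```python
import io
import pandas as pd

# Made-up sample standing in for the real file; the values are illustrative only.
csv_text = "1,-7.82,8.79,99\n2,4.08,-0.29,6.36\n"

# Assumption: one user_id column followed by n_jokes rating columns, no header row.
n_jokes = 3
names = ["user_id"] + [f"joke_{i}" for i in range(1, n_jokes + 1)]

# header=None tells pandas the first line is data; names supplies the column labels.
raw = pd.read_csv(io.StringIO(csv_text), header=None, names=names)
print(raw)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file path.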
Use melt to unpivot the DataFrame
Because only an image of the data was provided, I will use a sample DataFrame instead.
raw = pd.DataFrame([[1, -7, 8, 99], [2, 4, 0, 6]], columns = ['user_id', 'joke_1', 'joke_2', 'joke_3'])
user_id joke_1 joke_2 joke_3
0 1 -7 8 99
1 2 4 0 6
Unpivot the DataFrame using melt:
df = pd.melt(raw, id_vars=['user_id'], value_vars=['joke_1', 'joke_2', 'joke_3'], var_name='joke', value_name='rating')
user_id joke rating
0 1 joke_1 -7
1 2 joke_1 4
2 1 joke_2 8
3 2 joke_2 0
4 1 joke_3 99
5 2 joke_3 6
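The melted frame above still holds labels such as joke_1 rather than a numeric joke id, and it is ordered by joke rather than by user. A minimal sketch of one way to get closer to the requested user_id | joke_id | rating layout, assuming the column names follow the `joke_<number>` pattern of the sample:

```python
import pandas as pd

raw = pd.DataFrame(
    [[1, -7, 8, 99], [2, 4, 0, 6]],
    columns=["user_id", "joke_1", "joke_2", "joke_3"],
)

# Omitting value_vars makes melt unpivot every column not listed in id_vars,
# which avoids typing out all joke columns by hand.
long_df = raw.melt(id_vars="user_id", var_name="joke", value_name="rating")

# Turn labels such as "joke_3" into the integer 3 (assumes the "joke_" prefix).
long_df["joke_id"] = (
    long_df["joke"].str.replace("joke_", "", regex=False).astype(int)
)

# Reorder columns and sort so each user's ratings are listed together.
long_df = (
    long_df[["user_id", "joke_id", "rating"]]
    .sort_values(["user_id", "joke_id"])
    .reset_index(drop=True)
)
print(long_df)
```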
With pandas, you can easily filter a DataFrame with a condition:
df_processed = df[df.rating != 99].reset_index(drop=True)
Note that reset_index() is only used to tidy the index after filtering; it is not essential to your question.
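As an alternative to comparing against 99 directly, the filter could be expressed through the valid rating range stated in the question (-10 to +10). This is only a sketch on the same sample data; it would also drop any other out-of-range value, which may or may not be what you want:

```python
import pandas as pd

# Same melted sample as above.
df = pd.DataFrame({
    "user_id": [1, 2, 1, 2, 1, 2],
    "joke": ["joke_1", "joke_1", "joke_2", "joke_2", "joke_3", "joke_3"],
    "rating": [-7, 4, 8, 0, 99, 6],
})

# Series.between is inclusive on both ends by default, so -10 and +10 are kept.
in_range = df[df["rating"].between(-10, 10)].reset_index(drop=True)

# Filtering on the sentinel value, as in the answer, gives the same rows here.
by_sentinel = df[df["rating"] != 99].reset_index(drop=True)

print(in_range)
```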
Hope this helps.