Convert a .csv file into a specified format with Python in Jupyter
The file looks like:
The first column is user_id, the second is the rating for joke 1, and the remaining columns follow the same pattern.
I want to convert the file shown above into the following format:
user_id | joke_id | rating
--------------------------
1 | 1 | -7.82
1 | 2 | 8.79
In addition, since normal ratings are between -10 and +10, a value of 99 means the user did not rate the corresponding joke, and those rows should be removed after the conversion.
Your question involves several steps; please avoid combining several questions into one. Based on your question, the following steps should help:
Read the csv file using pandas:
import pandas as pd
raw = pd.read_csv('PATH-TO-FILE')
Use melt to unpivot the DataFrame.
Because only an image was provided, I will use a sample DataFrame instead.
raw = pd.DataFrame([[1, -7, 8, 99], [2, 4, 0, 6]], columns = ['user_id', 'joke_1', 'joke_2', 'joke_3'])
user_id joke_1 joke_2 joke_3
0 1 -7 8 99
1 2 4 0 6
Unpivot the DataFrame using melt:
df = pd.melt(raw, id_vars=['user_id'], value_vars=['joke_1', 'joke_2', 'joke_3'], var_name='joke', value_name='rating')
user_id joke rating
0 1 joke_1 -7
1 2 joke_1 4
2 1 joke_2 8
3 2 joke_2 0
4 1 joke_3 99
5 2 joke_3 6
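When the real file has many joke columns, listing each one in value_vars becomes tedious. The following is a minimal sketch, assuming every column other than user_id holds a joke rating: if value_vars is omitted, melt unpivots all columns that are not listed in id_vars. The name df_all is only illustrative.

```python
import pandas as pd

# Same sample data as above; a real file would have many more joke columns.
raw = pd.DataFrame([[1, -7, 8, 99], [2, 4, 0, 6]],
                   columns=['user_id', 'joke_1', 'joke_2', 'joke_3'])

# With value_vars omitted, melt unpivots every column not named in id_vars.
df_all = pd.melt(raw, id_vars=['user_id'], var_name='joke', value_name='rating')
print(df_all)
```

This should give the same long table as the explicit value_vars call, provided the file contains no extra non-rating columns.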
With pandas, you can easily filter a DataFrame with a condition:
df_processed = df[df.rating != 99].reset_index(drop=True)
Please note that reset_index() is only used to clean up the index; it is not related to your question.
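The requested output uses a numeric joke_id and lists each user's ratings together, while the melted frame still holds labels such as joke_1. A possible way to close that gap is sketched below. It assumes the joke columns are named joke_<n>, as in the sample DataFrame; the real file's headers may differ, so the name-stripping step would need adjusting. The names long_df and result are illustrative.

```python
import pandas as pd

raw = pd.DataFrame([[1, -7, 8, 99], [2, 4, 0, 6]],
                   columns=['user_id', 'joke_1', 'joke_2', 'joke_3'])

# Unpivot: one row per (user, joke) pair.
long_df = pd.melt(raw, id_vars=['user_id'], var_name='joke', value_name='rating')

# Assumption: columns are named 'joke_<n>'; strip the prefix to get the number.
long_df['joke_id'] = long_df['joke'].str.replace('joke_', '', regex=False).astype(int)

result = (
    long_df[long_df['rating'] != 99]        # drop the "not rated" marker
    .sort_values(['user_id', 'joke_id'])    # group each user's ratings together
    .reset_index(drop=True)
    [['user_id', 'joke_id', 'rating']]      # keep the requested column order
)
print(result)
```

Sorting is optional; it only arranges the rows in the user-by-user order shown in the question.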
Hope this helps.