Reformat the dataframe
I need to reformat the dataframe so that user_id values become the rows and website_id values become the columns.
Each user must appear in only one row, and each website_id must appear in only one column.
| website_id | url | user_id|
|------------|-------------------|--------|
|123 |www.google.com | 1|
|234 |www.flamengo.com.br| 3|
|123 |www.google.com | 4|
|234 |www.flamengo.com.br| 1|
|345 |www.nasa.gov | 34|
If the user has accessed the website_id, I need to fill the new column for that website_id with 1, otherwise 0.
I don't know where to start to reach this goal.
Final result:
|user_id|123|234|345|
|-------|---|---|---|
|1 |1 |1 |0 |
|3 |0 |1 |0 |
|4 |1 |0 |0 |
|34 |0 |0 |1 |
IIUC, you can also try this:
df.set_index('user_id')['website_id'].astype(str)\
.str.get_dummies().groupby(level=0).sum().reset_index()
Output:
user_id 123 234 345
0 1 1 1 0
1 3 0 1 0
2 4 1 0 0
3 34 0 0 1
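For anyone who wants to run the snippet above end to end, here is a minimal sketch that rebuilds the sample data from the question first. The way `df` is constructed is an assumption (the question only shows the table), and note that `astype(str)` makes the resulting column labels strings such as `'123'` rather than integers.

```python
import pandas as pd

# Sample data copied from the question's table (construction is assumed)
df = pd.DataFrame({
    "website_id": [123, 234, 123, 234, 345],
    "url": [
        "www.google.com",
        "www.flamengo.com.br",
        "www.google.com",
        "www.flamengo.com.br",
        "www.nasa.gov",
    ],
    "user_id": [1, 3, 4, 1, 34],
})

result = (
    df.set_index("user_id")["website_id"]
      .astype(str)            # str.get_dummies needs string values
      .str.get_dummies()      # one indicator column per website_id
      .groupby(level=0)       # collapse the rows of each user_id
      .sum()
      .reset_index()
)
print(result)
```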
import pandas as pd
# Count how often each (user_id, website_id) pair occurs
out = pd.crosstab(df['user_id'], df['website_id'])
out
123 234 345
user_id
1 1 1 0
3 0 1 0
4 1 0 0
34 0 0 1
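One caveat: `pd.crosstab` counts occurrences, so a user who visited the same website more than once would get a value above 1. The sample data has no such repeats, but if yours might, capping the counts at 1 is one possible way to keep strict 0/1 flags. The sketch below adds a hypothetical repeated visit (user 1 on website 123) that is not in the question, purely to show the difference.

```python
import pandas as pd

# Sample data from the question plus one hypothetical repeated visit
# (the last row: user 1 accessing website 123 a second time)
df = pd.DataFrame({
    "website_id": [123, 234, 123, 234, 345, 123],
    "user_id": [1, 3, 4, 1, 34, 1],
})

counts = pd.crosstab(df["user_id"], df["website_id"])  # visit counts
flags = counts.clip(upper=1)                           # cap at 1 -> 0/1 flags

print(counts)
print(flags)
```

Here the column labels stay integers, so a cell is addressed as `flags.loc[1, 123]`.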