
Reformat the dataframe

I need to reshape the dataframe so that each user_id is a row and each website_id is a column.

Each user must appear in exactly one row, and each website_id in exactly one column.

| website_id | url               | user_id|
|------------|-------------------|--------|
|123         |www.google.com     |       1|
|234         |www.flamengo.com.br|       3|
|123         |www.google.com     |       4|
|234         |www.flamengo.com.br|       1|
|345         |www.nasa.gov       |      34|

If the user has accessed a website_id, the cell in that website's new column should be 1, otherwise 0.

I don't know where to start. Expected result:

|user_id|123|234|345|
|-------|---|---|---|
|1      |1  |1  |0  |
|3      |0  |1  |0  |
|4      |1  |0  |0  |
|34     |0  |0  |1  |
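
For anyone who wants to run the answers, the sample table can be rebuilt as a DataFrame roughly like this. It is a minimal sketch: the variable name `df` is an assumption chosen to match what the answers use, and the integer dtypes are a guess, since the question does not state them.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: website_id and user_id are stored as integers.
df = pd.DataFrame({
    "website_id": [123, 234, 123, 234, 345],
    "url": [
        "www.google.com",
        "www.flamengo.com.br",
        "www.google.com",
        "www.flamengo.com.br",
        "www.nasa.gov",
    ],
    "user_id": [1, 3, 4, 1, 34],
})

print(df)
```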

IIUC, you can also try this:

df.set_index('user_id')['website_id'].astype(str)\
  .str.get_dummies().groupby(level=0).sum().reset_index()

Output:

   user_id  123  234  345
0        1    1    1    0
1        3    0    1    0
2        4    1    0    0
3       34    0    0    1

Alternatively, use pd.crosstab:

out = pd.crosstab(df['user_id'], df['website_id'])

out

        123 234 345
user_id         
1       1   1   0
3       0   1   0
4       1   0   0
34      0   0   1
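
One caveat: both approaches count rows, so if the same user/website pair appeared more than once in the data, the cell would hold that count rather than 1. The sketch below caps the counts at 1 to keep a strict 0/1 indicator. It assumes repeated visits are possible; the duplicated row is hypothetical and not part of the question's data.

```python
import pandas as pd

# Question's data plus one hypothetical repeated visit (user 1, site 123).
df = pd.DataFrame({
    "website_id": [123, 234, 123, 234, 345, 123],
    "user_id": [1, 3, 4, 1, 34, 1],
})

# crosstab counts occurrences of each (user_id, website_id) pair.
counts = pd.crosstab(df["user_id"], df["website_id"])

# Cap every count at 1 so the table only says "visited or not".
flags = counts.clip(upper=1)

print(flags)
```

If the data is known to contain each pair at most once, as in the question's sample, the clip step changes nothing and can be dropped.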

