I have a problem with my DataFrames.
The first DataFrame looks like this:
id 0 1 2 3
100 0 0 0 0
101 0 0 0 0
102 0 0 0 0
103 0 0 0 0
The second DataFrame looks like this:
id num
100 1
100 2
100 3
101 0
101 3
102 1
103 2
103 3
I want to change zeros to ones in the first DataFrame: for each row identified by "id", the columns to set are the values listed in the "num" column of the second DataFrame for that same "id". In the end, the first DataFrame should look like this:
id 0 1 2 3
100 0 1 1 1
101 1 0 0 1
102 0 1 0 0
103 0 0 1 1
How can I do that? I know I can use a for loop (which I have already written), but my DataFrames are very large and it would take about 4 hours to finish. I was thinking about mapping in pandas, but I could not find a solution.
Best regards
Use get_dummies with max by index for indicator values; if you need counts instead, use sum instead of max:
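# max(level=0) was removed in pandas 2.0; groupby(level=0).max() is the equivalent.
# dtype=int keeps 0/1 output, since newer pandas returns booleans by default.
df = pd.get_dummies(df2.set_index('id')['num'], dtype=int).groupby(level=0).max()
print(df)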
0 1 2 3
id
100 0 1 1 1
101 1 0 0 1
102 0 1 0 0
103 0 0 1 1
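For reference, here is a minimal self-contained sketch of the same idea, assuming df2 is built from the sample data in the question. The names dummies, indicators and counts are only illustrative, and groupby(level=0) is used in place of max(level=0) so it also runs on recent pandas versions:

```python
import pandas as pd

# Second DataFrame, rebuilt from the sample data in the question.
df2 = pd.DataFrame({
    "id":  [100, 100, 100, 101, 101, 102, 103, 103],
    "num": [1, 2, 3, 0, 3, 1, 2, 3],
})

# One indicator column per distinct "num" value, indexed by "id".
# dtype=int keeps 0/1 values instead of booleans.
dummies = pd.get_dummies(df2.set_index("id")["num"], dtype=int)

# Collapse repeated ids: max gives 0/1 indicators, sum gives counts.
indicators = dummies.groupby(level=0).max()
counts = dummies.groupby(level=0).sum()

print(indicators)
```

With this sample data, indicators and counts are identical because no (id, num) pair is repeated; they differ only when the same pair occurs more than once.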
If the first DataFrame may contain more rows or columns than appear in the second one, add DataFrame.reindex:
# Build the indicators from df2, then align them to the shape of df1.
df = (pd.get_dummies(df2.set_index('id')['num'], dtype=int)
        .groupby(level=0).max()
        .reindex(index=df1.index, columns=df1.columns, fill_value=0))
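The reindex step can be sketched as follows. This assumes that df1 uses "id" as its index and has integer column labels, which the question does not state explicitly; the extra row 104 and extra column 4 are hypothetical and only there to show what fill_value=0 does:

```python
import pandas as pd

# Hypothetical first DataFrame: "id" is the index, with one extra row (104)
# and one extra column (4) that never appear in df2.
df1 = pd.DataFrame(
    0,
    index=pd.Index([100, 101, 102, 103, 104], name="id"),
    columns=[0, 1, 2, 3, 4],
)

# Second DataFrame, rebuilt from the sample data in the question.
df2 = pd.DataFrame({
    "id":  [100, 100, 100, 101, 101, 102, 103, 103],
    "num": [1, 2, 3, 0, 3, 1, 2, 3],
})

result = (
    pd.get_dummies(df2.set_index("id")["num"], dtype=int)
    .groupby(level=0).max()
    # Align to df1; ids and columns missing from df2 are filled with 0.
    .reindex(index=df1.index, columns=df1.columns, fill_value=0)
)

print(result)
```

If "id" is an ordinary column in df1 rather than the index, one option would be to call df1.set_index("id") first so that the labels line up.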
Calling the first DataFrame df1 and the second one df2, you can pivot df2:
df2['value'] = 1
df1 = df2.pivot_table(index='id', columns='num', values='value', fill_value=0)
Output:
num 0 1 2 3
id
100 0 1 1 1
101 1 0 0 1
102 0 1 0 0
103 0 0 1 1
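A variation on the pivot approach, sketched under the assumption that df2 should not be modified: assign adds the helper column to a temporary copy only. The explicit aggfunc="max" and the final astype(int) are additions here, not part of the answer above; they keep the result at 0/1 integers even if an (id, num) pair were repeated.

```python
import pandas as pd

# Second DataFrame, rebuilt from the sample data in the question.
df2 = pd.DataFrame({
    "id":  [100, 100, 100, 101, 101, 102, 103, 103],
    "num": [1, 2, 3, 0, 3, 1, 2, 3],
})

pivoted = (
    df2.assign(value=1)  # helper column on a copy; df2 itself is unchanged
    .pivot_table(index="id", columns="num", values="value",
                 aggfunc="max", fill_value=0)
    .astype(int)         # make the 0/1 integer dtype explicit
)

print(pivoted)
```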