I'm trying to join two different dataframes. I'll explain what I've done so far so you can see what I have tried. I'm fairly new to Python and would appreciate any hints on where I can improve my code.
I've got a dataset which looks similar to this:
cluster, Type
1, M
1, T
1, M
I've grouped the data and done some aggregation. I also added some columns to the dataset, so my dataframe now looks like this:
>>> df
cluster, Type, M, T
1, M, 0, 0
1, T, 0, 0
1, M, 0, 0
And the aggregation looks like this:
>>> a
cluster, Type, len
1, M, 2
1, T, 1
I want to put every len from a into the corresponding column in df, so the result would be:
>>> df
cluster, Type, M, T
1, M, 2, 0
1, T, 0, 1
What I've tried to do is:
for idx, row in df.iterrows():
    c = row['cluster']
    t = row['Type']
    val = a.loc[
        (a['cluster'] == c) &
        (a['Type'] == t),
        'len'
    ]
    row[t] = val
In the end it failed because the last line, row[t] = val, didn't update df. But I have the feeling I'm doing this in a very complicated way.
Any ideas how to do it in a more elegant way?
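On why the loop has no effect: the pandas documentation warns that iterrows may hand back a copy of each row, so writing to row does not reach df. Also, val in the loop is a Series rather than a single number. A minimal sketch of a version that writes into df directly with df.at, assuming df has already been reduced to one row per cluster and Type (the frames below are rebuilt from the sample in the question):

```python
import pandas as pd

# Sample frames rebuilt from the question (df already deduplicated; an assumption).
df = pd.DataFrame({"cluster": [1, 1], "Type": ["M", "T"], "M": [0, 0], "T": [0, 0]})
a = pd.DataFrame({"cluster": [1, 1], "Type": ["M", "T"], "len": [2, 1]})

# Iterate over the index labels and write through df.at,
# so the assignment targets df itself and not a temporary row copy.
for idx in df.index:
    c = df.at[idx, "cluster"]
    t = df.at[idx, "Type"]
    match = a.loc[(a["cluster"] == c) & (a["Type"] == t), "len"]
    if not match.empty:
        # match is a Series; take its first element as the scalar count.
        df.at[idx, t] = match.iloc[0]

print(df)
```

This keeps the row-by-row structure of the original attempt; the answers below avoid the per-row lookup altogether.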
You can use this to go from 'a' to your expected result using set_index, unstack and reset_index:
df = a.set_index([a.Type, 'cluster', 'Type'])['len']\
      .unstack(0).rename_axis(None, axis=1)\
      .reset_index()
Output:
cluster Type M T
0 1 M 2.0 NaN
1 1 T NaN 1.0
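The NaN entries in this output differ from the zeros in the expected result. A minimal sketch of one way to get integer zeros instead, assuming a is rebuilt here from the sample in the question. The level name col is a hypothetical label, introduced only so that the index does not carry two levels both called Type:

```python
import pandas as pd

# Aggregated frame rebuilt from the question's sample.
a = pd.DataFrame({"cluster": [1, 1], "Type": ["M", "T"], "len": [2, 1]})

result = (
    # Index by cluster, Type, and a renamed copy of Type ("col" is hypothetical).
    a.set_index(["cluster", "Type", a["Type"].rename("col")])["len"]
    # Pivot the copied level into columns, filling missing cells with 0.
    .unstack("col", fill_value=0)
    .astype(int)
    # Drop the "col" label from the column axis, then restore cluster/Type as columns.
    .rename_axis(None, axis=1)
    .reset_index()
)
print(result)
```

The fill_value argument of unstack supplies the value for combinations that do not occur in a, which is where the NaN entries came from.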
Here is a way to do it. It still involves a loop, but I think it's clearer and faster than what you were trying to do. It only uses your original df, so there is no need for the aggregation you provided.
Start by making a dictionary of the length per Type:
len_dict = df.groupby('Type').size().to_dict()
>>> len_dict
{'M': 2, 'T': 1}
Then drop the duplicates in your original df, and finally loop through the keys in len_dict, writing each count into the column named after its key:
df.drop_duplicates(inplace=True)
for t in len_dict:
    df.loc[df.Type.eq(t), t] = len_dict[t]
>>> df
cluster Type M T
0 1 M 2 0
1 1 T 0 1
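Note that groupby('Type') counts across the whole frame, which matches the question only because the sample has a single cluster. A loop-free sketch that counts per cluster and Type instead, using get_dummies rather than the dictionary loop. The second cluster in the sample data is a made-up addition for illustration, and the names counts, wide and result are hypothetical:

```python
import pandas as pd

# Raw data as in the question, plus one invented row for cluster 2.
df = pd.DataFrame({
    "cluster": [1, 1, 1, 2],
    "Type": ["M", "T", "M", "M"],
})

# One row per (cluster, Type) with its count, like the frame a in the question.
counts = df.groupby(["cluster", "Type"]).size().rename("len").reset_index()

# One indicator column per Type, scaled row-wise by that row's count.
wide = pd.get_dummies(counts["Type"], dtype=int).mul(counts["len"], axis=0)

result = pd.concat([counts[["cluster", "Type"]], wide], axis=1)
print(result)
```

Each row keeps its count only in the column matching its own Type, and every other Type column stays 0.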