I have been struggling with a pandas problem for a while now, and maybe someone can shed some new light on it :)
Consider the following pandas dataframe, df:
Year Month Task TaskID TaskClass TaskClassID SomeValue
2019 11 A 1 X 10 6.58
2019 11 A 1 Y 20 1.58
2019 11 B 2 X 10 6.58
2019 11 B 2 Y 20 1.58
Objective: group by Task so that each Task gets a unique TaskClass observation (which Task gets which TaskClass is not important for this problem and can be considered random), like this:
Year Month Task TaskID TaskClass TaskClassID SomeValue
2019 11 A 1 X 10 6.58
2019 11 B 2 Y 20 1.58
Or, for instance, this:
Year Month Task TaskID TaskClass TaskClassID SomeValue
2019 11 A 1 Y 20 1.58
2019 11 B 2 X 10 6.58
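A minimal sketch of the objective for the four-row example, assuming a simple first-come, first-served rule: Tasks are visited in the order they first appear, and each one takes the first TaskClass that no earlier Task has claimed. assign_greedy is a hypothetical helper name, not a pandas API; the dataframe is rebuilt by hand from the first table. On this input it keeps rows 0 and 3, which is the first expected output above.

```python
import pandas as pd

# The four-row example from the question (values copied from its first table).
df = pd.DataFrame({
    "Year": [2019] * 4,
    "Month": [11] * 4,
    "Task": ["A", "A", "B", "B"],
    "TaskID": [1, 1, 2, 2],
    "TaskClass": ["X", "Y", "X", "Y"],
    "TaskClassID": [10, 20, 10, 20],
    "SomeValue": [6.58, 1.58, 6.58, 1.58],
})


def assign_greedy(frame):
    """Hypothetical helper: keep one row per Task, never reusing a TaskClass.

    Tasks are visited in order of first appearance; each takes the first
    TaskClass that no earlier Task has claimed. A Task whose classes are
    all taken already is dropped from the result.
    """
    used = set()   # TaskClass values already handed out
    keep = []      # index labels of the rows to keep
    for _, group in frame.groupby("Task", sort=False):
        for idx, cls in group["TaskClass"].items():
            if cls not in used:
                used.add(cls)
                keep.append(idx)
                break  # this Task is done; move on to the next one
    return frame.loc[keep]


result = assign_greedy(df)
print(result)
```

To make the choice random, as the question allows, the rows can be shuffled first, e.g. assign_greedy(df.sample(frac=1)). Note that a greedy pass can leave a Task empty even when a complete assignment exists (Task A allowed {X, Y}, Task B allowed only {X}, and A happens to grab X first); guaranteeing the largest possible number of matched Tasks is a bipartite matching problem.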
Other constraints: the final problem will have thousands of tasks and, more importantly, can have more TaskClasses per Task, something like this:
Year Month Task TaskID TaskClass TaskClassID SomeValue
2019 11 A 1 X 10 6.58
2019 11 A 1 Y 20 1.58
2019 11 A 1 Z 30 1.00
2019 11 A 1 W 40 0.25
2019 11 B 2 X 10 6.58
2019 11 B 2 Y 20 1.58
2019 11 B 2 Z 30 1.00
2019 11 B 2 W 40 0.25
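For the general case (thousands of Tasks, several TaskClasses each), "every Task gets a different TaskClass" can be treated as a maximum bipartite matching between Tasks and TaskClasses. The sketch below swaps in a plain-Python augmenting-path matching (one breadth-first search per Task) for illustration; match_tasks and its seed parameter are hypothetical names, not a pandas API. It assumes each row is one allowed (Task, TaskClass) pair, as in the tables above, and it shuffles the visiting order so the assignment is random but reproducible for a fixed seed.

```python
import random
from collections import deque

import pandas as pd


def match_tasks(frame, seed=None):
    """Hypothetical helper: give every Task a distinct TaskClass if possible.

    Runs an augmenting-path maximum bipartite matching (Tasks on one side,
    TaskClasses on the other) and returns one row per matched Task.
    """
    rng = random.Random(seed)
    options = {}  # Task -> list of (TaskClass, row label) it may take
    for idx, task, cls in zip(frame.index, frame["Task"], frame["TaskClass"]):
        options.setdefault(task, []).append((cls, idx))
    tasks = list(options)
    rng.shuffle(tasks)            # random visiting order ...
    for opts in options.values():
        rng.shuffle(opts)         # ... and random preference among classes

    owner = {}   # TaskClass -> Task currently holding it
    chosen = {}  # Task -> (TaskClass, row label) currently assigned

    for start in tasks:
        # BFS along alternating paths: Task -> class -> that class's owner -> ...
        parent = {}  # TaskClass -> (Task that reached it, row label)
        queue = deque([start])
        free_cls = None
        while queue and free_cls is None:
            task = queue.popleft()
            for cls, idx in options[task]:
                if cls in parent:
                    continue
                parent[cls] = (task, idx)
                if cls not in owner:
                    free_cls = cls  # found an unclaimed class: path ends here
                    break
                queue.append(owner[cls])  # try to move the current holder
        # Walk back from the free class, shifting each Task on the path.
        cls = free_cls
        while cls is not None:
            task, idx = parent[cls]
            previous = chosen.get(task)  # None only for the start Task
            owner[cls] = task
            chosen[task] = (cls, idx)
            cls = previous[0] if previous else None

    keep = sorted(idx for _, idx in chosen.values())
    return frame.loc[keep]


# The eight-row example from the question: two Tasks, four TaskClasses each.
rows = [
    (2019, 11, task, tid, cls, cid, val)
    for task, tid in [("A", 1), ("B", 2)]
    for cls, cid, val in [("X", 10, 6.58), ("Y", 20, 1.58),
                          ("Z", 30, 1.00), ("W", 40, 0.25)]
]
df = pd.DataFrame(rows, columns=["Year", "Month", "Task", "TaskID",
                                 "TaskClass", "TaskClassID", "SomeValue"])
print(match_tasks(df, seed=42))
```

When every Task may take every TaskClass, as in the tables above, this matches all Tasks as long as there are at least as many TaskClasses as Tasks; any extra Tasks are simply absent from the result. The augmenting step only matters when Tasks have different allowed classes, which is where a plain greedy pass can fall short.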
Thank you all, in advance.
Why not use drop_duplicates?
More here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html
Assume a dataframe like so:
import pandas as pd

data = pd.DataFrame({
    'Task Class': ['x', 'x', 'y', 'z', 'y', 'z'],
    'Value': [1, 2, 3, 4, 5, 6],
})
Task Class Value
0 x 1
1 x 2
2 y 3
3 z 4
4 y 5
5 z 6
We can do:
data.drop_duplicates(['Task Class'], inplace=True)
And get:
Task Class Value
0 x 1
2 y 3
3 z 4
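Applied to the question's own first table (trimmed here to the Task and TaskClass columns plus their IDs, for brevity), deduplicating on a single column does not by itself give each Task a different class. The sketch below only demonstrates that behaviour: keeping one row per TaskClass keeps two rows of Task A and drops Task B, while keeping one row per Task keeps class X for both Tasks.

```python
import pandas as pd

# Reduced copy of the question's first table (Year, Month, SomeValue omitted).
df = pd.DataFrame({
    "Task": ["A", "A", "B", "B"],
    "TaskID": [1, 1, 2, 2],
    "TaskClass": ["X", "Y", "X", "Y"],
    "TaskClassID": [10, 20, 10, 20],
})

# One row per TaskClass: the first X row and the first Y row both belong
# to Task A, so Task B disappears.
by_class = df.drop_duplicates(subset="TaskClass")
print(by_class)

# One row per Task: every Task survives, but both keep TaskClass X here.
by_task = df.drop_duplicates(subset="Task")
print(by_task)
```

So drop_duplicates is a useful building block, but on its own it enforces uniqueness on only one side of the Task/TaskClass pairing.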