I often need to find the first instance of a value in a table. For example, below I need the color of the first row of each candy_type when ordered by sequence:
import pandas as pd
d = {'candy_type':['A','A','B','B','C','C','C'],'sequence':[2,1,1,2,2,3,1], 'color':['Red','Black','Green','Yellow','Orange','White','Purple']}
df = pd.DataFrame(data=d)
df
+----+--------------+------------+---------+
| | candy_type | sequence | color |
|----+--------------+------------+---------|
| 0 | A | 2 | Red |
| 1 | A | 1 | Black |
| 2 | B | 1 | Green |
| 3 | B | 2 | Yellow |
| 4 | C | 2 | Orange |
| 5 | C | 3 | White |
| 6 | C | 1 | Purple |
+----+--------------+------------+---------+
#sort the dataframe by each candy_type's sequence and reset the index
df_sorted = df.sort_values(['candy_type','sequence']).reset_index(drop=True)
#make the index into a column
df_sorted_index = df_sorted.reset_index(drop=False)
df_sorted_index
+----+---------+--------------+------------+---------+
| | index | candy_type | sequence | color |
|----+---------+--------------+------------+---------|
| 0 | 0 | A | 1 | Black |
| 1 | 1 | A | 2 | Red |
| 2 | 2 | B | 1 | Green |
| 3 | 3 | B | 2 | Yellow |
| 4 | 4 | C | 1 | Purple |
| 5 | 5 | C | 2 | Orange |
| 6 | 6 | C | 3 | White |
+----+---------+--------------+------------+---------+
#find the first instance of each candy type; show the whole row
df_sorted_index.loc[df_sorted_index.groupby('candy_type')['index'].idxmin()]
+----+---------+--------------+------------+---------+
| | index | candy_type | sequence | color |
|----+---------+--------------+------------+---------|
| 0 | 0 | A | 1 | Black |
| 2 | 2 | B | 1 | Green |
| 4 | 4 | C | 1 | Purple |
+----+---------+--------------+------------+---------+
You can use match():
## Create sorted data.frame
d <- data.frame(
candy_type = c('A','A','B','B','C','C','C'),
sequence = c(2,1,1,2,2,3,1),
color = c('Red','Black','Green','Yellow','Orange','White','Purple')
)
d <- d[order(d[["candy_type"]], d[["sequence"]]), ]
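## unique() works whether candy_type is a character or a factor column.
## Since R 4.0.0, data.frame() keeps strings as character by default,
## so levels() would return NULL here; use levels() only for a factor column.
first_of_type <- match(unique(d[["candy_type"]]), d[["candy_type"]])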
first_of_type
# [1] 1 3 5
d[first_of_type, ]
# candy_type sequence color
# 2 A 1 Black
# 3 B 1 Green
# 7 C 1 Purple
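For readers coming from the pandas side, the first-match behavior of match() can be sketched in plain Python, where list.index() likewise returns the position of the first occurrence. This is only an illustration on the same sorted data, with zero-based positions; it is not how R implements match(), and the variable names are chosen for this example.

```python
# Rows already sorted by candy_type, then sequence (same data as above).
rows = [
    ("A", 1, "Black"),
    ("A", 2, "Red"),
    ("B", 1, "Green"),
    ("B", 2, "Yellow"),
    ("C", 1, "Purple"),
    ("C", 2, "Orange"),
    ("C", 3, "White"),
]

candy_types = [r[0] for r in rows]

# dict.fromkeys keeps each type once, in first-seen order (like unique()).
unique_types = list(dict.fromkeys(candy_types))

# list.index returns the zero-based position of the first occurrence,
# which mirrors match() apart from R's one-based indexing.
first_of_type = [candy_types.index(t) for t in unique_types]

# Pick out the full row at each first position.
first_rows = [rows[i] for i in first_of_type]
print(first_of_type)
print(first_rows)
```

Because the rows are sorted by sequence within each candy_type, the first occurrence of each type is also the row with the smallest sequence.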
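which.min() is R's equivalent of idxmin(). Both find the minimum value in an array and return the index of the first such value, which matters if there are ties. Note that which.min() returns a one-based position, whereas pandas' idxmin() returns an index label.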
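The tie-breaking rule can be shown without either library. The plain-Python sketch below is a stand-in for illustration only: the helper name first_min_index is hypothetical, and it is not the implementation of which.min() or idxmin(). It relies on Python's built-in min() returning the earliest item when several compare equal.

```python
def first_min_index(values):
    """Return the zero-based position of the first minimum in values."""
    # min() keeps the earliest item when several compare equal,
    # so ties resolve to the first occurrence.
    return min(range(len(values)), key=values.__getitem__)


sequence = [2, 1, 3, 1]  # the minimum 1 appears at positions 1 and 3
print(first_min_index(sequence))  # 1
```

The later occurrence at position 3 is ignored, which is the same "first such value" behavior described above.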