I've got a pandas dataframe that is structured as such,
ID Col1 Col2
1 50 12:23:01
1 34 12:25:11
1 65 12:32:25
1 98 12:45:08
2 23 11:09:10
2 12 11:12:43
2 56 11:13:12
2 34 11:14:26
2 77 11:16:02
3 64 14:01:11
3 34 14:01:13
3 48 14:02:32
What I need is to be able to search within a repeating ID
value to find a condition in column 1, say Col1==34
. Based on this, I need to create a new column, Col3
, that takes on the corresponding value in Col2
. The end result I need is shown below.
ID Col1 Col2 Col3
1 50 12:23:01 12:25:11
1 34 12:25:11 12:25:11
1 65 12:32:25 12:25:11
1 98 12:45:08 12:25:11
2 23 11:09:10 11:14:26
2 12 11:12:43 11:14:26
2 56 11:13:12 11:14:26
2 34 11:14:26 11:14:26
2 77 11:16:02 11:14:26
3 64 14:01:11 14:01:13
3 34 14:01:13 14:01:13
3 48 14:02:32 14:01:13
I've tried the following, but it's not pulling the distinct Col2
value, rather it's just duplicating Col2
df['Col3'] = np.where(df.Col1.isin(df[df.Col2==34].Col1), df['Col2'], 0)
I realize that assigning the df['Col2']
else 0 from the where condition is most likely my logic issue, and that there is probably some easy concise way of doing this (or that my time might be better spent in SQL), but I'm not sure on how to set this up. Thanks in advance.
using query
+ map
df['Col3'] = df.ID.map(df.query('Col1 == 34').set_index('ID').Col2)
print(df)
ID Col1 Col2 Col3
0 1 50 12:23:01 12:25:11
1 1 34 12:25:11 12:25:11
2 1 65 12:32:25 12:25:11
3 1 98 12:45:08 12:25:11
4 2 23 11:09:10 11:14:26
5 2 12 11:12:43 11:14:26
6 2 56 11:13:12 11:14:26
7 2 34 11:14:26 11:14:26
8 2 77 11:16:02 11:14:26
9 3 64 14:01:11 14:01:13
10 3 34 14:01:13 14:01:13
11 3 48 14:02:32 14:01:13
dealing with duplicates
# keep first instance
df.ID.map(df.query('Col1 == 34') \
.drop_duplicates(subset=['ID']).set_index('ID').Col2)
Or
# keep last instance
df.ID.map(df.query('Col1 == 34') \
.drop_duplicates(subset=['ID'], keep='last').set_index('ID').Col2)
Take advantage of pandas automatic index alignment by making id
the index. Then just append a column based on boolean selection. This answer assumes col1 is unique.
df.set_index('id', inplace=True)
df['col3'] = df.loc[df.col1 == 34, 'col2']
这是一个基于NumPy的矢量化解决方案 -
df['Col3'] = df.Col2.values[df.Col1.values == 34][df.ID.factorize()[0]]
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.