pandas: replace one cell's value across multiple rows with the value from one particular row, based on other columns
My aim:
uniqueIdentity beginTime progrNumber
0 2018-02-07-6253554 17:40:29 1
1 2018-02-07-6253554 17:40:29 2
2 2018-02-07-6253554 17:40:29 3
3 2018-02-07-6253554 17:40:29 4
4 2018-02-07-6253554 17:40:29 5
5 2018-02-07-5555333 17:48:29 2
6 2018-02-07-5555333 17:48:29 3
7 2018-02-07-5555333 17:48:29 4
8 2018-02-07-2345622 18:40:29 1
9 2018-02-07-2345622 18:40:29 2
10 2018-02-07-2345622 18:40:29 3
11 2018-02-07-2345622 18:40:29 4
My dataset now:
uniqueIdentity beginTime progrNumber
0 2018-02-07-6253554 17:40:29 1
1 2018-02-07-6253554 17:41:15 2
2 2018-02-07-6253554 17:41:55 3
3 2018-02-07-6253554 17:42:54 4
4 2018-02-07-6253554 17:43:29 5
5 2018-02-07-5555333 17:49:15 2
6 2018-02-07-5555333 17:49:55 3
7 2018-02-07-5555333 17:50:54 4
8 2018-02-07-2345622 18:40:29 1
9 2018-02-07-2345622 18:41:15 2
10 2018-02-07-2345622 18:41:55 3
11 2018-02-07-2345622 18:42:54 4
That means: for rows sharing the same 'uniqueIdentity', the 'beginTime' should be replaced by the value from the row that has the same 'uniqueIdentity' and the minimum 'progrNumber'.
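For anyone who wants to reproduce the answers below, the "dataset now" table can be rebuilt as a DataFrame like this (a sketch; column names are taken from the question):

```python
import pandas as pd

# Reconstruct the question's "dataset now" sample.
df = pd.DataFrame({
    'uniqueIdentity': ['2018-02-07-6253554'] * 5
                      + ['2018-02-07-5555333'] * 3
                      + ['2018-02-07-2345622'] * 4,
    'beginTime': ['17:40:29', '17:41:15', '17:41:55', '17:42:54', '17:43:29',
                  '17:49:15', '17:49:55', '17:50:54',
                  '18:40:29', '18:41:15', '18:41:55', '18:42:54'],
    'progrNumber': [1, 2, 3, 4, 5, 2, 3, 4, 1, 2, 3, 4],
})
```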
As you mention in the comments, the lowest progrNumber will also have the lowest beginTime. This means you can just take the lowest beginTime per uniqueIdentity using groupby and transform.
Note: if beginTime is of type string, this will only work if it has consistent formatting (e.g. '09:40:20' instead of '9:40:20').
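If the padding is not consistent, one option (a sketch, not from the original answer) is to parse the strings with pd.to_datetime before taking the minimum, so the comparison is chronological rather than lexicographic:

```python
import pandas as pd

# Hypothetical data with inconsistent zero-padding: as a string,
# '10:05:00' sorts before '9:40:20', which is chronologically wrong.
df = pd.DataFrame({
    'uniqueIdentity': ['a', 'a', 'b', 'b'],
    'beginTime': ['9:40:20', '10:05:00', '18:40:29', '9:05:00'],
    'progrNumber': [1, 2, 1, 2],
})

# Parse to datetime, take the per-group minimum, format back to a padded string.
parsed = pd.to_datetime(df['beginTime'], format='%H:%M:%S')
df['beginTime'] = (parsed.groupby(df['uniqueIdentity'])
                         .transform('min')
                         .dt.strftime('%H:%M:%S'))
```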
df['beginTime'] = df.groupby('uniqueIdentity').beginTime.transform('min')
   uniqueIdentity beginTime progrNumber
0  2018-02-07-6253554 17:40:29 1
1  2018-02-07-6253554 17:40:29 2
2  2018-02-07-6253554 17:40:29 3
3  2018-02-07-6253554 17:40:29 4
4  2018-02-07-6253554 17:40:29 5
5  2018-02-07-5555333 17:49:15 2
6  2018-02-07-5555333 17:49:15 3
7  2018-02-07-5555333 17:49:15 4
8  2018-02-07-2345622 18:40:29 1
9  2018-02-07-2345622 18:40:29 2
10 2018-02-07-2345622 18:40:29 3
11 2018-02-07-2345622 18:40:29 4
Here's another option using a left join and some renaming:
# find rows where progrNumber is 1
df_prog1=df[df.progrNumber==1]
# do a left join on the original
df=df.merge(df_prog1,on='uniqueIdentity',how='left',suffixes=('','_y'))
# keep only the beginTime from the right frame
df=df[['uniqueIdentity','beginTime_y','progrNumber']]
# rename columns
df=df.rename(columns={'beginTime_y':'beginTime'})
print(df)
Results in:
   uniqueIdentity beginTime progrNumber
0  2018-02-07-6253554 17:40:29 1
1  2018-02-07-6253554 17:40:29 2
2  2018-02-07-6253554 17:40:29 3
3  2018-02-07-6253554 17:40:29 4
4  2018-02-07-6253554 17:40:29 5
5  2018-02-07-5555333 NaN      2
6  2018-02-07-5555333 NaN      3
7  2018-02-07-5555333 NaN      4
8  2018-02-07-2345622 18:40:29 1
9  2018-02-07-2345622 18:40:29 2
10 2018-02-07-2345622 18:40:29 3
11 2018-02-07-2345622 18:40:29 4
(Note the NaN rows: 2018-02-07-5555333 has no progrNumber == 1 row in the sample data, so there is nothing to join on for that identity.)
If you're not sure which record within a uniqueIdentity will have the minimum time, you can use a groupby instead of selecting where progrNumber == 1:
df_prog1=df.groupby('uniqueIdentity')['beginTime'].min().reset_index()
And do the left join as above.
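Put together, that variant might look like the following sketch (hypothetical two-identity data; the '_y' suffix handling mirrors the earlier snippet):

```python
import pandas as pd

df = pd.DataFrame({
    'uniqueIdentity': ['a', 'a', 'b', 'b'],
    'beginTime': ['17:41:15', '17:40:29', '18:41:15', '18:40:29'],
    'progrNumber': [2, 1, 2, 1],
})

# Minimum beginTime per uniqueIdentity, regardless of progrNumber.
df_min = df.groupby('uniqueIdentity')['beginTime'].min().reset_index()

# Left join back onto the original, then keep the joined time.
df = df.merge(df_min, on='uniqueIdentity', how='left', suffixes=('', '_y'))
df = df[['uniqueIdentity', 'beginTime_y', 'progrNumber']]
df = df.rename(columns={'beginTime_y': 'beginTime'})
```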
If the first beginTime for each user always corresponds to the minimum program number for that user, you can do:
d = df.groupby('uniqueIdentity')['beginTime'].first().to_dict()
df['beginTime'] = df['uniqueIdentity'].map(d)
To be more explicit about getting the time where the program number is minimum (regardless of its position), replace d in the above with:
d = df.groupby('uniqueIdentity').apply(lambda x: x['beginTime'][x['progrNumber'].idxmin()]).to_dict()
These two yield the same result for your example data, but they will differ if there are users for whom the first beginTime (or minimum beginTime, per Hugolmn) does not correspond to the user's minimum progrNumber.
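A small illustration of that difference, using hypothetical data where the row with the lowest progrNumber is not the user's first row:

```python
import pandas as pd

# For user 'a', the first row has progrNumber 2; the minimum progrNumber (1)
# appears on the second row, with an earlier beginTime.
df = pd.DataFrame({
    'uniqueIdentity': ['a', 'a', 'a'],
    'beginTime': ['17:41:15', '17:40:29', '17:41:55'],
    'progrNumber': [2, 1, 3],
})

# first() takes whatever row appears first in the group.
first = df.groupby('uniqueIdentity')['beginTime'].first().to_dict()

# idxmin() takes the row where progrNumber is minimal.
by_min_prog = (df.groupby('uniqueIdentity')
                 .apply(lambda x: x['beginTime'][x['progrNumber'].idxmin()])
                 .to_dict())
```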
Using groupby and map
The hypothesis is that beginTime will always be minimal for the minimal progrNumber. This condition holds based on the question's comments.
In this answer, I collect the minimum beginTime of each uniqueIdentity and then map it back onto the original DataFrame based on uniqueIdentity.
times = df.groupby('uniqueIdentity').beginTime.min()
df['beginTime'] = df.uniqueIdentity.map(times)
If we cannot assume that the min progrNumber also has the min beginTime, a more sophisticated approach is required:
df['beginTime'] = (
    df.groupby('uniqueIdentity', as_index=False, group_keys=False)
      .apply(lambda s: pd.Series(s[s.progrNumber == s.progrNumber.min()]
                                 .beginTime.item(), index=s.index))
)
df
# uniqueIdentity beginTime progrNumber
# 0 2018-02-07-6253554 17:40:29 1
# 1 2018-02-07-6253554 17:40:29 2
# 2 2018-02-07-6253554 17:40:29 3
# 3 2018-02-07-6253554 17:40:29 4
# 4 2018-02-07-6253554 17:40:29 5
# 5 2018-02-07-5555333 17:49:15 2
# 6 2018-02-07-5555333 17:49:15 3
# 7 2018-02-07-5555333 17:49:15 4
# 8 2018-02-07-2345622 18:40:29 1
# 9 2018-02-07-2345622 18:40:29 2
# 10 2018-02-07-2345622 18:40:29 3
# 11 2018-02-07-2345622 18:40:29 4
If you don't want a one-liner, an approach with map would be ideal:
mapping = (
    df.groupby('uniqueIdentity')
      .apply(lambda s: s[s.progrNumber == s.progrNumber.min()].beginTime.iloc[0])
)
df['beginTime'] = df.uniqueIdentity.map(mapping)
Note: you can replace the iloc[0] with item() if you can guarantee that only one row has the min progrNumber.