How to create rows for unique values in columns in pandas?
I have a pandas DataFrame with thousands of rows, like so:
IntentID IntentName Query Response
1 Intent Name 1 Query 1 Response1
2 Intent Name 1 Query 1 Response2
3 Intent Name 2 Query 2 Response3
4 Intent Name 2 Query 2 Response4
5 Intent Name 3 Query 3 Response5
I need all rows that share an "IntentName" value to have the same IntentID, like so:
IntentID IntentName Query Response
1 Intent Name 1 Query 1 Response1
1 Intent Name 1 Query 1 Response2
2 Intent Name 2 Query 2 Response3
2 Intent Name 2 Query 2 Response4
3 Intent Name 3 Query 3 Response5
What is the easiest way to do this?
Try this:
df['IntentID'] = df.groupby('IntentName') \
['IntentID'].transform('first') \
.rank(method='dense') \
.astype('int')
How it works:
1. Group the rows by IntentName
2. Within each IntentName group, take the first IntentID
3. Rank those IntentIDs as 1, 1, 2, 2, 3, etc. (method='dense')
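To make the chain easier to follow, here is a minimal sketch that splits it into intermediate steps. The sample frame is rebuilt by hand from the rows in the question, and the names `first_ids` and `dense_ids` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "IntentID": [1, 2, 3, 4, 5],
    "IntentName": ["Intent Name 1", "Intent Name 1", "Intent Name 2",
                   "Intent Name 2", "Intent Name 3"],
    "Query": ["Query 1", "Query 1", "Query 2", "Query 2", "Query 3"],
    "Response": ["Response1", "Response2", "Response3",
                 "Response4", "Response5"],
})

# Group by IntentName and broadcast each group's first IntentID
# back onto every row of that group.
first_ids = df.groupby("IntentName")["IntentID"].transform("first")
print(first_ids.tolist())  # [1, 1, 3, 3, 5]

# Dense ranking closes the gaps left by the skipped IDs;
# rank() returns floats, so cast back to int.
dense_ids = first_ids.rank(method="dense").astype("int")
print(dense_ids.tolist())  # [1, 1, 2, 2, 3]

df["IntentID"] = dense_ids
```

Because the rank is taken over the first IntentID of each group, the new IDs follow the order of the original IDs rather than the alphabetical order of the names.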
You can use regex:
import re
df['IntentID'] = df.IntentName.apply(lambda x: int(re.search(r'(?P<num>\d+)', x).group('num')))
Output:
IntentID IntentName Query Response
0 1 Intent Name 1 Query 1 Response1
1 1 Intent Name 1 Query 1 Response2
2 2 Intent Name 2 Query 2 Response3
3 2 Intent Name 2 Query 2 Response4
4 3 Intent Name 3 Query 3 Response5
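Note that the regex approach only works if every IntentName contains a number, since `re.search` returns `None` when there is no match. As a rough sketch, and not something taken from the answers above, here are two variants: a vectorized form of the same regex using `Series.str.extract`, and `pd.factorize`, which numbers the names by order of first appearance and does not need digits in them at all. The variable names are illustrative, and the sample names are copied from the question.

```python
import pandas as pd

# IntentName values copied from the question.
names = pd.Series(["Intent Name 1", "Intent Name 1", "Intent Name 2",
                   "Intent Name 2", "Intent Name 3"], name="IntentName")

# Variant 1: same regex idea, vectorized. Assumes every name contains
# digits; a name without digits would give NaN and make astype(int) fail.
ids_from_regex = names.str.extract(r"(\d+)", expand=False).astype(int)
print(ids_from_regex.tolist())  # [1, 1, 2, 2, 3]

# Variant 2: no digits needed. factorize returns 0-based codes in order
# of first appearance, so add 1 to start the IDs at 1.
codes, uniques = pd.factorize(names)
ids_from_factorize = codes + 1
print(ids_from_factorize.tolist())  # [1, 1, 2, 2, 3]
```

Either result can be assigned back with `df['IntentID'] = ...`. The two variants agree here only because the numbers in the sample names happen to match their order of appearance.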