Pandas: Splitting a non-numeric identifier code into multiple rows
Suppose I have a data set that looks like this:
Unique_Identifier Score1 Score2
112 50 60
113-114 50 70
115 40 20
116-117 30 90
118 70 70
Notice how some of my unique identifiers are listed as ranges rather than exact values. I want to split each range into separate rows, one per identifier, with the same scores, so that it would look like this:
Unique_Identifier Score1 Score2
112 50 60
113 50 70
114 50 70
115 40 20
116 30 90
117 30 90
118 70 70
How would I go about doing this in Python using Pandas? I think there may be a way to test for rows that have a "-" in them, but I'm not sure how I would go about splitting those rows. I should also note that some identifier ranges have more than just 2 identifiers in them, such as 120-124.
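On the "test for rows that have a "-" in them" part: one possible way is a boolean mask built with str.contains. This is only a sketch. It assumes the identifier column holds strings, and the frame below is rebuilt by hand from the sample data in the question.

```python
import pandas as pd

# Sample data copied from the question; identifiers are assumed to be strings.
df = pd.DataFrame({
    "Unique_Identifier": ["112", "113-114", "115", "116-117", "118"],
    "Score1": [50, 50, 40, 30, 70],
    "Score2": [60, 70, 20, 90, 70],
})

# Boolean mask: True where the identifier is a range such as "113-114".
# regex=False makes pandas treat "-" as a literal character.
is_range = df["Unique_Identifier"].str.contains("-", regex=False)

# Only the rows whose identifier is a range.
range_rows = df[is_range]
print(range_rows)
```

The mask is not strictly needed for the split itself, but it is handy for inspecting which rows will be expanded.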
df.assign(Unique_Identifier=df.Unique_Identifier.str.split('-')).explode('Unique_Identifier')
Unique_Identifier Score1 Score2
0 112 50 60
1 113 50 70
1 114 50 70
2 115 40 20
3 116 30 90
3 117 30 90
4 118 70 70
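One caveat with splitting alone: it keeps only the two endpoints of each range, so it fits ranges of two consecutive identifiers but not wider ones like 120-124. A small sketch to illustrate, assuming string identifiers and adding a 120-124 row with scores 80 and 80 purely as an example:

```python
import pandas as pd

# Sample data from the question plus one wider range (120-124) for illustration.
df = pd.DataFrame({
    "Unique_Identifier": ["112", "113-114", "115", "116-117", "118", "120-124"],
    "Score1": [50, 50, 40, 30, 70, 80],
    "Score2": [60, 70, 20, 90, 70, 80],
})

# Split each identifier on "-" into a list, then give every list element its own row.
endpoints = (
    df.assign(Unique_Identifier=df["Unique_Identifier"].str.split("-"))
      .explode("Unique_Identifier")
)

# Only the endpoints of 120-124 survive; 121, 122 and 123 are missing.
ids = endpoints["Unique_Identifier"].tolist()
print(ids)
```

The resulting identifiers are also still strings, so a conversion with astype(int) may be wanted afterwards.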
Split on "-" and create a list with the desired range. Then explode to individual rows:
df["Unique_Identifier"] = df["Unique_Identifier"].apply(lambda x: list(range(int(x.split("-")[0]),int(x.split("-")[1])+1)) if "-" in x else [int(x)])
df = df.explode("Unique_Identifier")
>>> df
Unique_Identifier Score1 Score2
0 112 50 60
1 113 50 70
1 114 50 70
2 115 40 20
3 116 30 90
3 117 30 90
4 118 70 70
5 120 80 80
5 121 80 80
5 122 80 80
5 123 80 80
5 124 80 80
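The same idea can be written with a small named helper instead of the long lambda, which avoids splitting each value twice. This is a sketch rather than a definitive version: expand_identifier is a hypothetical helper name, the identifiers are assumed to be strings, and ignore_index=True assumes pandas 1.1 or newer.

```python
import pandas as pd


def expand_identifier(code):
    """Turn "113-114" into [113, 114] and "112" into [112]."""
    start, _, end = code.partition("-")
    # partition leaves `end` empty when the code contains no "-"
    return list(range(int(start), int(end or start) + 1))


# Sample data from the question plus the 120-124 row shown in the output above.
df = pd.DataFrame({
    "Unique_Identifier": ["112", "113-114", "115", "116-117", "118", "120-124"],
    "Score1": [50, 50, 40, 30, 70, 80],
    "Score2": [60, 70, 20, 90, 70, 80],
})

out = (
    df.assign(Unique_Identifier=df["Unique_Identifier"].map(expand_identifier))
      .explode("Unique_Identifier", ignore_index=True)  # fresh 0..n-1 index
      .astype({"Unique_Identifier": int})               # explode leaves object dtype
)
print(out)
```

With ignore_index=True the repeated index labels (1, 1, 3, 3, ...) are replaced by a fresh running index; drop that argument to keep the original row labels.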