Extract text after period "." from values in a column in Pandas Dataframes
I have a column in a dataframe as follows:
| Category |
------------
| B5050.88
| 5051.90
| B5050.97Q
| 5051.23B
| 5051.78E
| B5050.11
| 5051.09
| Z5052
I want to extract the text after the period. For example, from B5050.88, I want only "88"; from 5051.78E, I want only "78E"; for Z5052, it would be nothing, as there's no period.
Expected output:
| Category | Digits |
---------------------
| B5050.88 | 88 |
| 5051.90 | 90 |
| B5050.97Q| 97Q |
| 5051.23B | 23B |
| 5051.78E | 78E |
| B5050.11 | 11 |
| 5051.09 | 09 |
| Z5052 | - |
I tried using this:
df['Digits'] = df.Category.str.extract('.(.*)')
But I'm not getting the right answer. Using the above, for B5050.88 I get the same B5050.88 back; for 5051.09 I get NaN. Basically, I get NaN if there's no text.
You can do:
splits = [str(p).split(".") for p in df["Category"]]
df["Digits"] = [p[1] if len(p)>1 else "-" for p in splits]
i.e.:
import pandas as pd
df = pd.DataFrame({"Category":["5050.88","5051.90","B5050.97","5051.23B","5051.78E",
"B5050.11","5051.09","Z5052"]})
#df
# Category
# 0 5050.88
# 1 5051.90
# 2 B5050.97
# 3 5051.23B
# 4 5051.78E
# 5 B5050.11
# 6 5051.09
# 7 Z5052
splits = [str(p).split(".") for p in df["Category"]]
splits
# [['5050', '88'],
# ['5051', '90'],
# ['B5050', '97'],
# ['5051', '23B'],
# ['5051', '78E'],
# ['B5050', '11'],
# ['5051', '09'],
# ['Z5052']]
df["Digits"] = [p[1] if len(p)>1 else "-" for p in splits]
df
# Category Digits
# 0 5050.88 88
# 1 5051.90 90
# 2 B5050.97 97
# 3 5051.23B 23B
# 4 5051.78E 78E
# 5 B5050.11 11
# 6 5051.09 09
# 7 Z5052 -
Not so pretty, but it works.
EDIT:
Added the "-" instead of NaN, and the code snippet.
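The same idea can be wrapped in a small helper. This is only a minimal sketch: the name text_after_period is hypothetical (it is not part of the answer above), and it uses str.partition instead of split, so for a value with several periods it keeps everything after the first one.

```python
def text_after_period(value, default="-"):
    """Return the text after the first period, or `default` if there is none."""
    # str.partition splits on the first separator and always returns 3 parts;
    # the separator part is an empty string when no period was found.
    _, sep, tail = str(value).partition(".")
    return tail if sep else default


values = ["B5050.88", "5051.78E", "Z5052"]
print([text_after_period(v) for v in values])  # ['88', '78E', '-']
```

Assuming the column holds the values shown in the question, it could be applied with df["Digits"] = df["Category"].map(text_after_period).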
Another way:
df.Category.str.split(r'[.]').str[1]
0 88
1 90
2 97Q
3 23B
4 78E
5 11
6 09
7 NaN
Alternatively:
df.Category.str.extract(r'(?<=[.])(\w+)', expand=False)
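Both variants leave NaN for values that have no period. If the "-" placeholder from the expected output is wanted, a fillna can be chained on. A minimal sketch, assuming the column holds strings; the three-value Series is just an illustrative sample taken from the question:

```python
import pandas as pd

# Illustrative sample, taken from the values in the question.
s = pd.Series(["B5050.88", "5051.78E", "Z5052"])

# Split on a literal period at most once, take the second piece,
# and replace the missing value (no period found) with "-".
digits = s.str.split(".", n=1).str[1].fillna("-")
print(digits.tolist())  # ['88', '78E', '-']
```

A single-character pattern such as "." is treated by str.split as a literal string rather than a regular expression, so no escaping is needed here.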
You need to escape your first . (an unescaped . in a regular expression matches any character, not just a literal period) and do fillna:
df["Digits"] = df["Category"].astype(str).str.extract(r"\.(.*)", expand=False).fillna("-")
print(df)
Output:
Category Digits
0 B5050.88 88
1 5051.90 90
2 B5050.97Q 97Q
3 5051.23B 23B
4 5051.78E 78E
5 B5050.11 11
6 5051.09 09
7 Z5052 -
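To see why the escape matters, the two patterns can be compared with Python's re module on a single value. This is only an illustration of the regex behaviour, not pandas code; the variable names are made up for the example.

```python
import re

value = "B5050.88"

# Unescaped: "." matches any character, so it consumes the leading "B"
# and the group captures everything that follows it.
unescaped = re.search(".(.*)", value).group(1)
print(unescaped)  # 5050.88

# Escaped: r"\." matches only a literal period.
escaped = re.search(r"\.(.*)", value).group(1)
print(escaped)  # 88

# No period at all: the escaped pattern finds no match.
no_match = re.search(r"\.(.*)", "Z5052")
print(no_match)  # None
```

A value with no match is what shows up as NaN in the extracted column, which is why the fillna("-") step is needed.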