Extract the text after the period "." from values in a Pandas DataFrame column

Question

I have a column in a dataframe as follows:

| Category  |
-------------
| B5050.88  |
| 5051.90   |
| B5050.97Q |
| 5051.23B  |
| 5051.78E  |
| B5050.11  |
| 5051.09   |
| Z5052     |

I want to extract the text after the period. For example, from B5050.88, I want only "88"; from 5051.78E, I want only "78E"; for Z5052, it would be nothing, as there's no period.

Expected output:

| Category  | Digits |
----------------------
| B5050.88  | 88     |
| 5051.90   | 90     |
| B5050.97Q | 97Q    |
| 5051.23B  | 23B    |
| 5051.78E  | 78E    |
| B5050.11  | 11     |
| 5051.09   | 09     |
| Z5052     | -      |

I tried using this:

df['Digits'] = df.Category.str.extract('.(.*)')

But I'm not getting the right answer. Using the above, for B5050.88, I'm getting the same B5050.88; for 5051.09, I'm getting NaN. Basically, I get NaN if there's no text.

Answer 1

You can do:

splits = [str(p).split(".") for p in df["Category"]]
df["Digits"] = [p[1] if len(p)>1 else "-" for p in splits]

i.e.


import pandas as pd

df = pd.DataFrame({"Category":["5050.88","5051.90","B5050.97","5051.23B","5051.78E",
"B5050.11","5051.09","Z5052"]})

#df

#   Category
# 0 5050.88
# 1 5051.90
# 2 B5050.97
# 3 5051.23B
# 4 5051.78E
# 5 B5050.11
# 6 5051.09
# 7 Z5052

splits = [str(p).split(".") for p in df["Category"]]
splits

# [['5050', '88'],
 # ['5051', '90'],
 # ['B5050', '97'],
 # ['5051', '23B'],
 # ['5051', '78E'],
 # ['B5050', '11'],
 # ['5051', '09'],
 # ['Z5052']]

df["Digits"] = [p[1] if len(p)>1 else "-" for p in splits]
df

# Category  Digits
# 0 5050.88     88
# 1 5051.90     90
# 2 B5050.97    97
# 3 5051.23B    23B
# 4 5051.78E    78E
# 5 B5050.11    11
# 6 5051.09     09
# 7 Z5052        -

Not so pretty, but it works.

EDIT:

Added the "-" instead of NaN, and the code snippet.
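
The list comprehension above can also be expressed with pandas' vectorized string methods. The following is a minimal sketch, not part of the original answer, assuming the sample data from the question and using Series.str.partition, which splits each value at the first "." into three columns (before, separator, after).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Category": ["B5050.88", "5051.90", "B5050.97Q", "5051.23B",
                                "5051.78E", "B5050.11", "5051.09", "Z5052"]})

# str.partition(".") returns three columns: the text before the first ".",
# the separator itself, and the text after it. Column 2 is an empty string
# when the value contains no ".".
after_period = df["Category"].astype(str).str.partition(".")[2]

# Swap the empty strings for the "-" placeholder used in the question.
df["Digits"] = after_period.replace("", "-")
print(df)
```

Note that a value ending in a period (for example "B5050.") would also come out as "-" here, since nothing follows the separator.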

Answer 2

Another way:

df.Category.str.split(r'[.]').str[1]

0     88
1     90
2    97Q
3    23B
4    78E
5     11
6     09
7    NaN

Alternatively:

df.Category.str.extract(r'(?<=[.])(\w+)')
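
str.extract returns one column per capture group and, by default, a DataFrame. A small sketch, assuming a plain Series is wanted so it can be assigned directly to a new column; the three sample values are taken from the question, and the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series(["B5050.88", "5051.78E", "Z5052"], name="Category")

# A single capture group combined with expand=False yields a Series
# instead of a one-column DataFrame. Rows without a match become NaN.
digits = s.str.extract(r"\.(\w+)", expand=False)
print(digits)
```

The missing values can then be replaced with fillna("-") if the "-" placeholder from the question is preferred.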

Answer 3

You need to escape the first "." in your pattern (unescaped, it matches any character) and then call fillna:

df["Digits"] = df["Category"].astype(str).str.extract(r"\.(.*)").fillna("-")
print(df)

Output:

    Category Digits
0   B5050.88     88
1    5051.90     90
2  B5050.97Q    97Q
3   5051.23B    23B
4   5051.78E    78E
5   B5050.11     11
6    5051.09     09
7      Z5052      -
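
To see why the escape matters, here is a short sketch using Python's re module directly on a single value from the question. It is only an illustration of the regex behaviour, on the assumption that str.extract applies the same search semantics to each element.

```python
import re

value = "B5050.88"

# Unescaped: "." matches any character, so it consumes the leading "B"
# and the group captures everything that follows.
unescaped = re.search(".(.*)", value).group(1)

# Escaped: r"\." matches only a literal period.
escaped = re.search(r"\.(.*)", value).group(1)

print(unescaped)  # 5050.88
print(escaped)    # 88
```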

Answer 4

Try the following:

df['Category'].apply(lambda x : x.split(".")[-1] if "." in x else "-")
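
One thing to be aware of: split(".")[-1] takes the text after the last period, and the lambda assumes every value is a string. The sketch below is a hedged variant, not the answerer's code; the helper name after_first_period and the extra sample values ("A1.2.3" and a missing value) are hypothetical and only there to show the difference.

```python
import pandas as pd

# Hypothetical values, including one with two periods and one missing value.
s = pd.Series(["5051.78E", "A1.2.3", "Z5052", None])

def after_first_period(value):
    # Non-strings (e.g. None/NaN) and strings without "." get the placeholder.
    if not isinstance(value, str) or "." not in value:
        return "-"
    # maxsplit=1 keeps everything after the first period together.
    return value.split(".", 1)[1]

result = s.apply(after_first_period)
print(result.tolist())  # ['78E', '2.3', '-', '-']
```

Whether the first or the last period is the right split point depends on the data; for the values in the question, which contain at most one period, both give the same result.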

Check the code below.
