Pandas: remove everything after the last '_'

Question

I have strings like the ones below in a column. I would like to remove everything after the last _ of each string (along with that _), and if there is no _ then leave the string as-is (as my attempt below will just exclude strings with no _).

So far I have tried the line below, taken from "Python pandas: remove everything after a delimiter in a string", but it removes everything after the first _.

d6['SOURCE_NAME'] = d6['SOURCE_NAME'].str.split('_').str[0]

Here are some example strings in my SOURCE_NAME column.

Stackoverflow_1234
Stack_Over_Flow_1234
Stackoverflow
Stack_Overflow_1234

Expected:

Stackoverflow
Stack_Over_Flow
Stackoverflow
Stack_Overflow

Any help would be appreciated.

Answer 1

Use a combination of str.rsplit and str.get for your desired outcome. str.rsplit splits each string starting from the end, while str.get extracts the element at a given position from each entry of a pd.Series.

Answer

d6['SOURCE_NAME'] = d6['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)

The n argument of rsplit limits the number of splits, so with n=1 only the last '_' is used and position 0 holds everything before it.
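To make the effect of n concrete, here is a minimal sketch comparing an unlimited right split with n=1. The Series `s` is made-up sample data that mirrors the question's strings, not the asker's actual column.

```python
import pandas as pd

# Hypothetical sample data mirroring the strings in the question
s = pd.Series(["Stack_Over_Flow_1234", "Stackoverflow"])

# Without n, every '_' is a split point
all_parts = s.str.rsplit("_").tolist()

# With n=1, only the last '_' is used, so each list has at most two pieces
last_split = s.str.rsplit("_", n=1).tolist()

# Position 0 is everything before the last '_',
# or the whole string when there is no '_' at all
result = s.str.rsplit("_", n=1).str.get(0).tolist()

print(all_parts)
print(last_split)
print(result)
```

Because a string without '_' yields a one-element list, taking position 0 returns it unchanged, which is the behaviour the question asks for.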

Even though a solution using pd.Series.apply takes roughly half the time in the timings below, I like this one because it is more expressive in its syntax. If you want to use the (faster) pd.Series.apply solution, check the Time section.

See the pandas documentation.

Example

import pandas as pd

strs = ['Stackoverflow_1234',
        'Stack_Over_Flow_1234',
        'Stackoverflow',
        'Stack_Overflow_1234']
df = pd.DataFrame(data={'SOURCE_NAME': strs})

This results in

print(df)
            SOURCE_NAME
0    Stackoverflow_1234
1  Stack_Over_Flow_1234
2         Stackoverflow
3   Stack_Overflow_1234

Using the proposed solution:

df['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)

0      Stackoverflow
1    Stack_Over_Flow
2      Stackoverflow
3     Stack_Overflow
Name: SOURCE_NAME, dtype: object

Time

Interestingly, using pd.Series.str is not necessarily faster than using pd.Series.apply:

import pandas as pd

df = pd.DataFrame(data={'SOURCE_NAME': ['stackoverflow_1234_abcd'] * 1000})

%timeit df['SOURCE_NAME'].apply(lambda x: x.rsplit('_', 1)[0])
497 µs ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)
1.04 ms ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# increasing the number of rows x 100
df = pd.concat([df] * 100)

%timeit df['SOURCE_NAME'].apply(lambda x: x.rsplit('_', 1)[0])
31.7 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)
84.1 ms ± 6.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
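The %timeit lines above only run in IPython or Jupyter. Outside of those, a rough equivalent can be sketched with the standard-library timeit module. The helper names with_apply and with_str and the repeat count are arbitrary choices for illustration, and absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"SOURCE_NAME": ["stackoverflow_1234_abcd"] * 1000})


def with_apply():
    # Plain Python str.rsplit called once per element
    return df["SOURCE_NAME"].apply(lambda x: x.rsplit("_", 1)[0])


def with_str():
    # pandas string accessor version
    return df["SOURCE_NAME"].str.rsplit("_", n=1).str.get(0)


# Both approaches should produce the same values
same_values = with_apply().tolist() == with_str().tolist()

number = 100  # arbitrary repeat count for this sketch
t_apply = timeit.timeit(with_apply, number=number) / number
t_str = timeit.timeit(with_str, number=number) / number

print(f"same values: {same_values}")
print(f"apply: {t_apply * 1e6:.0f} µs per call")
print(f"str:   {t_str * 1e6:.0f} µs per call")
```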

Answer 2

You could try applying a lambda that splits once from the right, like so:

d6['SOURCE_NAME'] = d6['SOURCE_NAME'].apply(lambda x: x.rsplit('_', 1)[0])

Hope that helps!

Answer 3

rsplit() does what you want: you can tell it how many times to split your string.

s = "Stack_Over_Flow_1234"
s.rsplit('_', 1)[0] # Split my string one time and get the first part of it

This then returns 'Stack_Over_Flow'.
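A small sketch confirms that the same call leaves strings without an underscore unchanged, which is the case the question is worried about. The helper name strip_last_suffix is made up for illustration.

```python
def strip_last_suffix(s, sep="_"):
    # rsplit with maxsplit=1 splits at most once, starting from the right;
    # if sep is absent, the list has a single element: the original string
    return s.rsplit(sep, 1)[0]


samples = [
    "Stackoverflow_1234",
    "Stack_Over_Flow_1234",
    "Stackoverflow",
    "Stack_Overflow_1234",
]
cleaned = [strip_last_suffix(s) for s in samples]
print(cleaned)
# ['Stackoverflow', 'Stack_Over_Flow', 'Stackoverflow', 'Stack_Overflow']
```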

Answer 4

You can use string.split('_') to split the string into a list of substrings around every underscore, then recombine them without the last element. Here is a snippet using your examples:

a = ["Stackoverflow_1234", "Stack_Over_Flow_1234", "Stackoverflow", "Stack_Overflow_1234"]

for e in a:

    # Split the string into a list, separated at '_'
    splitStr = e.split("_")

    # If there is only 1 element, we can use it directly
    if len(splitStr) == 1:
        print(splitStr[0])

    # Slice off the final substring and join the remaining 
    # substrings back together with underscores
    else:
        print("_".join(splitStr[:-1]))
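The length check in the loop above is not optional. As a sketch, the same logic wrapped in a function (the name drop_last_part is hypothetical) shows what happens without it: slicing off the last element of a one-element list leaves nothing to join.

```python
def drop_last_part(text):
    parts = text.split("_")
    # A string with no '_' splits into a single element; return it as-is
    if len(parts) == 1:
        return parts[0]
    # Otherwise drop the final piece and rejoin the rest with underscores
    return "_".join(parts[:-1])


# Without the length check, parts[:-1] is empty for a string with no '_',
# so the join produces an empty string instead of the original text
no_check = "_".join("Stackoverflow".split("_")[:-1])

print(repr(drop_last_part("Stackoverflow")))
print(repr(no_check))
```

The rsplit('_', 1)[0] form from the other answers avoids this special case, since it returns the whole string when no '_' is present.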

4 answers

Answer 1: score 5, accepted, 2019-11-06 21:35:07
Answer 2: score 1, 2019-11-06 21:30:25
Answer 3: score 1, 2019-11-06 21:32:35
Answer 4: score 1, 2019-11-06 21:35:37
