Replace strings in one Series with the contents of another Series (without using apply)

Question

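For optimization purposes, I would like to know whether there is a faster way to do a string replacement on one column using the contents of the corresponding row of another column, without using apply.

Here is my DataFrame: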

import pandas as pd

data_dict = {'root': [r'c:/windows/'], 'file': [r'c:/windows/system32/calc.exe']}
df = pd.DataFrame.from_dict(data_dict)

"""
Result:
                           file         root
0  c:/windows/system32/calc.exe  c:/windows/
"""

Using the following apply, I get what I want:

df['trunc'] = df.apply(lambda x: x['file'].replace(x['root'], ''), axis=1)

"""
Result:
                           file         root              trunc
0  c:/windows/system32/calc.exe  c:/windows/  system32/calc.exe 
"""

However, to make the code more efficient, I am wondering whether there is a better way. I tried the code below, but it does not work as I expected.

df['trunc'] = df['file'].replace(df['root'], '')

"""
Result (note that the root was NOT replaced with a blank string in the 'trunc' column):

                           file         root                         trunc
0  c:/windows/system32/calc.exe  c:/windows/  c:/windows/system32/calc.exe
"""

Is there a more efficient alternative? Thanks!

Edit - timings for several of the examples below

# Expand out the data set to 1000 entries
data_dict = {'root': [r'c:/windows/']*1000, 'file': [r'c:/windows/system32/calc.exe']*1000}
df0 = pd.DataFrame.from_dict(data_dict)

Using apply

%%timeit -n 100
df0['trunk0'] = df0.apply(lambda x: x['file'].replace(x['root'], ''), axis=1)

100 loops, best of 3: 13.9 ms per loop

Using replace (thanks Gayatri)

%%timeit -n 100
df0['trunk1'] = df0['file'].replace(df0['root'], '', regex=True)

100 loops, best of 3: 365 ms per loop

Using zip (thanks 0p3n5ourcE)

%%timeit -n 100
df0['trunk2'] = [file_val.replace(root_val, '') for file_val, root_val in zip(df0.file, df0.root)]

100 loops, best of 3: 600 µs per loop

Overall, zip looks like the best option here. Thanks for all the input!
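
For anyone who wants to rerun the comparison outside IPython, here is a minimal sketch using the standard `timeit` module. The function names `with_apply` and `with_zip` and the loop count of 10 are my own choices for illustration, and the timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Same 1000-row data set as in the timings above.
data_dict = {'root': ['c:/windows/'] * 1000,
             'file': ['c:/windows/system32/calc.exe'] * 1000}
df0 = pd.DataFrame(data_dict)


def with_apply(df):
    # Row-wise apply: builds a Series object for every row.
    return df.apply(lambda x: x['file'].replace(x['root'], ''), axis=1)


def with_zip(df):
    # Plain Python loop over the two columns in parallel.
    return [f.replace(r, '') for f, r in zip(df['file'], df['root'])]


# Check that both approaches agree before comparing their speed.
same = list(with_apply(df0)) == with_zip(df0)

loops = 10  # arbitrary, kept small so the script finishes quickly
apply_s = timeit.timeit(lambda: with_apply(df0), number=loops)
zip_s = timeit.timeit(lambda: with_zip(df0), number=loops)
print(f'apply: {apply_s / loops * 1e3:.2f} ms per loop')
print(f'zip:   {zip_s / loops * 1e3:.2f} ms per loop')
```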

Answer 1

Try this:

df['file'] = df['file'].astype(str)
df['root'] = df['root'].astype(str)
df['file'].replace(df['root'],'', regex=True)

Output:

0    system32/calc.exe
Name: file, dtype: object
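
A note on why `regex=True` changes the result. Without it, `Series.replace` compares the pattern against the whole cell value, so a root that is only a prefix of the file path never matches. With `regex=True` the pattern is treated as a regular expression, which also means that a root containing metacharacters such as `.` can match more than intended. The sketch below illustrates both points with a scalar pattern and the standard `re` module. The sample paths and the helper name `strip_root_regex` are made up for illustration.

```python
import re

import pandas as pd

files = pd.Series(['c:/windows/system32/calc.exe'])

# Without regex, replace() only matches whole cell values,
# so a pattern that is merely a prefix leaves the value untouched.
unchanged = files.replace('c:/windows/', '')

# With regex=True the pattern may match anywhere inside the string.
stripped = files.replace('c:/windows/', '', regex=True)


def strip_root_regex(file_val, root_val):
    """Hypothetical helper: remove root_val literally by escaping it first."""
    return re.sub(re.escape(root_val), '', file_val)


# An unescaped '.' matches any character, so 'c:/a.b/' also matches 'c:/axb/'.
loose = re.sub('c:/a.b/', '', 'c:/axb/file.txt')
# After escaping, the root no longer matches and the path is left alone.
strict = strip_root_regex('c:/axb/file.txt', 'c:/a.b/')
```

If the roots may contain characters that are special in regular expressions, escaping them (or avoiding regex altogether) is the safer choice.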

Answer 2

Using an approach similar to the one in the link:

df['trunc'] = [file_val.replace(root_val, '') for file_val, root_val in zip(df.file, df.root)]

Output:

                          file         root              trunc
0  c:/windows/system32/calc.exe  c:/windows/  system32/calc.exe

Checking with timeit:

%%timeit
df['trunc'] = df.apply(lambda x: x['file'].replace(x['root'], ''), axis=1)

Result:

1000 loops, best of 3: 469 µs per loop

Using zip:

%%timeit
df['trunc'] = [file_val.replace(root_val, '') for file_val, root_val in zip(df.file, df.root)]

Result:

1000 loops, best of 3: 322 µs per loop
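
One thing to keep in mind: `str.replace` removes every occurrence of the root, not only the leading one. If the intent is to strip the root only when it is a prefix of the path (that is my assumption about the use case, the question does not say so), a variant along these lines may be closer to what is wanted. The helper name `strip_prefix` and the second sample row are hypothetical.

```python
import pandas as pd


def strip_prefix(file_val, root_val):
    """Hypothetical helper: drop root_val only when file_val starts with it."""
    if file_val.startswith(root_val):
        return file_val[len(root_val):]
    return file_val


df = pd.DataFrame({
    'root': ['c:/windows/', 'd:/data/'],
    # The second path contains its root in the middle, not at the start.
    'file': ['c:/windows/system32/calc.exe', 'c:/other/d:/data/x.txt'],
})

# Same zip pattern as above, with the prefix-only helper.
df['trunc'] = [strip_prefix(f, r) for f, r in zip(df['file'], df['root'])]
```

The first row is truncated as before, while the second row is left unchanged because its root is not at the start of the path.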
