Remove subdirectories from String in Dataframe Column in Python

I am trying to use split and rsplit to remove the directory components from the paths in this DataFrame column and keep only the file names.

import pandas as pd
data = ['D:/xyz/abc/123/file_1.txt', 'D:/xyz/abc/file2.txt', 'D:/xyz/file_2.txt']
data = pd.DataFrame(data)
data[0].str.rsplit('/').str[3]

Returns:

Out[1]: 
0          123
1    file2.txt
2          NaN
Name: 0, dtype: object

As you can see, indexing the split result with a fixed position (str[3]) does not return just the txt file names, because the paths have different depths.

Desired output:

Out[1]: 
0    file_1.txt
1     file2.txt
2    file_2.txt
Name: 0, dtype: object

Any insight would be appreciated. Thanks.

Try rsplit with n=1, which splits at most once starting from the right, and pick the last item:

data[0].str.rsplit('/', n=1).str[-1]

Out[194]:
0    file_1.txt
1     file2.txt
2    file_2.txt
Name: 0, dtype: object
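
To see why this works regardless of path depth, here is a minimal sketch using the sample paths from the question. With n=1 the split happens only at the last '/', so every row becomes a two-element list and index -1 is always the file name. The variable names parts and names are only illustrative.

```python
import pandas as pd

data = pd.DataFrame(['D:/xyz/abc/123/file_1.txt',
                     'D:/xyz/abc/file2.txt',
                     'D:/xyz/file_2.txt'])

# Split once, starting from the right: each row becomes [directory, file name]
parts = data[0].str.rsplit('/', n=1)

# The last element is the file name, however deep the path is
names = parts.str[-1]
print(names.tolist())
```

Because the index counts from the end, the same expression also works without n=1; limiting the split just avoids building list elements that are never used.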

You can also use os.path.split to get the last component of the path:

https://docs.python.org/3.3/library/os.path.html?highlight=path#os.path.split

import os

# os.path.split returns a (head, tail) pair; tail is the file name
f = lambda x: os.path.split(x)[1]
data[0] = data[0].map(f)
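
A slightly shorter variant of the same idea, as a sketch: os.path.basename returns the second element of the pair that os.path.split produces, so it can be passed to map directly without a lambda. This assumes forward-slash separators as in the question; backslash-separated paths would not be split by os.path on a non-Windows system. The variable name names is only illustrative.

```python
import os
import pandas as pd

data = pd.DataFrame(['D:/xyz/abc/123/file_1.txt',
                     'D:/xyz/abc/file2.txt',
                     'D:/xyz/file_2.txt'])

# os.path.basename(p) is the tail part of os.path.split(p)
names = data[0].map(os.path.basename)
print(names.tolist())
```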
