This is my pandas dataframe:
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\BlockFactory.java
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Map.java
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Player.java
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\PlayerAlgorithm.java
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\PlayerConstants.java
I need to extract the part of each string after the sixth '\\' delimiter and replace each '\\' in the remainder with '.'.
Example output for the first line:
blokusgame.mi.android.hazi.blokus.GameLogic.Block.java
If I use split it will be complicated!
One solution uses str.extract and str.replace:
import pandas as pd
df = pd.DataFrame({'x':[r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java',
r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\BlockFactory.java',
r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Map.java',
r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Player.java',
r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\PlayerAlgorithm.java',
r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\PlayerConstants.java']})
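# Capture everything after "\java\"; expand=False returns a Series rather than a DataFrame.
df['y'] = df['x'].str.extract(r'^.*\\java\\(.*)$', expand=False)
# Assign the result back instead of calling replace(..., inplace=True) on a column selection.
df['y'] = df['y'].str.replace('\\', '.', regex=False)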
yields
blokusgame.mi.android.hazi.blokus.GameLogic.Block.java
blokusgame.mi.android.hazi.blokus.GameLogic.BlockFactory.java
blokusgame.mi.android.hazi.blokus.GameLogic.Map.java
blokusgame.mi.android.hazi.blokus.GameLogic.Player.java
blokusgame.mi.android.hazi.blokus.GameLogic.PlayerAlgorithm.java
blokusgame.mi.android.hazi.blokus.GameLogic.PlayerConstants.java
Rather than looking for the Nth backslash, you can use a regular expression to find the key 'break' in your text string (in this case, \\java\\) and extract everything after it. Then you can just replace \\ with '.'.
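To see what the pattern captures, here is a minimal sketch using Python's built-in re module on a single path. The variable names are illustrative and not from the answer above, and the sketch assumes the path contains a \java\ segment.

```python
import re

path = r"C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java"

# Same pattern as above: skip everything up to and including "\java\",
# then capture the remainder of the string.
match = re.search(r"^.*\\java\\(.*)$", path)

# re.search returns None when "\java\" is absent, so guard before using it.
tail = match.group(1) if match else None

# Swap the remaining backslashes for dots.
dotted = tail.replace("\\", ".") if tail is not None else None
print(dotted)
```

Because the leading .* is greedy, the capture starts after the last \java\ in the path, which matters only if that segment appears more than once.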
Maybe like this:
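import re
s = r"C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java"
def replace_sixth(s):
    # Lazily iterate over every backslash in the string.
    iterator = re.finditer(r"\\", s)
    # Advance six times and keep the match for the sixth backslash.
    location = [next(iterator) for _ in range(6)][-1]
    # Keep what follows it and turn the remaining backslashes into dots.
    start = location.start() + 1
    return s[start:].replace("\\", ".")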
And then you apply it to the column of your dataframe that holds the paths, e.g. df['x'].apply(replace_sixth) if that column is named 'x'.
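A possible variation without re: Python's str.split takes a maxsplit argument, so the remainder after the sixth backslash can be taken directly. This is only a sketch, not the method from the answer above; the function name after_sixth, the column names, the made-up short path, and the choice to return None for paths with fewer than six backslashes are all assumptions.

```python
import pandas as pd

def after_sixth(path, sep="\\", n=6):
    # Split at most n times; the last piece is everything after the nth separator.
    parts = path.split(sep, n)
    if len(parts) <= n:
        # Fewer than n separators: nothing to extract (assumed behaviour).
        return None
    return parts[-1].replace(sep, ".")

df = pd.DataFrame({"x": [
    r"C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java",
    r"C:\short\path.java",  # hypothetical path with too few separators
]})

# Apply element-wise to the column (a Series), not to the whole DataFrame.
df["y"] = df["x"].apply(after_sixth)
print(df["y"].iloc[0])
```

The explicit length check avoids the StopIteration that the finditer version would raise on a path with fewer than six backslashes.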
There are two things about pandas you need to know to do this.
First, the .str accessor on a dataframe column (or any pandas Series) gives you most of the usual string methods, e.g. df.columnname.str.replace()
or df.columnname.str.capitalize()
etc.
Second, indexing: after a split, each row holds a list, and to take items from it (here, everything from index 6 onward) you use
str[<index_here>]
or
str[<start>:<end>]
If you know these two things you can do it in one short line.
df['fixed_filenames'] = df.files_column.str.split("\\").str[6:].str.join('.')
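The one-liner can be unpacked step by step as in the sketch below. The column names files_column and fixed_filenames follow the line above; the two-row dataframe and the intermediate names parts and tail are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"files_column": [
    r"C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java",
    r"C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Map.java",
]})

# Step 1: split each path on the backslash, giving a list per row.
parts = df["files_column"].str.split("\\")

# Step 2: keep the items from index 6 onward, i.e. everything after the sixth backslash.
tail = parts.str[6:]

# Step 3: glue the remaining items back together with dots.
df["fixed_filenames"] = tail.str.join(".")
print(df["fixed_filenames"].tolist())
```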