How can I edit lines in a pandas DataFrame?

This is my pandas DataFrame:

C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java                 
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\BlockFactory.java          
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Map.java                    
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Player.java                 
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\PlayerAlgorithm.java        
C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\PlayerConstants.java        

I need to extract the part of each string after the sixth backslash ('\') and replace every remaining '\' in it with '.'.

Example output for the first line:
blokusgame.mi.android.hazi.blokus.GameLogic.Block.java      

Doing this with split seems complicated!

One solution using str.extract and str.replace:

import pandas as pd

df = pd.DataFrame({'x':[r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java',
                        r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\BlockFactory.java',
                        r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Map.java',
                        r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Player.java',
                        r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\PlayerAlgorithm.java',
                        r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\PlayerConstants.java']})

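# Capture everything after the literal '\java\' marker; expand=False returns a Series.
df['y'] = df['x'].str.extract(r'^.*\\java\\(.*)$', expand=False)
# Replace the remaining backslashes with dots (plain string match, not a regex).
# Assigning the result back avoids calling replace(inplace=True) on a column selection.
df['y'] = df['y'].str.replace('\\', '.', regex=False)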

yields

blokusgame.mi.android.hazi.blokus.GameLogic.Block.java
blokusgame.mi.android.hazi.blokus.GameLogic.BlockFactory.java
blokusgame.mi.android.hazi.blokus.GameLogic.Map.java
blokusgame.mi.android.hazi.blokus.GameLogic.Player.java
blokusgame.mi.android.hazi.blokus.GameLogic.PlayerAlgorithm.java
blokusgame.mi.android.hazi.blokus.GameLogic.PlayerConstants.java

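Rather than looking for the Nth backslash, you can use a regular expression to find the key 'break' in the string (in this case the pattern \\java\\, which matches the literal text \java\) and extract everything after it. Then you can just replace each \ with a dot. Note that this relies on every path containing \java\.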
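If the marker directory is not always java, a counted pattern can stand in for it. The following is only a sketch using the standard re module on a single string; the names SIXTH_BACKSLASH and after_sixth_backslash are made up for illustration and are not part of the answer above.

```python
import re

# Skip six "anything-then-backslash" groups, then capture the remainder.
SIXTH_BACKSLASH = re.compile(r'^(?:[^\\]*\\){6}(.*)$')

def after_sixth_backslash(path):
    """Return the text after the sixth backslash with dots as separators.

    Returns None when the path contains fewer than six backslashes.
    """
    match = SIXTH_BACKSLASH.match(path)
    if match is None:
        return None
    return match.group(1).replace('\\', '.')

path = r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java'
print(after_sixth_backslash(path))
# blokusgame.mi.android.hazi.blokus.GameLogic.Block.java
```

The same pattern string should also work with df['x'].str.extract(..., expand=False) in place of the \\java\\ pattern, assuming the paths live in a column named x as above.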

Maybe like this:

import re

s = r"C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java"

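def replace_sixth(s):
    # Iterate over every backslash in the string.
    iterator = re.finditer(r"\\", s)
    # Advance to the sixth match (raises StopIteration if there are fewer than six).
    location = [next(iterator) for _ in range(6)][-1]
    # Keep everything after the sixth backslash and turn the remaining backslashes into dots.
    start = location.start() + 1
    return s[start:].replace("\\", ".")

Then apply it to the column that holds the paths rather than to the whole DataFrame (df.apply would pass each column to the function as a Series). For example, with a column named x: df['x'].apply(replace_sixth)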
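The function above stops with a StopIteration error when a path has fewer than six backslashes. A possible variant, sketched here with str.split and its maxsplit argument, returns None for such paths instead. The name replace_after_nth and its parameters are hypothetical, chosen only for this example.

```python
def replace_after_nth(path, n=6, sep='\\', new='.'):
    """Return the text after the n-th `sep`, with later `sep`s replaced by `new`.

    Returns None when `path` contains fewer than n separators.
    """
    # Split at most n times: the last piece is everything after the n-th separator.
    pieces = path.split(sep, n)
    if len(pieces) <= n:
        return None
    return pieces[-1].replace(sep, new)

s = r"C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java"
print(replace_after_nth(s))
# blokusgame.mi.android.hazi.blokus.GameLogic.Block.java
```

Because it takes one string and returns one string, it can be mapped over a single column in the same way, e.g. df['x'].map(replace_after_nth) if the paths are in a column named x.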

There are two things about pandas you need to know to do this.

First, the .str accessor on a DataFrame column (a pandas Series) gives you most of the usual string operations, e.g. df.columnname.str.replace() or df.columnname.str.capitalize().

Second, indexing: after a split, each row holds a list. To pick one item from it, or a slice such as everything from index 6 onward, use

str[<index_here>] 

or

str[<start>:<end>]

If you know these two things you can do it in one short line.

df['fixed_filenames'] = df.files_column.str.split("\\").str[6:].str.join('.')
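To see why index 6 is the right starting point, here is the same split, slice, and join carried out on one path in plain Python. This is only an illustration of what the .str chain does for each row; the variable names are arbitrary.

```python
path = r'C:\BlokusDuo-master\app\src\main\java\blokusgame\mi\android\hazi\blokus\GameLogic\Block.java'

# Splitting on a single backslash gives one list element per path component.
parts = path.split('\\')
print(parts[:7])
# ['C:', 'BlokusDuo-master', 'app', 'src', 'main', 'java', 'blokusgame']

# Indexes 0 to 5 are the six components to drop, so the slice starts at 6.
tail = parts[6:]

# Join what is left with dots.
result = '.'.join(tail)
print(result)
# blokusgame.mi.android.hazi.blokus.GameLogic.Block.java
```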
