
Rearrange row values in a CSV using pandas (Python)

I am working with pandas and have a CSV file that looks like this:

  ID                Name        Store      Price           
Melbourne           
    1               aaaa        bbbb        570
    2               cccc        dddd        236
    3               eeee        ffff        230
Sydney
    1               hhhh        gggg        2300
    2               kkkk        llll        266

I want it in this shape:

City            ID               Name        Store       Price      
Melbourne        1               aaaa        bbbb        570
Melbourne        2               cccc        dddd        236
Melbourne        3               eeee        ffff        230
Sydney           1               hhhh        gggg        2300
Sydney           2               kkkk        llll        266

What I am thinking is
1. adding a new column

  ID        New               Name        Store      Price           
Melbourne   NaN  
    1       NaN               aaaa        bbbb        570
    2       NaN               cccc        dddd        236
    3       NaN               eeee        ffff        230
Sydney 
    1       NaN               hhhh        gggg        2300
    2       NaN               kkkk        llll        266
  2. then change the index to ID, so it would look like this

      ID          New   Name   Store   Price
      Melbourne   NaN
      1           NaN   aaaa   bbbb    570
      2           NaN   cccc   dddd    236
      3           NaN   eeee   ffff    230
      Sydney      NaN
      1           NaN   hhhh   gggg    2300
      2           NaN   kkkk   llll    266
  3. and then fill the new column with the city names, something like this

      ID          New         Name   Store   Price
      Melbourne   NaN
      1           Melbourne   aaaa   bbbb    570
      2           Melbourne   cccc   dddd    236
      3           Melbourne   eeee   ffff    230
      Sydney      NaN
      1           Sydney      hhhh   gggg    2300
      2           Sydney      kkkk   llll    266
  4. finally, change the column name and delete the rows without values

     City        ID   Name   Store   Price
     Melbourne   1    aaaa   bbbb    570
     Melbourne   2    cccc   dddd    236
     Melbourne   3    eeee   ffff    230
     Sydney      1    hhhh   gggg    2300
     Sydney      2    kkkk   llll    266

I am not sure whether this can be implemented or not. Please give me some idea of how I can implement it.

There are several ways of doing this and below are some ideas on how to implement your proposed method.

Step 1:

Check whether a string contains only alphabetic characters by using the str.isalpha() method:

df["column"].apply(lambda x: x if x.isalpha() else None)

The above returns a Series in which the numeric values from your ID column become None, while the city names are kept. You can store this in a new column.

This solution assumes that every x is a string.
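
For concreteness, here is a minimal sketch of that step on a made-up DataFrame mimicking the layout in your question (the frame and the "City" column name are my own assumptions, not your actual data):

import pandas as pd

# Made-up frame matching the question's layout: the ID column mixes
# city names ("Melbourne") and row numbers read in as strings.
df = pd.DataFrame({
    "ID":    ["Melbourne", "1", "2", "3", "Sydney", "1", "2"],
    "Name":  [None, "aaaa", "cccc", "eeee", None, "hhhh", "kkkk"],
    "Store": [None, "bbbb", "dddd", "ffff", None, "gggg", "llll"],
    "Price": [None, 570, 236, 230, None, 2300, 266],
})

# Keep the value only when it is purely alphabetic (a city name), else None
df["City"] = df["ID"].apply(lambda x: x if x.isalpha() else None)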


If you don't know for sure that all the values in your column are strings, you could instead create a list of cities:

my_cities = ['Melbourne', 'Sydney']

Then check whether the value in your column is a city and store the result in a boolean Series:

is_city = df['column'].isin(my_cities)

Apply the Series as a mask and replace the non-city values with None:

df.loc[~is_city, 'column'] = None

(Note that ~ means "not", so this selects the rows where the value is not a city.)
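
As a rough sketch of this variant, using the same made-up df as above but writing the result into a new City column instead of overwriting the original (again, the column names are my assumptions):

my_cities = ['Melbourne', 'Sydney']

is_city = df['ID'].isin(my_cities)   # boolean mask: True on the city rows

df['City'] = df['ID']                # copy the column ...
df.loc[~is_city, 'City'] = None      # ... and blank out everything that is not a city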

Step 2:

Fill the new column using the ffill method (note the assignment; ffill returns a new Series rather than filling in place):

df["new_column"] = df["new_column"].ffill()

ffill basically does step 3 in your question; see the pandas documentation for details.
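
Applied to the made-up df from the earlier sketches:

df['City'] = df['City'].ffill()
# df['City'] is now: Melbourne, Melbourne, Melbourne, Melbourne, Sydney, Sydney, Sydney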

Step 3:

Finally, drop all rows containing at least one None:

df.dropna()
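
Putting the pieces together, an end-to-end sketch might look like this; the file name stores.csv is a placeholder, and the final column order just mirrors the table in your question:

import pandas as pd

df = pd.read_csv('stores.csv')   # ID reads in as text because it mixes names and numbers

# 1. new column holding the city name on the city rows, None elsewhere
df['City'] = df['ID'].apply(lambda x: x if str(x).isalpha() else None)

# 2. forward-fill so every data row carries its city
df['City'] = df['City'].ffill()

# 3. drop the city-only rows (they have no Name/Store/Price) and reorder
df = df.dropna()
df = df[['City', 'ID', 'Name', 'Store', 'Price']]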
