I am working with pandas and have a CSV file that looks like this:
ID Name Store Price
Melbourne
1 aaaa bbbb 570
2 cccc dddd 236
3 eeee ffff 230
Sydney
1 hhhh gggg 2300
2 kkkk llll 266
I want it in this shape:
City ID Name Store Price
Melbourne 1 aaaa bbbb 570
Melbourne 2 cccc dddd 236
Melbourne 3 eeee ffff 230
Sydney 1 hhhh gggg 2300
Sydney 2 kkkk llll 266
What I am thinking is:
1. Add a new column:
ID New Name Store Price
Melbourne NaN
1 NaN aaaa bbbb 570
2 NaN cccc dddd 236
3 NaN eeee ffff 230
Sydney
1 NaN hhhh gggg 2300
2 NaN kkkk llll 266
2. Change the index to ID, so it would look like this:
ID New Name Store Price
Melbourne NaN
1 NaN aaaa bbbb 570
2 NaN cccc dddd 236
3 NaN eeee ffff 230
Sydney NaN
1 NaN hhhh gggg 2300
2 NaN kkkk llll 266
3. Then fill in the city names, something like this:
ID New Name Store Price
Melbourne NaN
Melbourne 1 aaaa bbbb 570
Melbourne 2 cccc dddd 236
Melbourne 3 eeee ffff 230
Sydney NaN
Sydney 1 hhhh gggg 2300
Sydney 2 kkkk llll 266
4. Finally, rename the column and delete the rows without values:
City ID Name Store Price
Melbourne 1 aaaa bbbb 570
Melbourne 2 cccc dddd 236
Melbourne 3 eeee ffff 230
Sydney 1 hhhh gggg 2300
Sydney 2 kkkk llll 266
I am not sure whether this can be implemented. Could you give me some ideas on how to implement it?
There are several ways of doing this; below are some ideas on how to implement your proposed method.
Step 1:
Check whether a string contains only alphabetic characters using the str.isalpha() method:
df["column"].apply(lambda x: x if x.isalpha() else None)
The above returns a Series in which the numeric values of your ID column are replaced by None. You can store this in a new column. This solution assumes that every x is a string.
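As a minimal sketch, here is Step 1 run on a small DataFrame rebuilt by hand from the question's sample. It assumes the file was loaded so that the city names and the IDs share the ID column as strings, and it uses new_column as a placeholder name for the helper column.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the loaded file (assumption: ID is read as strings,
# and the city header rows have no Name, Store or Price).
df = pd.DataFrame({
    "ID": ["Melbourne", "1", "2", "3", "Sydney", "1", "2"],
    "Name": [None, "aaaa", "cccc", "eeee", None, "hhhh", "kkkk"],
    "Store": [None, "bbbb", "dddd", "ffff", None, "gggg", "llll"],
    "Price": [np.nan, 570, 236, 230, np.nan, 2300, 266],
})

# Keep purely alphabetic values (the city names); numeric IDs become None.
df["new_column"] = df["ID"].apply(lambda x: x if x.isalpha() else None)

# Melbourne and Sydney are kept; every other row of new_column is missing.
print(df["new_column"].tolist())
```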
If you don't know for sure that all the numbers in your column are strings, you could create a list of cities instead:
my_cities = ['Melbourne', 'Sydney']
Then check whether each value in your column is a city and store the result in a boolean Series:
is_city = df['column'].isin(my_cities)
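Copy the column into a new column, then apply the Series as a mask to replace the non-city values with None (writing to a copy keeps the original IDs intact):
df['new_column'] = df['column']
df.loc[~is_city, 'new_column'] = None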
(Note that ~ means "not", so this selects the rows where the value is not a city.)
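A minimal sketch of the isin() variant, assuming an ID column that mixes str and int values (1 rather than "1"), which is the case where x.isalpha() would raise AttributeError. The result is written to a separate placeholder column, new_column, so the original IDs survive; the Store column is omitted for brevity.

```python
import numpy as np
import pandas as pd

# Mixed types: the city names are str, the IDs are int.
df = pd.DataFrame({
    "ID": ["Melbourne", 1, 2, 3, "Sydney", 1, 2],
    "Name": [None, "aaaa", "cccc", "eeee", None, "hhhh", "kkkk"],
    "Price": [np.nan, 570, 236, 230, np.nan, 2300, 266],
})

my_cities = ["Melbourne", "Sydney"]

# True where the ID value is one of the known cities.
is_city = df["ID"].isin(my_cities)

# Copy ID into the helper column, then blank out everything that is not a city.
df["new_column"] = df["ID"]
df.loc[~is_city, "new_column"] = None

print(df[["ID", "new_column"]])
```

Because isin() only compares values, it does not care whether the non-city entries are strings or integers.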
Step 2:
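Fill the new column forward using the ffill() method (equivalent to the older fillna(method="ffill")), and assign the result back because it returns a new Series:
df["new_column"] = df["new_column"].ffill()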
ffill propagates the last valid value downward, which is basically step 3 in your question. See the pandas documentation for ffill for details.
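To make the forward fill concrete, here is a minimal sketch on just the helper column as it looks after Step 1 (a city name on each header row, None elsewhere). ffill() is the method form of fillna(method="ffill").

```python
import pandas as pd

# The helper column after Step 1: a city on each header row, None elsewhere.
new_column = pd.Series(["Melbourne", None, None, None, "Sydney", None, None])

# ffill() copies the last non-missing value down into the gaps below it.
filled = new_column.ffill()
print(filled.tolist())
# ['Melbourne', 'Melbourne', 'Melbourne', 'Melbourne', 'Sydney', 'Sydney', 'Sydney']
```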
Step 3:
Finally, drop all rows that contain at least one missing value (None or NaN), assigning the result back because dropna() returns a new DataFrame:
df = df.dropna()
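Putting the three steps together with the rename from the question, an end-to-end sketch could look like this. It assumes whitespace-separated fields exactly as displayed in the question, read from an in-memory string; for a real comma-separated file you would pass its path to pd.read_csv instead. Converting ID and Price back to integers at the end is an extra tidy-up, not part of the steps above.

```python
import io

import pandas as pd

# Stand-in for the question's file (assumption: whitespace-separated fields).
raw = """ID Name Store Price
Melbourne
1 aaaa bbbb 570
2 cccc dddd 236
3 eeee ffff 230
Sydney
1 hhhh gggg 2300
2 kkkk llll 266
"""

# Rows holding only a city name get NaN for Name, Store and Price, and the
# ID column is read as strings because it mixes names and numbers.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Step 1: keep the city names, set the numeric IDs to None.
df["new_column"] = df["ID"].apply(lambda x: x if x.isalpha() else None)

# Step 2: forward-fill each city name down to the rows below it.
df["new_column"] = df["new_column"].ffill()

# Step 3: drop the city header rows, which are the rows with missing values.
df = df.dropna()

# Rename the helper column, move it to the front, and tidy types and index.
df = df.rename(columns={"new_column": "City"})
df = df[["City", "ID", "Name", "Store", "Price"]].reset_index(drop=True)
df["ID"] = df["ID"].astype(int)
df["Price"] = df["Price"].astype(int)
print(df)
```

dropna() with no arguments removes a row if any column is missing, so if genuine data rows can contain blank cells, restricting the check with something like df.dropna(subset=["Name"]) is safer.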