
Splitting a single text column into multiple columns in pandas

I am extracting raw data from various sources. After some processing, I ended up with a dataframe that looks like this:

                                                              data
0               ₹ 16,50,000\n2014 - 49,000 km\nJaguar XF 2.2\nJAN 16
1               ₹ 23,60,000\n2017 - 28,000 km\nMercedes-Benz CLA 200 CDI Style, 2017, Diesel\nNOV 26 
2               ₹ 26,00,000\n2016 - 44,000 km\nMercedes Benz C-Class Progressive C 220d, 2016, Diesel\nJAN 03

I want to split this raw dataframe into relevant columns, in the order in which the fields occur in the raw data: Price, Year, Mileage, Name, Date.

I have tried df.data.split('-', expand=True), along with other delimiters applied sequentially and some lambda functions, but without much success.

I need help splitting this data into the relevant columns.

Expected output:

    price       year    mileage           name           date
    16,50,000   2014    49000   Jaguar 2.2 XF Luxury    Jan-17
    23,60,000   2017    28000   CLA CDI Style           Nov-26    
    26,00,000   2016    44000   Mercedes C-Class C220d  Jan-03

Try splitting on '\n' first, then splitting the year-mileage part on '-':

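# Each newline-separated line of the raw text becomes its own column.
df[["Price", "Year-Mileage", "Name", "Date"]] = df["data"].str.split('\n', expand=True)
# "2014 - 49,000 km" splits into the year and the mileage.
df[["Year", "Mileage"]] = df["Year-Mileage"].str.split('-', expand=True)
# Drop the raw column and the intermediate one.
df.drop(columns=["data", "Year-Mileage"], inplace=True)
print(df)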

   Price        Name                                                    Date    Year  Mileage
0  ₹ 16,50,000  Jaguar XF 2.2                                           JAN 16  2014  49,000 km
1  ₹ 23,60,000  Mercedes-Benz CLA 200 CDI Style, 2017, Diesel           NOV 26  2017  28,000 km
2  ₹ 26,00,000  Mercedes Benz C-Class Progressive C 220d, 2016, Diesel  JAN 03  2016  44,000 km
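
The result above still carries the "₹" prefix, the " km" suffix, and the original date format. To get closer to the expected output, the same split can be followed by a few string clean-ups. The following is only a sketch under some assumptions: the lowercase column names and the intermediate names parts, year_mileage, and clean are illustrative, every row is assumed to have exactly four lines, and the name column is left as it appears in the raw text because the shortened names in the expected output do not follow a rule that can be derived from the data.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"data": [
    "₹ 16,50,000\n2014 - 49,000 km\nJaguar XF 2.2\nJAN 16",
    "₹ 23,60,000\n2017 - 28,000 km\nMercedes-Benz CLA 200 CDI Style, 2017, Diesel\nNOV 26 ",
    "₹ 26,00,000\n2016 - 44,000 km\nMercedes Benz C-Class Progressive C 220d, 2016, Diesel\nJAN 03",
]})

# One line of raw text per column (assumes exactly four lines per row).
parts = df["data"].str.split("\n", expand=True)
parts.columns = ["price", "year_mileage", "name", "date"]

# Split "2014 - 49,000 km" once, on the " - " separator.
year_mileage = parts["year_mileage"].str.split(" - ", n=1, expand=True)

clean = pd.DataFrame({
    # Drop the currency symbol, keep the Indian digit grouping as text.
    "price": parts["price"].str.replace("₹", "", regex=False).str.strip(),
    "year": year_mileage[0].str.strip().astype(int),
    # Keep digits only, so "49,000 km" becomes 49000.
    "mileage": year_mileage[1].str.replace(r"[^\d]", "", regex=True).astype(int),
    "name": parts["name"].str.strip(),
    # "JAN 16" becomes "Jan-16".
    "date": parts["date"].str.strip().str.title().str.replace(" ", "-", regex=False),
})
print(clean)
```

Because the year-mileage column is split separately from the name column, hyphens inside names such as "Mercedes-Benz" or "C-Class" are not affected. The dates here follow the raw data, so the first row comes out as "Jan-16".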
