I have a dataframe, df, with one column.
import pandas as pd

data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}
df = pd.DataFrame(data, columns=['details'])
df
I want to split the lists in that column into separate columns and get a dataframe that looks like this:
data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']],
'brand': ['honda', 'toyota', 'honda', 'toyota'],
'car': ['city','innova','','corolla'],
'colour': ['black','','red','white'],
'type': ['','','','sedan']
}
df2 = pd.DataFrame(data,columns= ['details', 'brand', 'car', 'colour', 'type'])
df2
I tried the following, but it did not work:
a2 = []
b2 = []
c2 = []
d2 = []
for i in df['details']:
    for j in range(len(i)):
        if 'brand :' in i[j]:
            print('lalala')
            a1 = i[j]
            a2.append(a1)
        else:
            a1 = ''
            a2.append(a1)
        if 'car :' in i[j]:
            print('lalala')
            b1 = i[j]
            b2.append(b1)
        else:
            b1 = ''
            b2.append(b1)
        if 'colour :' in i[j]:
            c1 = i[j]
            c2.append(c1)
        else:
            c1 = ''
            c2.append(c1)
        if 'type :' in i[j]:
            d1 = i[j]
            d2.append(d1)
        else:
            d1 = ''
            d2.append(d1)
df['brand'] = a2
df['car'] = b2
df['colour'] = c2
df['type'] = d2
Please help, as I have hit a major roadblock.
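The loop above appends to a2, b2, c2 and d2 once per list element (11 times in total for this data) rather than once per row, so each list ends up longer than the 4-row dataframe and pandas rejects the column assignment with a length-mismatch ValueError. It also stores the whole 'brand : honda' string instead of just 'honda'. A minimal sketch of the same loop idea, assuming the four keys are known in advance, parses each row into a dict first and then appends exactly one value per column:

```python
import pandas as pd

data = {'details': [['brand : honda', 'car : city', 'colour : black'],
                    ['brand : toyota', 'car : innova'],
                    ['brand : honda', 'colour : red'],
                    ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}
df = pd.DataFrame(data, columns=['details'])

keys = ['brand', 'car', 'colour', 'type']  # assumed to be known up front
# One list per output column; each list gets exactly one entry per row.
columns = {key: [] for key in keys}

for items in df['details']:
    # Parse this row's 'key : value' strings into a dict first ...
    parsed = {}
    for item in items:
        key, value = item.split(':', 1)
        parsed[key.strip()] = value.strip()
    # ... then append one value per column ('' when the key is absent),
    # so every list ends up with len(df) entries.
    for key in keys:
        columns[key].append(parsed.get(key, ''))

for key in keys:
    df[key] = columns[key]

print(df)
```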
You might try the following, assuming the detail types are known in advance:
details_types = ['brand', 'car', 'colour', 'type']
for x in details_types:
    df[x] = None
for idx, value in df.iterrows():
    for col_details in df.iloc[idx, 0]:
        feature = col_details.replace(' ', '').split(':')[0]
        value = col_details.replace(' ', '').split(':')[1]
        df.iloc[idx, list(df.columns).index(feature)] = value
Output
| | details | brand | car | colour | type |
|---|---------------------------------------------------|--------|---------|--------|-------|
| 0 | [brand : honda, car : city, colour : black] | honda | city | black | None |
| 1 | [brand : toyota, car : innova] | toyota | innova | None | None |
| 2 | [brand : honda, colour : red] | honda | None | red | None |
| 3 | [brand : toyota, car : corolla, colour : white... | toyota | corolla | white | sedan |
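The loop above relies on 'details' being the first column and on a default 0..n-1 index (because of iloc), and replace(' ', '') would also remove spaces inside a value. A slightly more robust variant of the same iterrows idea, offered as a sketch rather than the answer's exact method, splits on the first ':' only, strips the surrounding spaces, and sets cells by label with df.at:

```python
import pandas as pd

df = pd.DataFrame({'details': [['brand : honda', 'car : city', 'colour : black'],
                               ['brand : toyota', 'car : innova'],
                               ['brand : honda', 'colour : red'],
                               ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]})

details_types = ['brand', 'car', 'colour', 'type']
for x in details_types:
    df[x] = None

for idx, row in df.iterrows():
    for col_details in row['details']:
        # Split on the first ':' only and strip the surrounding spaces,
        # so spaces inside a value are preserved.
        feature, value = (part.strip() for part in col_details.split(':', 1))
        # .at addresses the cell by index label and column name,
        # independent of column position or index type.
        df.at[idx, feature] = value

print(df)
```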
A slightly simpler approach might be the following:
import pandas as pd

data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}
#takes a list of 'key : value' strings and returns a dict, stripping the spaces around ':'
def fix(l):
    return dict((k.strip(), v.strip()) for k, v in (s.split(':', 1) for s in l))
#flatten and fix the lists of lists to get a list of dicts
dicts = [fix(i) for sublist in data.values() for i in sublist]
#Add the dicts into a single dataframe (optionally add the 'details' column)
df = pd.DataFrame(dicts)
df['details'] = data['details'] #adding 'details' col
print(df)
brand car colour type \
0 honda city black NaN
1 toyota innova NaN NaN
2 honda NaN red NaN
3 toyota corolla white sedan
details
0 [brand : honda, car : city, colour : black]
1 [brand : toyota, car : innova]
2 [brand : honda, colour : red]
3 [brand : toyota, car : corolla, colour : white...
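The approach above leaves NaN for missing keys, whereas the expected df2 in the question uses empty strings and has 'details' as the first column. A minimal sketch that matches df2 more closely, assuming the original df is still available (parse is a hypothetical helper name), builds one dict per row, keeps df's index so join lines the rows up by label, and fills the gaps with '':

```python
import pandas as pd

data = {'details': [['brand : honda', 'car : city', 'colour : black'],
                    ['brand : toyota', 'car : innova'],
                    ['brand : honda', 'colour : red'],
                    ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}
df = pd.DataFrame(data, columns=['details'])

def parse(items):
    """Hypothetical helper: ['brand : honda', ...] -> {'brand': 'honda', ...}."""
    pairs = (s.split(':', 1) for s in items)
    return {k.strip(): v.strip() for k, v in pairs}

# One dict per row; reuse df's index so join aligns rows by label.
parsed = pd.DataFrame([parse(items) for items in df['details']], index=df.index)

# Keys missing from a row come out as NaN; fill them with '' as in df2.
df2 = df.join(parsed.fillna(''))
print(df2)
```

Columns built from a list of dicts appear in order of first appearance, which here gives brand, car, colour, type.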
Another option uses collections.ChainMap:
import pandas as pd
from collections import ChainMap
data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}
#STEP_1
lists=[[{y.split(':')[0]:y.split(':')[1]} for y in x] for x in data['details']]
#STEP_2
data_df = [dict(ChainMap(*x)) for x in lists]
#STEP_3
data_df=pd.DataFrame(data_df)
#STEP_4
data_df['details']=data['details']
print(data_df)
'''Explanation:
STEP_1: It creates a list of lists whose elements are single-key dictionaries
[[{'brand ': ' honda'}, {'car ': ' city'}, {'colour ': ' black'}],
[{'brand ': ' toyota'}, {'car ': ' innova'}],
[{'brand ': ' honda'}, {'colour ': ' red'}],
[{'brand ': ' toyota'},
{'car ': ' corolla'},
{'colour ': ' white'},
{'type ': ' sedan'}]]
STEP_2: It converts the list of lists into a list of dictionaries by merging each inner list with ChainMap
[{'colour ': ' black', 'car ': ' city', 'brand ': ' honda'},
{'car ': ' innova', 'brand ': ' toyota'},
{'colour ': ' red', 'brand ': ' honda'},
{'type ': ' sedan',
'colour ': ' white',
'car ': ' corolla',
'brand ': ' toyota'}]
STEP_3: As a dataframe can be created directly from a list of
dictionaries, this creates a dataframe with 4 columns: brand, car,
colour and type (the labels keep the trailing space left by the
split, e.g. 'brand ')
STEP_4: Add the column 'details' using the 'data' variable'''
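To illustrate what STEP_1 and STEP_2 do for a single row, here is a small sketch. It strips the spaces around ':' (an addition; the answer's code keeps them) and compares ChainMap with a plain dict comprehension, which gives the same result when every key is unique. The one behavioural difference shows up only with repeated keys: a ChainMap lookup returns the value from the first mapping that has the key, while a dict comprehension keeps the last one.

```python
from collections import ChainMap

row = ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']

# STEP_1 for one row: one single-key dict per 'key : value' string
# (stripping is added here; the answer keeps the surrounding spaces).
singles = [{k.strip(): v.strip()} for k, v in (s.split(':', 1) for s in row)]

# STEP_2: ChainMap views the dicts as one mapping; dict() copies it out.
merged = dict(ChainMap(*singles))

# Equivalent without ChainMap: a single dict comprehension.
direct = {k.strip(): v.strip() for k, v in (s.split(':', 1) for s in row)}
print(merged == direct)  # True

# With a repeated key, ChainMap keeps the FIRST value, the comprehension the LAST.
dup = [{'colour': 'white'}, {'colour': 'red'}]
first_wins = ChainMap(*dup)['colour']
last_wins = {k: v for d in dup for k, v in d.items()}['colour']
print(first_wins, last_wins)  # white red
```

In current CPython, iterating a ChainMap yields keys starting from the last mapping, which is consistent with the reversed key order shown in the STEP_2 listing.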
Use:
- explode + extract to extract the patterns
- groupby + first + fillna to convert to the expected format

Code
# extract the patterns
pattern = r"(?:brand : (?P<brand>\w+))|(?:car : (?P<car>\w+))|(?:colour : (?P<colour>\w+))|(?:type : (?P<type>\w+))"
expanded = df.explode("details")["details"].str.extract(pattern)
# convert to expected format after extracting the patterns
new = expanded.groupby(level=0).first().fillna("")
print(new)
Output
brand car colour type
0 honda city black
1 toyota innova
2 honda red
3 toyota corolla white sedan
Afterwards, you can concatenate everything together with:
result = pd.concat([df, new], axis=1)
print(result)
Output (full)
details brand ... colour type
0 [brand : honda, car : city, colour : black] honda ... black
1 [brand : toyota, car : innova] toyota ...
2 [brand : honda, colour : red] honda ... red
3 [brand : toyota, car : corolla, colour : white... toyota ... white sedan
[4 rows x 5 columns]
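The groupby(level=0) step works because explode repeats the original row label for every list element, so all the fragments extracted from one original row share an index label; first() then takes the first non-null value per column within each label. A small sketch on a hypothetical two-row frame (smaller than the question's, to keep it short) makes the intermediate state visible:

```python
import pandas as pd

# Hypothetical two-row frame, smaller than the question's data.
df = pd.DataFrame({'details': [['brand : honda', 'car : city'], ['brand : toyota']]})

exploded = df.explode('details')
# explode repeats the original row label for every list element.
print(exploded.index.tolist())  # [0, 0, 1]

pattern = r"(?:brand : (?P<brand>\w+))|(?:car : (?P<car>\w+))"
expanded = exploded['details'].str.extract(pattern)
# Each exploded row matches one alternative; the other group is NaN:
# label 0 -> (honda, NaN), label 0 -> (NaN, city), label 1 -> (toyota, NaN)

# groupby(level=0) gathers rows sharing an original label, and first()
# keeps the first non-null value per column, merging them into one row.
new = expanded.groupby(level=0).first().fillna('')
print(new)
```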