Split values iterating over an unspecific number of columns in a pandas data frame
Split values in Data frame columns
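I have a data frame named df, and I want to get rid of the "|" separators in the fuel column by splitting each value into its own row.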
id car fuel
1 Mercedes petrol|diesel|gas
2 Audi gas|petrol
so that my data looks like this:
id car fuel
1 Mercedes petrol
1 Mercedes diesel
1 Mercedes gas
2 Audi gas
2 Audi petrol
This is the code I tried:
df_1 = hb.copy()
df_2 = hb.copy()
df_3 = hb.copy()
df_1['fuel'] = df_1['fuel'].apply(lambda x:x.split('|')[0])
df_2['fuel'] = df_2['fuel'].apply(lambda x:x.split('|')[1])
df_3['fuel'] = df_3['fuel'].apply(lambda x:x.split('|')[2])
This gives IndexError: list index out of range
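The error most likely comes from the second row: "gas|petrol" splits into only two pieces, so asking for index [2] fails. A minimal sketch that reproduces this, assuming the question's data is held in a data frame called hb as in the attempt above (piece_counts and error_message are illustrative names):

```python
import pandas as pd

# Sample data from the question, named hb as in the attempted code
hb = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"],
})

# How many pieces each 'fuel' value splits into
piece_counts = hb["fuel"].str.split("|").str.len().tolist()
print(piece_counts)

# Asking every row for a third piece fails on the row that has only two
try:
    hb["fuel"].apply(lambda x: x.split("|")[2])
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```

Any fixed-index approach breaks as soon as the rows hold different numbers of values, which is why the answers split into lists and reshape instead.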
Try this:
import pandas as pd

df = pd.DataFrame({'car': ['Mercedes', 'Audi'], 'fuel': ['petrol|diesel|gas', 'gas|petrol']})  # your dataframe
rows = []  # one dict per (car, fuel) combination
for i in range(len(df)):  # iterate over the rows of df
    fuels = df.iloc[i, 1].split('|')  # split each 'fuel' value and store the pieces in a list
    for fuel in fuels:  # iterate over the pieces
        rows.append({'car': df.iloc[i, 0], 'fuel': fuel})  # pair the car with each of its fuels
df2 = pd.DataFrame(rows)  # build the new dataframe in one step (DataFrame.append was removed in pandas 2.0)
Here is one approach.
For example:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"]
})
# Turn each 'fuel' string into a list of values
df["fuel"] = df["fuel"].str.split("|")
# Ref https://stackoverflow.com/a/48532692/532312
lst_col = 'fuel'
# Repeat every other column once per list element, then flatten the lists
df = pd.DataFrame({
    col: np.repeat(df[col].values, df[lst_col].str.len())
    for col in df.columns.drop(lst_col)}
).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]
print(df)
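Output (older pandas versions sort the columns alphabetically, as shown here; recent versions keep the order id, car, fuel):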
car fuel id
0 Mercedes petrol 1
1 Mercedes diesel 1
2 Mercedes gas 1
3 Audi gas 2
4 Audi petrol 2
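On pandas 0.25 or later, the same reshaping can probably be done with the built-in DataFrame.explode instead of np.repeat. This is a swapped-in alternative rather than part of the referenced answer; a minimal sketch, assuming the same sample data (the name exploded is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"],
})

exploded = (
    df.assign(fuel=df["fuel"].str.split("|"))  # each 'fuel' cell becomes a list
      .explode("fuel")                         # one row per list element
      .reset_index(drop=True)                  # replace the repeated index with 0..n-1
)
print(exploded)
```

The other columns (id and car) are repeated automatically for every exploded value, so no manual bookkeeping is needed.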
You can try something like the following:
import pandas as pd

# Create the dataframe
df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"]
})
# Build a long Series from the split values, with car as the outer index level
new_df = pd.DataFrame(df.fuel.str.split('|').tolist(), index=df.car).stack()
# Get rid of the secondary index level and turn car back into a column
new_df = new_df.reset_index(level=1, drop=True).reset_index()
# Add the 'id' back to the dataframe
# Note: There is probably a much more elegant way of doing this
new_df['id'] = new_df.car.apply(lambda x: df[df.loc[:, 'car'] == x].id.values[0])
# Rename the columns
new_df.columns = ["car", "fuel", "id"]
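One possibly tidier way to carry id along, as the note in the code hints, is to put both id and car into the index before splitting, so no lookup by car name is needed afterwards (that lookup would also pick the wrong id if two rows shared a car name). A sketch under that assumption, using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"],
})

new_df = (
    df.set_index(["id", "car"])["fuel"]    # keep id and car as index levels
      .str.split("|", expand=True)         # one column per piece, padded with missing values
      .stack()                             # long format: one row per piece
      .dropna()                            # drop any padding that stack() did not remove
      .reset_index(level=-1, drop=True)    # discard the piece-position level
      .rename("fuel")
      .reset_index()                       # id and car become columns again
)
print(new_df)
```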