How to split dataframe by specific string in rows
I have a dataframe like this:
df = pd.DataFrame({"a":["x1", 12, 14, "x2", 32, 9]})
df
Out[10]:
    a
0  x1
1  12
2  14
3  x2
4  32
5   9
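I want to split it into multiple dataframes (two in this case) wherever a row starts with "x". That row should then become the column name. Maybe split these dataframes and put them in a dictionary?
The output should look like this: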
x1
Out[12]:
   x1
0  12
1  14
x2
Out[13]:
   x2
0  32
1   9
Can anyone help me?
You can try groupby on the cumsum of str.startswith:
# na=False marks non-string rows as False directly, so no fillna step is needed
for k, d in df.groupby(df['a'].str.startswith('x', na=False).cumsum()):
    # manipulate data to get desired output
    sub_df = pd.DataFrame(d.iloc[1:].to_numpy(), columns=d.iloc[0].to_numpy())
    # do something with it
    print(sub_df)
    print('-'*10)
Output:
   x1
0  12
1  14
----------
   x2
0  32
1   9
----------
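To see why this grouping works, it may help to look at the intermediate values. The following is only a sketch that reuses the df from the question; the names is_header and group_id are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x1", 12, 14, "x2", 32, 9]})

# True only where the value is a string starting with "x";
# na=False makes the non-string rows (the integers) come out as False
is_header = df["a"].str.startswith("x", na=False)

# Every True increases the running total by one, so each row ends up
# with the same id as the closest header row above it
group_id = is_header.cumsum()

print(pd.DataFrame({"a": df["a"], "is_header": is_header, "group_id": group_id}))
```

Grouping by group_id then yields one block per header row, with the header as the first row of each block.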
Something like this should work:
import pandas as pd

df = pd.DataFrame({"a": ["x1", 12, 14, "x2", 32, 9]})

## Get the row index of value starting with x
ixs = []
for j in df.index:
    value = df.loc[j, 'a']
    if isinstance(value, str) and value.startswith('x'):
        ixs.append(j)

dicto = {}
for i, start_ix in enumerate(ixs):
    # Each block ends just before the next header row, or at the last row
    if i == len(ixs) - 1:
        end_ix = df.index[-1]
    else:
        end_ix = ixs[i + 1] - 1
    # Select with ['a'] (a list) so the result stays a DataFrame, not a Series
    new_df = df.loc[start_ix:end_ix, ['a']].reset_index(drop=True)
    # The header row becomes the column name, then is dropped from the data
    new_df.columns = [new_df.iloc[0, 0]]
    new_df = new_df.drop(new_df.index[0]).reset_index(drop=True)
    dicto[i] = new_df
groupby behaves much like a dictionary, so we can explicitly turn it into one:
# na=False marks non-string rows as False; the keys x1, x2, ... come from the group counter
dfs = {f'x{k}': d for k, d in df.groupby(df['a'].str.startswith('x', na=False).cumsum())}
for k in dfs:
    dfs[k].columns = dfs[k].iloc[0].values  # Make x row the header.
    dfs[k] = dfs[k].iloc[1:]  # drop x row.
    print(dfs[k], '\n')
Output:
   x1
1  12
2  14

   x2
4  32
5   9
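If the dictionary should be keyed by the actual text of each header row (rather than by a counter that happens to match), and each piece should be re-indexed from 0 as in the question's expected output, something along these lines might work. This is a sketch under assumptions: split_on_header is a hypothetical helper name, the index is assumed to be unique, header values are assumed not to repeat, and any rows before the first header are simply skipped.

```python
import pandas as pd

def split_on_header(df, col="a", prefix="x"):
    """Hypothetical helper: split df[col] into frames keyed by each header row's text."""
    # Non-string rows are treated as data rows via na=False
    is_header = df[col].str.startswith(prefix, na=False)
    out = {}
    for _, block in df.groupby(is_header.cumsum()):
        # A block that does not begin with a header has no name; skip it
        if not is_header.loc[block.index[0]]:
            continue
        header = block[col].iloc[0]
        # Rebuild from a plain list so the new frame is indexed 0, 1, ...
        out[header] = pd.DataFrame({header: block[col].iloc[1:].to_list()})
    return out

df = pd.DataFrame({"a": ["x1", 12, 14, "x2", 32, 9]})
dfs = split_on_header(df)
print(dfs["x1"])
print(dfs["x2"])
```

Keying by the header text keeps the lookup correct even if the headers are not numbered consecutively.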