
Convert a single row into a different dataframe in pandas python

I am working with a dataframe of shape 146 rows x 48 columns. The columns are

['Region','Rank 2015','Score 2015','Economy 2015','Family 2015','Health 2015','Freedom 2015','Generosity 2015','Trust 2015','Rank 2016','Score 2016','Economy 2016','Family 2016','Health 2016','Freedom 2016','Generosity 2016','Trust 2016','Rank 2017','Score 2017','Economy 2017','Family 2017','Health 2017','Freedom 2017','Generosity 2017','Trust 2017','Rank 2018','Score 2018','Economy 2018','Family 2018','Health 2018','Freedom 2018','Generosity 2018','Trust 2018','Rank 2019','Score 2019','Economy 2019','Family 2019','Health 2019','Freedom 2019','Generosity 2019','Trust 2019','Score Mean','Economy Mean','Family Mean','Health Mean','Freedom Mean','Generosity Mean','Trust Mean']

I want to access a particular row and convert it into the following dataframe

    Year    Rank    Score   Family  Health  Freedom Generosity  Trust
0   2015     NaN      NaN     NaN     NaN     NaN         NaN   NaN
1   2016     NaN      NaN     NaN     NaN     NaN         NaN   NaN
2   2017     NaN      NaN     NaN     NaN     NaN         NaN   NaN
3   2018     NaN      NaN     NaN     NaN     NaN         NaN   NaN
4   2019     NaN      NaN     NaN     NaN     NaN         NaN   NaN 

Any help is welcome, and thanks in advance.

Another approach:

cols=['Region','Rank 2015','Score 2015','Economy 2015','Family 2015','Health 2015','Freedom 2015','Generosity 2015', 'Trust 2015','Rank 2016','Score 2016','Economy 2016','Family 2016','Health 2016','Freedom 2016','Generosity 2016','Trust 2016', 'Rank 2017','Score 2017','Economy 2017','Family 2017','Health 2017','Freedom 2017','Generosity 2017','Trust 2017','Rank 2018','Score 2018','Economy 2018','Family 2018','Health 2018','Freedom 2018','Generosity 2018','Trust 2018','Rank 2019','Score 2019','Economy 2019','Family 2019','Health 2019','Freedom 2019','Generosity 2019','Trust 2019','Score Mean','Economy Mean','Family Mean','Health Mean','Freedom Mean','Generosity Mean','Trust Mean']

import pandas as pd

# source dataframe
df1 = pd.DataFrame(columns=cols)
df1.loc[0] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

#target dataframe
df2 = pd.DataFrame(columns=['Year','Rank','Score','Family','Health','Freedom','Generosity','Trust','Economy'])
df2['Year']=['2015','2016','2017','2018','2019','Mean']

df2.set_index('Year', inplace=True)

idx = 0  # source row to copy

# each column name has the form '<metric> <year>', e.g. 'Score 2015'
for col in df1.columns[1:]:
    c, r = col.split(" ")
    df2.at[r, c] = df1.at[idx, col]

print(df2)

    Rank Score Family Health Freedom Generosity Trust Economy
Year
2015    1     1      1      1       1          1     1       1
2016    1     1      1      1       1          1     1       1
2017    1     1      1      1       1          1     1       1
2018    1     1      1      1       1          1     1       1
2019    1     1      1      1       1          1     1       1
Mean  NaN     1      1      1       1          1     1       1
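The loop above can also be expressed without explicit iteration. The following is a minimal sketch, not the answerer's code: it assumes every column except Region is named '<metric> <period>', and it uses a hypothetical miniature frame and a hypothetical helper name row_to_year_table for illustration.

```python
import pandas as pd

# Hypothetical miniature of the source frame: two metrics, two years, one mean.
df1 = pd.DataFrame(
    [["Western Europe", 1, 7.5, 2, 7.4, 7.45]],
    columns=["Region", "Rank 2015", "Score 2015",
             "Rank 2016", "Score 2016", "Score Mean"],
)

def row_to_year_table(frame, idx):
    """Reshape one row of '<metric> <period>' columns into a period-by-metric table."""
    row = frame.loc[idx].drop("Region")
    # Split each label on its last space: 'Score 2015' -> ('Score', '2015').
    row.index = pd.MultiIndex.from_tuples(
        [tuple(label.rsplit(" ", 1)) for label in row.index],
        names=["Metric", "Year"],
    )
    # Move the metric level into the columns; missing pairs become NaN.
    return row.unstack("Metric")

result = row_to_year_table(df1, 0)
print(result)
```

As in the loop version, a combination that does not exist in the source (here 'Rank Mean') is left as NaN.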

Here is a solution that uses a list comprehension:

Input:

cols = ['Region','Rank 2015','Score 2015','Economy 2015','Family 2015','Health 2015','Freedom 2015','Generosity 2015','Trust 2015','Rank 2016','Score 2016','Economy 2016','Family 2016','Health 2016','Freedom 2016','Generosity 2016','Trust 2016','Rank 2017','Score 2017','Economy 2017','Family 2017','Health 2017','Freedom 2017','Generosity 2017','Trust 2017','Rank 2018','Score 2018','Economy 2018','Family 2018','Health 2018','Freedom 2018','Generosity 2018','Trust 2018','Rank 2019','Score 2019','Economy 2019','Family 2019','Health 2019','Freedom 2019','Generosity 2019','Trust 2019','Score Mean','Economy Mean','Family Mean','Health Mean','Freedom Mean','Generosity Mean','Trust Mean']
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1,10,(3,48)))
df.columns = cols
print(df.iloc[:, :4])

   Region  Rank 2015  Score 2015  Economy 2015
0       7          9           9             9
1       8          7           2             3
2       3          3           4             5

And the new dataframe will be:

target_cols = ['Rank', 'Score', 'Family', 'Health', 'Freedom', 'Generosity', 'Trust']
years = ['2015', '2016', '2017', '2018', '2019']
newdf = pd.DataFrame([df.loc[1, [x + ' ' + year for x in target_cols]].values for year in years])
newdf.columns = target_cols
newdf['year'] = years
print(newdf)

   Rank  Score  Family  Health  Freedom  Generosity  Trust  year
0     7      2       6       9        3           4      9  2015
1     2      8       1       1        7           6      1  2016
2     7      4       2       5        1           7      4  2017
3     9      7       1       4        7           5      2  2018
4     5      4       4       9        1           6      2  2019
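If the metric names and years are not known in advance, they could be derived from the column labels rather than typed out. This is a rough sketch under the assumption that every yearly column is named '<metric> <four-digit year>'; the short columns list is a hypothetical stand-in for the 48 labels in the question.

```python
import pandas as pd

# Hypothetical column labels in the same style as the question.
columns = ["Region", "Rank 2015", "Score 2015",
           "Rank 2016", "Score 2016", "Score Mean"]

# Keep only labels that end in a four-digit year and split them in two.
# Labels such as 'Region' or 'Score Mean' do not match and are dropped.
parts = (
    pd.Series(columns)
    .str.extract(r"^(?P<metric>.+) (?P<year>\d{4})$")
    .dropna()
)

target_cols = list(parts["metric"].unique())  # order of first appearance
years = sorted(parts["year"].unique())

print(target_cols)
print(years)
```

The resulting target_cols and years can then be fed to the list comprehension shown above.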

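This assumes that your only target years are 2015 through 2019 and that the target columns are known.

I would proceed as follows:

(1) Define the target columns and years

target_columns = ['Rank', 'Score', 'Family', 'Health', 'Freedom', 'Generosity', 'Trust']
target_years = ['2015', '2016', '2017', '2018', '2019']

(2) Retrieve the particular row; I assume your starting dataframe is initial_dataframe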

particular_row = initial_dataframe.iloc[0]

(3) Retrieve and reshape the information from particular_row

reshaped_row = { 'Year': target_years }

reshaped_row.update({ column_name: [ particular_row[column_name + ' ' + year_name] for year_name in target_years ] for column_name in target_columns })

(4) Assign the reshaped row to output_dataframe

output_dataframe = pd.DataFrame(reshaped_row)
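Put together, steps (1) to (4) can be run end to end as follows. The small initial_dataframe here is made-up sample data with only two metrics and two years, used purely so the sketch is self-contained.

```python
import pandas as pd

# Made-up sample data standing in for the real 146 x 48 frame.
initial_dataframe = pd.DataFrame({
    "Region": ["A", "B"],
    "Rank 2015": [1, 2], "Score 2015": [7.5, 7.1],
    "Rank 2016": [2, 1], "Score 2016": [7.4, 7.6],
})

# (1) target columns and years (shortened to match the sample data)
target_columns = ["Rank", "Score"]
target_years = ["2015", "2016"]

# (2) the row to reshape
particular_row = initial_dataframe.iloc[0]

# (3) one list per metric, ordered by year
reshaped_row = {"Year": target_years}
reshaped_row.update({
    column_name: [particular_row[column_name + " " + year_name]
                  for year_name in target_years]
    for column_name in target_columns
})

# (4) build the output frame
output_dataframe = pd.DataFrame(reshaped_row)
print(output_dataframe)
```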

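Have you tried using a two-dimensional array? I find that the simplest option. Otherwise, you could also use a dictionary. https://www.w3schools.com/python/python_dictionaries.asp

I have not answered your question exactly, but I can give you a hint on how to transform the data.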
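One way the two-dimensional-array idea might look in practice is sketched below. It relies on an assumption that holds for the column list in the question but should be checked for other data: the yearly columns are laid out year by year, with the metrics in the same order inside each year. The sample frame is hypothetical.

```python
import pandas as pd

metrics = ["Rank", "Score", "Economy"]
years = ["2015", "2016"]

# Hypothetical frame whose columns follow the year-by-year layout.
cols = ["Region"] + [f"{m} {y}" for y in years for m in metrics]
df = pd.DataFrame([["A", 1, 7.5, 1.3, 2, 7.4, 1.4]], columns=cols)

# Take the yearly values of row 0 and fold them into a (years x metrics) array.
values = df.loc[0, cols[1:]].to_numpy().reshape(len(years), len(metrics))

table = pd.DataFrame(values, columns=metrics, index=pd.Index(years, name="Year"))
print(table)
```

Because the reshape is purely positional, a column that is out of order in the source would silently end up in the wrong cell, so the label-based approaches are safer when the layout is not guaranteed.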

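import pandas as pd

# li is assumed to be the list of column names from the question
df = pd.DataFrame(li)
# split each name around its four-digit year: 'Rank 2015' -> 'Rank ', '2015', ''
df = df[0].str.split(r"(\d{4})", expand=True)
# keep only the names that contained a year
df = df[df[2]==""]
col_name = df[0].unique()

# rows are years, columns are metric names, cells are still empty strings
df_new = df.pivot(index=1, columns=0, values=2)
df_new.drop(df_new.index[0], inplace=True)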

df_new:

     Economy    Family  Freedom Generosity  Health  Rank    Score   Trust
1                               
2016                                
2017                                
2018                                
2019            

    

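You can write your own logic from there.

This takes a lot of manipulation. A simple idea is to build the required dict first and then make a df from it.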
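As one possible way to finish that logic, the pivot idea can carry the actual values of a row instead of empty strings. This is a sketch on hypothetical sample data, using melt to go from wide to long and then pivot to spread the metrics back out; the names long and parts are illustrative only.

```python
import pandas as pd

# Hypothetical one-row frame in the same column style as the question.
df = pd.DataFrame(
    [["A", 1, 7.5, 2, 7.4, 7.45]],
    columns=["Region", "Rank 2015", "Score 2015",
             "Rank 2016", "Score 2016", "Score Mean"],
)

# Wide -> long: one record per original column of the chosen row.
long = df.loc[[0]].drop(columns="Region").melt(var_name="label", value_name="value")

# Split 'Score 2015' into metric and year; labels without a year get NaN.
parts = long["label"].str.extract(r"^(?P<metric>.+) (?P<year>\d{4})$")
long = long.join(parts).dropna(subset=["year"])

# Long -> year-by-metric table.
df_new = long.pivot(index="year", columns="metric", values="value")
print(df_new)
```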

In [61]: dicts = {}

In [62]: for t in text[1:]:
    ...:     n,y = t.split(" ")
    ...:     if n not in dicts:
    ...:         dicts[n]=[]
    ...:     if y !="Mean":
    ...:         if n == 'Rank':
    ...:             dicts[n].append(y)
    ...:         else:
    ...:             dicts[n].append(float("nan"))
    ...:

In [63]: df = pd.DataFrame(dicts)

In [64]: df['Year'] = df['Rank']

In [65]: df['Rank'] = df['Family']

In [66]: df
Out[66]:
   Rank  Score  Economy  Family  Health  Freedom  Generosity  Trust  Year
0   NaN    NaN      NaN     NaN     NaN      NaN         NaN    NaN  2015
1   NaN    NaN      NaN     NaN     NaN      NaN         NaN    NaN  2016
2   NaN    NaN      NaN     NaN     NaN      NaN         NaN    NaN  2017
3   NaN    NaN      NaN     NaN     NaN      NaN         NaN    NaN  2018
4   NaN    NaN      NaN     NaN     NaN      NaN         NaN    NaN  2019

