
Convert a single row into a different dataframe in pandas python

I am working on a dataframe of shape 146 rows x 48 columns. The columns are

['Region','Rank 2015','Score 2015','Economy 2015','Family 2015','Health 2015','Freedom 2015','Generosity 2015','Trust 2015','Rank 2016','Score 2016','Economy 2016','Family 2016','Health 2016','Freedom 2016','Generosity 2016','Trust 2016','Rank 2017','Score 2017','Economy 2017','Family 2017','Health 2017','Freedom 2017','Generosity 2017','Trust 2017','Rank 2018','Score 2018','Economy 2018','Family 2018','Health 2018','Freedom 2018','Generosity 2018','Trust 2018','Rank 2019','Score 2019','Economy 2019','Family 2019','Health 2019','Freedom 2019','Generosity 2019','Trust 2019','Score Mean','Economy Mean','Family Mean','Health Mean','Freedom Mean','Generosity Mean','Trust Mean']

I want to access a particular row and convert it to the following dataframe:

    Year    Rank    Score   Family  Health  Freedom Generosity  Trust
0   2015     NaN      NaN     NaN     NaN     NaN         NaN   NaN
1   2016     NaN      NaN     NaN     NaN     NaN         NaN   NaN
2   2017     NaN      NaN     NaN     NaN     NaN         NaN   NaN
3   2018     NaN      NaN     NaN     NaN     NaN         NaN   NaN
4   2019     NaN      NaN     NaN     NaN     NaN         NaN   NaN 

Any help is welcomed & Thank you in advance.

An alternate way:

import pandas as pd

cols=['Region','Rank 2015','Score 2015','Economy 2015','Family 2015','Health 2015','Freedom 2015','Generosity 2015', 'Trust 2015','Rank 2016','Score 2016','Economy 2016','Family 2016','Health 2016','Freedom 2016','Generosity 2016','Trust 2016', 'Rank 2017','Score 2017','Economy 2017','Family 2017','Health 2017','Freedom 2017','Generosity 2017','Trust 2017','Rank 2018','Score 2018','Economy 2018','Family 2018','Health 2018','Freedom 2018','Generosity 2018','Trust 2018','Rank 2019','Score 2019','Economy 2019','Family 2019','Health 2019','Freedom 2019','Generosity 2019','Trust 2019','Score Mean','Economy Mean','Family Mean','Health Mean','Freedom Mean','Generosity Mean','Trust Mean']

# source dataframe
df1 = pd.DataFrame(columns=cols)
df1.loc[0] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

#target dataframe
df2 = pd.DataFrame(columns=['Year','Rank','Score','Family','Health','Freedom','Generosity','Trust','Economy'])
df2['Year']=['2015','2016','2017','2018','2019','Mean']

df2.set_index('Year', inplace=True)

idx = 0  # source row to copy

for col in df1.columns[1:]: 
    c,r = col.split(" ")
    df2.at[r,c] = df1.at[idx, col]

print (df2)

    Rank Score Family Health Freedom Generosity Trust Economy
Year
2015    1     1      1      1       1          1     1       1
2016    1     1      1      1       1          1     1       1
2017    1     1      1      1       1          1     1       1
2018    1     1      1      1       1          1     1       1
2019    1     1      1      1       1          1     1       1
Mean  NaN     1      1      1       1          1     1       1

Here's a solution utilizing list comprehension:

The input:

import numpy as np
import pandas as pd

cols = ['Region','Rank 2015','Score 2015','Economy 2015','Family 2015','Health 2015','Freedom 2015','Generosity 2015','Trust 2015','Rank 2016','Score 2016','Economy 2016','Family 2016','Health 2016','Freedom 2016','Generosity 2016','Trust 2016','Rank 2017','Score 2017','Economy 2017','Family 2017','Health 2017','Freedom 2017','Generosity 2017','Trust 2017','Rank 2018','Score 2018','Economy 2018','Family 2018','Health 2018','Freedom 2018','Generosity 2018','Trust 2018','Rank 2019','Score 2019','Economy 2019','Family 2019','Health 2019','Freedom 2019','Generosity 2019','Trust 2019','Score Mean','Economy Mean','Family Mean','Health Mean','Freedom Mean','Generosity Mean','Trust Mean']
df = pd.DataFrame(np.random.randint(1,10,(3,48)))
df.columns = cols
print(df.iloc[:, :4])

   Region  Rank 2015  Score 2015  Economy 2015
0       7          9           9             9
1       8          7           2             3
2       3          3           4             5

And the new dataframe would be:

target_cols = ['Rank', 'Score', 'Family', 'Health', 'Freedom', 'Generosity', 'Trust']
years = ['2015', '2016', '2017', '2018', '2019']
newdf = pd.DataFrame([df.loc[1, [x + ' ' + year for x in target_cols]].values for year in years])
newdf.columns = target_cols
newdf['year'] = years
print(newdf)

   Rank  Score  Family  Health  Freedom  Generosity  Trust  year
0     7      2       6       9        3           4      9  2015
1     2      8       1       1        7           6      1  2016
2     7      4       2       5        1           7      4  2017
3     9      7       1       4        7           5      2  2018
4     5      4       4       9        1           6      2  2019

This assumes that the only target years are 2015 through 2019 and that the target columns are known.

I would proceed as follows:

(1) define the target columns and years

target_columns = ['Rank', 'Score', 'Family', 'Health', 'Freedom', 'Generosity', 'Trust']

target_years = ['2015', '2016', '2017', '2018', '2019']

(2) retrieve the particular row (I assume your starting dataframe is initial_dataframe)

particular_row = initial_dataframe.iloc[0]

(3) retrieve and reshape the information from the particular_row

reshaped_row = { 'Year': target_years }

reshaped_row.update({ column_name: [ particular_row[column_name + ' ' + year_name] for year_name in target_years ] for column_name in target_columns })

(4) assign the reshaped row to the output_dataframe

output_dataframe = pd.DataFrame(reshaped_row)
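
Steps (1) to (4) can be put together into one runnable sketch. The two-metric, two-year initial_dataframe below is a made-up stand-in for the real 146 x 48 frame, used only to keep the example short; with the real data, target_columns and target_years would be the full lists from step (1).

```python
import pandas as pd

# (1) target columns and years (shortened for this illustration)
target_columns = ["Rank", "Score"]
target_years = ["2015", "2016"]

# Hypothetical stand-in for the asker's dataframe
initial_dataframe = pd.DataFrame(
    {
        "Region": ["A", "B"],
        "Rank 2015": [1, 2],
        "Score 2015": [7.5, 7.1],
        "Rank 2016": [2, 1],
        "Score 2016": [7.4, 7.3],
    }
)

# (2) pick the row to reshape
particular_row = initial_dataframe.iloc[0]

# (3) build one list per target column, ordered by year
reshaped_row = {"Year": target_years}
reshaped_row.update(
    {
        column_name: [
            particular_row[column_name + " " + year_name]
            for year_name in target_years
        ]
        for column_name in target_columns
    }
)

# (4) one output row per year, one output column per metric
output_dataframe = pd.DataFrame(reshaped_row)
print(output_dataframe)
```

A missing "<column> <year>" label raises a KeyError here, which is a useful check that the column names really follow that pattern.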

Have you tried using a 2D array? I would find that to be the easiest. Otherwise, you could also use a dictionary. https://www.w3schools.com/python/python_dictionaries.asp
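
The 2D array idea could look roughly like the sketch below. It relies on an assumption taken from the column list in the question: after 'Region', the 40 yearly columns are laid out year by year, with the same 8 metrics in the same order for each year. If the real columns are ordered differently, the reshape would silently mislabel values. The frame here is filled with made-up numbers (each cell holds its own column position) so the result is easy to check.

```python
import pandas as pd

metrics = ["Rank", "Score", "Economy", "Family",
           "Health", "Freedom", "Generosity", "Trust"]
years = ["2015", "2016", "2017", "2018", "2019"]

# Rebuild the question's 48 column names: Region, 8 metrics x 5 years, 7 means.
cols = (
    ["Region"]
    + [f"{m} {y}" for y in years for m in metrics]
    + [f"{m} Mean" for m in metrics[1:]]
)

# Made-up single-row frame: every cell holds its own column position.
df = pd.DataFrame([list(range(48))], columns=cols)

# Take the 40 yearly values of row 0 and fold them into 5 rows x 8 columns.
block = df.iloc[0, 1:41].to_numpy().reshape(len(years), len(metrics))

result = pd.DataFrame(block, columns=metrics)
result.insert(0, "Year", years)
print(result)
```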

I didn't fully understand your question, but I can give you a hint on how to transform the data.

# li is presumably the list of column names from the question
df = pd.DataFrame(li)
df = df[0].str.split(r"(\d{4})", expand=True)
df = df[df[2]==""]
col_name = df[0].unique()

df_new = df.pivot(index=1, columns=0, values=2)
df_new.drop(df_new.index[0], inplace=True)

df_new:

     Economy    Family  Freedom Generosity  Health  Rank    Score   Trust
1                               
2016                                
2017                                
2018                                
2019            

    

You can write your own logic.
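
One way to fill in that logic, as a sketch rather than the answerer's own code: take the row as a Series, split each label into a metric and a year, and pivot. It uses str.extract instead of str.split so the two parts get names, and the small frame and the names "Metric", "name" and "value" are made up for the illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's frame
df = pd.DataFrame(
    {
        "Region": ["A", "B"],
        "Rank 2015": [1, 2],
        "Score 2015": [7.5, 7.1],
        "Rank 2016": [2, 1],
        "Score 2016": [7.4, 7.3],
        "Score Mean": [7.45, 7.2],
    }
)

# The single row to reshape, as a two-column (label, value) frame
row = df.iloc[0].drop("Region")
long = row.rename_axis("name").reset_index(name="value")

# Split "Rank 2015" into Metric="Rank", Year="2015";
# labels without a 4-digit year (the "... Mean" columns) become NaN.
parts = long["name"].str.extract(r"^(?P<Metric>.+) (?P<Year>\d{4})$")
long = long.join(parts).dropna(subset=["Year"])

# One row per year, one column per metric
wide = long.pivot(index="Year", columns="Metric", values="value")
print(wide)
```

Because the values come from a mixed-type row, the pivoted columns have object dtype; converting them afterwards (for example with pd.to_numeric) may be needed before doing arithmetic.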

It needs a lot of manipulation. A simple idea is to build the required dict first and then make a dataframe from it.

In [61]: dicts = {}

In [62]: for t in text[1:]:
    ...:     n,y = t.split(" ")
    ...:     if n not in dicts:
    ...:         dicts[n]=[]
    ...:     if y !="Mean":
    ...:         if n == 'Rank':
    ...:             dicts[n].append(y)
    ...:         else:
    ...:             dicts[n].append(float("nan"))
    ...:

In [63]: df = pd.DataFrame(dicts)

In [64]: df['Year'] = df['Rank']

In [65]: df['Rank'] = df['Family']

In [66]: df
Out[66]:
   Rank  Score  Economy  Family  Health  Freedom  Generosity  Trust  Year
0   NaN    NaN      NaN     NaN     NaN      NaN         NaN    NaN  2015
1   NaN    NaN      NaN     NaN     NaN      NaN         NaN    NaN  2016
2   NaN    NaN      NaN     NaN     NaN      NaN         NaN    NaN  2017
3   NaN    NaN      NaN     NaN     NaN      NaN         NaN    NaN  2018
4   NaN    NaN      NaN     NaN     NaN      NaN         NaN    NaN  2019
