简体   繁体   中英

How can we split a dataframe with one column into multiple columns?

I have a dataframe that looks like this.

    0
0   SPY  SPDR S&P 500 ETF                                     -2.30%    4.96%   -2.60%    8.76%   35.50%   63.81%
1   IVV  iShares Core S&P 500 ETF                             -2.32%    4.93%   -2.66%    8.96%   36.10%   63.76%
2   VTI  Vanguard Total Stock Market ETF                      -2.20%    5.41%   -3.05%    7.67%   33.37%   59.23%
3   VOO  Vanguard S&P 500 ETF                                 -2.33%    4.95%   -2.72%    8.76%   35.66%   64.25%
4   QQQ  Invesco QQQ                                          -1.06%    5.29%   14.83%   31.81%   80.45%  134.61%
5   AGG  iShares Core U.S. Aggregate Bond ETF                 -0.02%    0.48%    5.76%    8.87%   15.65%   21.26%
6   VEA  Vanguard FTSE Developed Markets ETF                  -2.06%    7.97%  -10.14%   -1.83%    3.42%   12.27%
7   IEFA iShares Core MSCI EAFE ETF                           -1.74%    8.25%  -10.05%   -1.59%    3.67%   12.70%
8   GLD  SPDR Gold Trust                                      -0.62%   -1.27%   13.76%   27.88%   36.22%   43.45%
9   VUG  Vanguard Growth ETF                                  -1.30%    5.52%   10.38%   24.19%   62.45%   96.12%
10  VWO  Vanguard FTSE Emerging Markets ETF                   -2.07%    6.80%  -10.50%   -1.82%    6.16%   10.85%
11  BND  Vanguard Total Bond Market ETF                        0.02%    0.58%    5.86%    9.18%   15.91%   22.82%
12  IWF  iShares Russell 1000 Growth ETF                      -1.40%    5.25%    8.68%   22.23%   65.17%  102.32%

The problem is, that there is just one column, and to do any useful analysis, I need seperate columns for each feature. I tried to split the features and name the features, like this.

foo = lambda x: pd.Series([i for i in reversed(df.split(' '))])
rev = df['symbol', 'name', 'one_week_return', 'four_week_return', 'ytd', '1Y', '3Y', '5Y'].apply(foo)

That's throwing a 'key error'. Does anyone here have any idea how to split the columns and name them? Thanks.

Try this:

df1 = df[0].str.rsplit(n=6, expand=True)
df2 = df1.pop(0).str.split(n=1, expand=True)

df = pd.concat([df2, df1], axis=1)
df.columns =['symbol', 'name', 'one_week_return', 'four_week_return', 'ytd', '1Y', '3Y', '5Y']
print(df)

Output:

   symbol                                  name one_week_return four_week_return      ytd      1Y      3Y       5Y
0     SPY                      SPDR S&P 500 ETF          -2.30%            4.96%   -2.60%   8.76%  35.50%   63.81%
1     IVV              iShares Core S&P 500 ETF          -2.32%            4.93%   -2.66%   8.96%  36.10%   63.76%
2     VTI       Vanguard Total Stock Market ETF          -2.20%            5.41%   -3.05%   7.67%  33.37%   59.23%
3     VOO                  Vanguard S&P 500 ETF          -2.33%            4.95%   -2.72%   8.76%  35.66%   64.25%
4     QQQ                           Invesco QQQ          -1.06%            5.29%   14.83%  31.81%  80.45%  134.61%
5     AGG  iShares Core U.S. Aggregate Bond ETF          -0.02%            0.48%    5.76%   8.87%  15.65%   21.26%
6     VEA   Vanguard FTSE Developed Markets ETF          -2.06%            7.97%  -10.14%  -1.83%   3.42%   12.27%
7    IEFA            iShares Core MSCI EAFE ETF          -1.74%            8.25%  -10.05%  -1.59%   3.67%   12.70%
8     GLD                       SPDR Gold Trust          -0.62%           -1.27%   13.76%  27.88%  36.22%   43.45%
9     VUG                   Vanguard Growth ETF          -1.30%            5.52%   10.38%  24.19%  62.45%   96.12%
10    VWO    Vanguard FTSE Emerging Markets ETF          -2.07%            6.80%  -10.50%  -1.82%   6.16%   10.85%
11    BND        Vanguard Total Bond Market ETF           0.02%            0.58%    5.86%   9.18%  15.91%   22.82%
12   1IWF       iShares Russell 1000 Growth ETF          -1.40%            5.25%    8.68%  22.23%  65.17%  102.32%

I think you need to preprocess your file first. Separate each column with a comma ,

import pandas as pd
with open('input.csv', 'r') as f, open('output.csv', 'w') as w:
    for line in f:
        l = [item for item in line.split('  ') if item]
        w.write(','.join(l))

Now if you read output.csv using pd.read_csv , you can read them as separate columns.

I used the following as my input.csv file: link to file

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM