
Copying columns within pandas dataframe

I want to slice and copy columns in a pandas DataFrame. My data frame looks like the following:

     1928  1928.1  1929  1929.1  1930  1930.1
 0    0     0       0     0       0     0
 1    1     3       3     2       2     2
 2    4     1       3     0       1     2

I want to reshape it into this form:

     1928  1928.1  1929 1929.1 1930 1930.1
 0   0     0            
 1   1     3          
 2   4     1                    
 3   0     0
 4   3     2
 5   3     0
 6   0     0
 7   2     2
 8   1     2 

In other words, I want to move the values in columns '1929', '1929.1', '1930' and '1930.1' underneath the columns '1928' and '1928.1'.

To do this, I wrote the following code:

   [In]x=2
       y=2
       b=3
       c=x-1
       for a in range(0,2):
            df.iloc[b:(b+3),0:x]=df.iloc[0:3,x:(x+y)]
            x=x+2
            b=b+3
   [In] df
   [Out] 
     1928  1928.1  1929  1929.1  1930  1930.1
 0    0     0       0     0       0     0
 1    1     3       3     2       2     2
 2    4     1       3     0       1     2

Nothing is copied between the columns. How should I modify my code?
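One likely reason the loop appears to do nothing: the frame only has rows 0 to 2, so `df.iloc[3:6, ...]` selects zero rows and there is nothing to write into. The following is a minimal sketch, assuming the sample data above with string column labels; the name `big` is purely illustrative. It first extends the frame to nine rows with `reindex` and then copies the raw values block by block:

```python
import pandas as pd

# Sample frame from the question (column labels assumed to be strings).
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

# Rows 3..5 do not exist yet, so this selection is empty.
print(df.iloc[3:6, 0:2].shape)

n = len(df)               # rows per block (3)
pairs = df.shape[1] // 2  # number of two-column blocks (3)

# Extend to 9 rows; the new rows are NaN, so the columns become float.
big = df.reindex(range(n * pairs))

# Copy each later pair of columns underneath the first pair.
for k in range(1, pairs):
    big.iloc[k * n:(k + 1) * n, 0:2] = df.iloc[:, 2 * k:2 * k + 2].to_numpy()

print(big[['1928', '1928.1']])
```

The other four columns still hold their original values in rows 0 to 2, so they would need to be blanked or dropped afterwards to match the desired layout.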

If you're OK with having a new dataframe, simply concatenate the columns:

df1 = df[['1928','1928.1']]
df2 = df[['1929','1929.1']]
df2.columns = ['1928','1928.1']
df3 = df[['1930','1930.1']]
df3.columns = ['1928','1928.1']

df = pd.concat([df1,df2,df3])

I think that's the most readable and easiest way. You can overwrite your original dataframe and discard the intermediate ones.
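The same idea can be written as a loop so that it does not depend on the number of year columns. This is only a sketch, assuming the sample frame from the question, string column labels, and columns that always come in consecutive pairs; `target`, `pieces` and `stacked` are illustrative names:

```python
import pandas as pd

# Sample frame from the question (column labels assumed to be strings).
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

target = ['1928', '1928.1']
width = len(target)

# Slice the frame into consecutive pairs of columns and relabel each pair.
pieces = [
    df.iloc[:, start:start + width].set_axis(target, axis=1)
    for start in range(0, df.shape[1], width)
]

# ignore_index=True renumbers the rows 0..8 instead of repeating 0..2.
stacked = pd.concat(pieces, ignore_index=True)
print(stacked)
```

Passing `ignore_index=True` gives the continuous 0 to 8 index shown in the question; without it, the index 0, 1, 2 repeats for each block.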

Setup

cols = ['1929', '1929.1', '1930', '1930.1']
vals = df[cols].values.reshape(-1, 2)

numpy.vstack with pd.concat (DataFrame.append was removed in pandas 2.0, so pd.concat is used here instead):

pd.concat([
    df[['1928', '1928.1']],
    pd.DataFrame(
        np.vstack([vals[::2], vals[1::2]]), columns=['1928', '1928.1']
    ),
])

   1928  1928.1
0     0       0
1     1       3
2     4       1
0     0       0
1     3       2
2     3       0
3     0       0
4     2       2
5     1       2
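A reshape-only variant of the NumPy approach is sketched below. It assumes the sample frame from the question, a single numeric dtype, and columns that come in consecutive pairs; `width`, `stacked_vals` and `out` are illustrative names. The array is viewed as (row, pair, column), the first two axes are swapped, and the result is flattened back to two columns, which also yields a fresh 0 to 8 index:

```python
import pandas as pd

# Sample frame from the question (column labels assumed to be strings).
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

n_rows, n_cols = df.shape
width = 2  # number of columns in each block

stacked_vals = (
    df.to_numpy()
      .reshape(n_rows, n_cols // width, width)  # axes: (row, pair, column)
      .transpose(1, 0, 2)                       # axes: (pair, row, column)
      .reshape(-1, width)                       # stack the pairs vertically
)

out = pd.DataFrame(stacked_vals, columns=df.columns[:width])
print(out)
```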

One way is to use itertools.chain:

from itertools import chain

cols = df.columns

res = pd.DataFrame({cols[0]: list(chain.from_iterable(df.iloc[:, ::2].T.values)),
                    cols[1]: list(chain.from_iterable(df.iloc[:, 1::2].T.values))})\
        .join(pd.DataFrame(columns=cols[2:]))

print(res)

   1928  1928.1 1929 1929.1 1930 1930.1
0     0       0  NaN    NaN  NaN    NaN
1     1       3  NaN    NaN  NaN    NaN
2     4       1  NaN    NaN  NaN    NaN
3     0       0  NaN    NaN  NaN    NaN
4     3       2  NaN    NaN  NaN    NaN
5     3       0  NaN    NaN  NaN    NaN
6     0       0  NaN    NaN  NaN    NaN
7     2       2  NaN    NaN  NaN    NaN
8     1       2  NaN    NaN  NaN    NaN

Group by the first four characters of the column names:

#def key(s):
#    return s[:4]
#gb = df.groupby(key, axis=1)
gb = df.groupby(by=df.columns.str[:4], axis=1)

n_cols = len(df.columns) // len(gb)
col_names = df.iloc[:,:n_cols].columns

For each group's DataFrame, rename the columns and concatenate. This produces a new DataFrame with only two columns. Either of the following two equivalent snippets does this.

# As a single expression:
dz = pd.concat(d.rename(columns=dict(zip(d.columns, col_names))) for g, d in gb)
dz.index = range(len(dz))

# Or with an explicit loop:
frames = []
for g, d in gb:
    d.columns = col_names
    frames.append(d)
dy = pd.concat(frames)
dy.index = range(len(dy))

Will work for more than six columns.
Relies on all groups having the same number of columns.
Relies on columns being sorted by their labels.
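Newer pandas releases warn that the `axis=1` argument to `groupby` is deprecated (from version 2.1, as far as I know). The sketch below follows the same prefix-grouping idea without `groupby`, selecting each group's columns with a boolean mask instead. It assumes the sample frame from the question, string column labels, and equally sized groups; `prefixes`, `frames` and `result` are illustrative names:

```python
import pandas as pd

# Sample frame from the question (column labels assumed to be strings).
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

# First four characters of each label: '1928', '1928', '1929', ...
prefixes = df.columns.str[:4]

# Every group is assumed to have the same number of columns.
n_cols = len(df.columns) // prefixes.nunique()
col_names = df.columns[:n_cols]

# Select each group's columns with a boolean mask and relabel them.
frames = [
    df.loc[:, prefixes == p].set_axis(col_names, axis=1)
    for p in prefixes.unique()
]

result = pd.concat(frames, ignore_index=True)
print(result)
```

Because `unique()` keeps the prefixes in order of first appearance, the groups are stacked in the order in which they occur in the frame.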
