I want to slice and copy columns in a pandas DataFrame. My DataFrame looks like the following:
1928 1928.1 1929 1929.1 1930 1930.1
0 0 0 0 0 0 0
1 1 3 3 2 2 2
2 4 1 3 0 1 2
I want to turn it into the following form:
1928 1928.1 1929 1929.1 1930 1930.1
0 0 0
1 1 3
2 4 1
3 0 0
4 3 2
5 3 0
6 0 0
7 2 2
8 1 2
In other words, I want to move the values in columns '1929', '1929.1', '1930' and '1930.1' underneath the columns '1928' and '1928.1'.
To do this, I wrote the following code:
[In] x=2
y=2
b=3
c=x-1
for a in range(0,2):
    df.iloc[b:(b+3),0:x]=df.iloc[0:3,x:(x+y)]
    x=x+2
    b=b+3
[In] df
[Out]
1928 1928.1 1929 1929.1 1930 1930.1
0 0 0 0 0 0 0
1 1 3 3 2 2 2
2 4 1 3 0 1 2
No copying takes place within the columns. How should I modify my code?
If you're OK with having a new DataFrame, simply concatenate the columns:
df1 = df[['1928','1928.1']]
df2 = df[['1929','1929.1']]
df2.columns = ['1928','1928.1']
df3 = df[['1930','1930.1']]
df3.columns = ['1928','1928.1']
# ignore_index=True renumbers the rows 0..8, as in the desired output
df = pd.concat([df1,df2,df3], ignore_index=True)
I think that's the most readable and easiest way. You can overwrite your original DataFrame and discard the others.
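The same idea can be sketched as a loop over column pairs, so it does not need one variable per pair. This is only an illustration: the sample frame is rebuilt from the question, and the names `target`, `pieces` and `stacked` are made up here. It assumes the columns come in adjacent pairs.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

# Labels every pair should end up under (assumed: the first two columns)
target = list(df.columns[:2])

# Take the columns two at a time and relabel each pair
pieces = [
    df.iloc[:, i:i + 2].set_axis(target, axis=1)
    for i in range(0, df.shape[1], 2)
]

# Stack the pairs vertically and renumber the rows
stacked = pd.concat(pieces, ignore_index=True)
print(stacked)
```

Because `set_axis` returns a relabelled copy, the original `df` is left untouched.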
Setup
cols = ['1929', '1929.1', '1930', '1930.1']
vals = df[cols].values.reshape(-1, 2)
numpy.vstack with pd.concat (DataFrame.append was removed in pandas 2.0):
pd.concat([
    df[['1928', '1928.1']],
    pd.DataFrame(
        np.vstack([vals[::2], vals[1::2]]), columns=['1928', '1928.1']
    ),
])
1928 1928.1
0 0 0
1 1 3
2 4 1
0 0 0
1 3 2
2 3 0
3 0 0
4 2 2
5 1 2
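A variation on the reshape idea, offered as a sketch rather than as the answer's own method: reshape the whole frame into (row, pair, column-within-pair) blocks and swap the first two axes, which avoids slicing `vals` by hand. It assumes every pair has the same width; the names `width`, `blocks` and `result` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

n_rows, n_cols = df.shape
width = 2  # assumed number of columns per group

# Shape (row, pair, column-within-pair)
blocks = df.to_numpy().reshape(n_rows, n_cols // width, width)

# Put the pair axis first, then flatten pair-by-pair into two columns
stacked_values = blocks.transpose(1, 0, 2).reshape(-1, width)

result = pd.DataFrame(stacked_values, columns=df.columns[:width])
print(result)
```

The resulting rows are numbered 0 to 8, with each pair's rows following the previous pair's.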
One way is to use itertools.chain:
from itertools import chain
cols = df.columns
res = pd.DataFrame({
    cols[0]: list(chain.from_iterable(df.iloc[:, ::2].T.values)),
    cols[1]: list(chain.from_iterable(df.iloc[:, 1::2].T.values)),
}).join(pd.DataFrame(columns=cols[2:]))
print(res)
1928 1928.1 1929 1929.1 1930 1930.1
0 0 0 NaN NaN NaN NaN
1 1 3 NaN NaN NaN NaN
2 4 1 NaN NaN NaN NaN
3 0 0 NaN NaN NaN NaN
4 3 2 NaN NaN NaN NaN
5 3 0 NaN NaN NaN NaN
6 0 0 NaN NaN NaN NaN
7 2 2 NaN NaN NaN NaN
8 1 2 NaN NaN NaN NaN
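To see what `chain.from_iterable` contributes here, it may help to look at it on plain lists. The lists below are hand-copied stand-ins for the question's columns, not pulled from a DataFrame, and the variable names are illustrative.

```python
from itertools import chain

# Stand-ins for the columns '1928', '1929', '1930'
even_columns = [[0, 1, 4], [0, 3, 3], [0, 2, 1]]
# Stand-ins for the columns '1928.1', '1929.1', '1930.1'
odd_columns = [[0, 3, 1], [0, 2, 0], [0, 2, 2]]

# chain.from_iterable walks the inner lists one after another,
# which lays each column end to end under the first
first = list(chain.from_iterable(even_columns))
second = list(chain.from_iterable(odd_columns))

print(first)   # [0, 1, 4, 0, 3, 3, 0, 2, 1]
print(second)  # [0, 3, 1, 0, 2, 0, 0, 2, 2]
```

In the answer, `df.iloc[:, ::2].T.values` plays the role of `even_columns`: transposing turns each selected column into one row of the array, and the chain then flattens those rows in order.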
Group by the first four characters of the column names:
#def key(s):
# return s[:4]
#gb = df.groupby(key, axis=1)
gb = df.groupby(by=df.columns.str[:4], axis=1)
n_cols = len(df.columns) // len(gb)
col_names = df.iloc[:,:n_cols].columns
For each group's DataFrame, rename the columns and concatenate - this produces a new DataFrame with only two columns
dz = pd.concat(d.rename(columns=dict(zip(d.columns, col_names))) for g, d in gb)
dz.index = range(len(dz))
# The same result, written as an explicit loop
frames = []
for g,d in gb:
    d.columns = col_names
    frames.append(d)
dy = pd.concat(frames)
dy.index = range(len(dy))
Will work for more than six columns.
Relies on all groups having the same number of columns.
Relies on columns being sorted by their labels.
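Newer pandas releases deprecate `groupby(..., axis=1)` (as of pandas 2.1), so here is a sketch of the same grouping done on the column labels with a plain dict instead. It is an alternative, not the code above: the groups are visited in order of appearance rather than sorted order, and it still assumes every group has the same number of columns. The names `groups`, `col_names` and `result` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

# Collect column labels by their first four characters (the year)
groups = {}
for label in df.columns:
    groups.setdefault(label[:4], []).append(label)

# The first group's labels become the labels of the result
col_names = next(iter(groups.values()))

# Relabel each group's columns and stack the groups vertically
frames = [df[labels].set_axis(col_names, axis=1) for labels in groups.values()]
result = pd.concat(frames, ignore_index=True)
print(result)
```

Passing `ignore_index=True` takes the place of resetting the index by hand afterwards.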