Merge every two columns of a pandas.DataFrame
I'd like to merge every two columns of a pandas.DataFrame into one.
For example, I have the following dataframe:
pd.DataFrame([[10,"5%", 20, "10%"],[30,"15%", 40,"20%"]], columns=['error1', '(%)', 'error2', '(%)'])
Then, what I'd like to get is the following dataframe:
pd.DataFrame([["10 (5%)", "20 (10%)"],["30 (15%)", "40 (20%)"]], columns=['error1 (%)', 'error2 (%)'])
For this data frame:
print(df)
   error1  (%)  error2  (%)
0      10   5%      20  10%
1      30  15%      40  20%
This works:
def make_func(offset=0):
    def func(x):
        # x is one row; access by position because the '(%)' label repeats
        return '{} ({})'.format(x.iloc[offset], x.iloc[offset + 1])
    return func

df2 = pd.DataFrame()
for offset in range(0, df.shape[1], 2):
    df2['{} (%)'.format(df.columns[offset])] = df.apply(make_func(offset), axis=1)
Result:
print(df2)
   error1 (%)  error2 (%)
0     10 (5%)    20 (10%)
1    30 (15%)    40 (20%)
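The same pairing loop can also be written column by column, without a row-wise apply. This is only a minimal sketch, assuming the columns always come in (value, percent) pairs; merge_pairs is a hypothetical helper name and is not part of the answer above:

```python
import pandas as pd

def merge_pairs(df):
    """Combine columns (0, 1), (2, 3), ... into 'value (percent)' strings."""
    merged = {}
    for offset in range(0, df.shape[1], 2):
        # Positional access, because the '(%)' label occurs more than once
        left = df.iloc[:, offset].astype(str)
        right = df.iloc[:, offset + 1].astype(str)
        name = '{} {}'.format(df.columns[offset], df.columns[offset + 1])
        merged[name] = left + ' (' + right + ')'
    return pd.DataFrame(merged, index=df.index)

df = pd.DataFrame([[10, "5%", 20, "10%"], [30, "15%", 40, "20%"]],
                  columns=['error1', '(%)', 'error2', '(%)'])
df2 = merge_pairs(df)
print(df2)
```

Note that the new column names are used as dictionary keys here, so this sketch assumes the merged names are unique.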
You can try:
import pandas as pd

df = pd.DataFrame([[10,"5%", 20, "10%"],[30,"15%", 40,"20%"]],
                  columns=['error1', '(%)', 'error2', '(%)'])
print(df)
   error1  (%)  error2  (%)
0      10   5%      20  10%
1      30  15%      40  20%

# Join each even-position column name with the odd-position name after it
cols = (' '.join(w) for w in zip(df.columns[::2], df.columns[1::2]))
# .iloc selects columns by position: every second column, starting at 0 or 1
print(pd.DataFrame(df.iloc[:, ::2].astype(str).values +
                   ' (' +
                   df.iloc[:, 1::2].values +
                   ')', index=df.index, columns=cols))
   error1 (%)  error2 (%)
0     10 (5%)    20 (10%)
1    30 (15%)    40 (20%)
Column names at even and odd positions:
In [80]: df.columns[::2]
Out[80]: Index([u'error1', u'error2'], dtype='object')
In [81]: df.columns[1::2]
Out[81]: Index([u'(%)', u'(%)'], dtype='object')
List of tuples built by zip:
In [82]: list(zip(df.columns[::2], df.columns[1::2]))
Out[82]: [('error1', '(%)'), ('error2', '(%)')]
Generator that joins the items of each tuple:
In [83]: (' '.join(w) for w in zip(df.columns[::2], df.columns[1::2]))
Out[83]: <generator object <genexpr> at 0x0000000015158EE8>
In [84]: list((' '.join(w) for w in zip(df.columns[::2], df.columns[1::2])))
Out[84]: ['error1 (%)', 'error2 (%)']
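The name-pairing step does not depend on pandas. As a small illustration, here is the same slicing and zip applied to a plain list standing in for df.columns (the list names is an assumption made for this sketch):

```python
# Plain list standing in for df.columns
names = ['error1', '(%)', 'error2', '(%)']

# names[::2] holds the even positions, names[1::2] the odd positions
pairs = list(zip(names[::2], names[1::2]))

# Join each (name, unit) pair into a single column label
merged_names = [' '.join(pair) for pair in pairs]

print(pairs)
print(merged_names)
```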
Cast the integer values to strings with astype and convert to a NumPy array with df.values:
In [89]: df.iloc[:, ::2].astype(str).values
Out[89]:
array([['10', '20'],
['30', '40']], dtype=object)
In [90]: df.iloc[:, 1::2].values
Out[90]:
array([['5%', '10%'],
['15%', '20%']], dtype=object)
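Putting the steps above together: the two object arrays have the same shape, so + concatenates them element by element. The following is a sketch rather than the answer's exact code; it uses to_numpy(dtype=object) in place of .values, and the variable names are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame([[10, "5%", 20, "10%"], [30, "15%", 40, "20%"]],
                  columns=['error1', '(%)', 'error2', '(%)'])

# Even-position columns, cast to str, as a 2-D array of Python strings
values = df.iloc[:, ::2].astype(str).to_numpy(dtype=object)
# Odd-position columns already hold strings
percents = df.iloc[:, 1::2].to_numpy(dtype=object)

# Element-wise string concatenation on object arrays
combined = values + ' (' + percents + ')'

cols = [' '.join(w) for w in zip(df.columns[::2], df.columns[1::2])]
result = pd.DataFrame(combined, index=df.index, columns=cols)
print(result)
```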
Comparison with the other answer on a DataFrame of [2 rows x 4000 columns]:
df = pd.DataFrame([[10,"5%", 20, "10%"]*1000,[30,"15%", 40,"20%"]*1000],
                  columns=['error1', '(%)', 'error2', '(%)']*1000)

# Vectorized version: concatenate whole arrays at once
def VAL(df):
    cols = (' '.join(w) for w in zip(df.columns[::2], df.columns[1::2]))
    return pd.DataFrame(df.iloc[:, ::2].astype(str).values +
                        ' (' +
                        df.iloc[:, 1::2].values +
                        ')', index=df.index, columns=cols)

# Row-wise apply version from the other answer
def APL(df):
    def make_func(offset=0):
        def func(x):
            return '{} ({})'.format(x.iloc[offset], x.iloc[offset + 1])
        return func
    df2 = pd.DataFrame()
    for offset in range(0, df.shape[1], 2):
        df2['{} (%)'.format(df.columns[offset])] = df.apply(make_func(offset), axis=1)
    return df2

VAL(df)
APL(df)
In [97]: %timeit VAL(df)
...: %timeit APL(df)
...:
100 loops, best of 3: 10.4 ms per loop
1 loops, best of 3: 3.65 s per loop
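Outside IPython, a similar comparison can be run with the standard timeit module. The sketch below is an assumption-laden stand-in, not the benchmark above: the names vectorized and row_apply, the smaller frame (50 pairs), and the unique error0, error1, ... column names are all chosen here for illustration, and the timings will differ by machine and pandas version.

```python
import timeit
import pandas as pd

def vectorized(df):
    # Concatenate whole arrays at once
    cols = [' '.join(w) for w in zip(df.columns[::2], df.columns[1::2])]
    data = (df.iloc[:, ::2].astype(str).to_numpy(dtype=object) + ' (' +
            df.iloc[:, 1::2].to_numpy(dtype=object) + ')')
    return pd.DataFrame(data, index=df.index, columns=cols)

def row_apply(df):
    # Call a Python function once per row for every pair of columns
    out = {}
    for offset in range(0, df.shape[1], 2):
        name = '{} {}'.format(df.columns[offset], df.columns[offset + 1])
        out[name] = df.apply(
            lambda x, o=offset: '{} ({})'.format(x.iloc[o], x.iloc[o + 1]),
            axis=1)
    return pd.DataFrame(out)

n_pairs = 50  # hypothetical size, kept small so the script finishes quickly
columns = []
for i in range(n_pairs):
    columns += ['error{}'.format(i), '(%)']
df = pd.DataFrame([[10, "5%"] * n_pairs, [30, "15%"] * n_pairs],
                  columns=columns)

t_vec = timeit.timeit(lambda: vectorized(df), number=3)
t_apply = timeit.timeit(lambda: row_apply(df), number=3)
print('vectorized: {:.4f} s, row_apply: {:.4f} s'.format(t_vec, t_apply))
```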
This is not the fastest solution, but it is probably the most readable one:
import pandas as pd

# define how you want to transform each list into a list of coupled data
def make_couples(ls):
    return ['{} ({})'.format(*item) for item in zip(ls[::2], ls[1::2])]

df = pd.DataFrame([[10,"5%", 20, "10%"],[30,"15%", 40,"20%"]], columns=['error1', '%', 'error2', '%'])
# apply the same pairing to the column names and to every row of values
df2 = pd.DataFrame(columns=make_couples(df.columns), data=[make_couples(row) for row in df.values])
df2 will be:
   error1 (%)  error2 (%)
0     10 (5%)    20 (10%)
1    30 (15%)    40 (20%)
Readability counts =).