Concatenate text & numbers in python/pandas
I have a dataframe as below:
+---+---+---+
| A | B | C |
+---+---+---+
| 1 | 0 | 0 |
+---+---+---+
| 0 | 0 | 1 |
+---+---+---+
| 2 | 1 | 1 |
+---+---+---+
| 3 | 1 | 2 |
+---+---+---+
| 4 | 2 | 3 |
+---+---+---+
import pandas as pd

df = pd.DataFrame({
    'A': [1, 0, 2, 3, 4],
    'B': [0, 0, 1, 1, 2],
    'C': [0, 1, 1, 2, 3]
})
My objective is to concatenate each element with its corresponding column name and produce a series.
I tried the following:
df.dot(df.columns +', ').str[:-2]
What I get is:
+---------------------------+
| A |
+---------------------------+
| C |
+---------------------------+
| A, A, B, C |
+---------------------------+
| A, A, A, B, C, C |
+---------------------------+
| A, A, A, A, B, B, C, C, C |
+---------------------------+
But what I want is:
+------------+
| A |
+------------+
| C |
+------------+
| 2A, B, C |
+------------+
| 3A, B, 2C |
+------------+
| 4A, 2B, 3C |
+------------+
How should I change my code to achieve this?
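For context, the attempt above repeats the labels because multiplying a string by an integer in Python repeats the string, and the dot product then adds those repeated pieces together. A minimal plain-Python sketch of what happens for the third row (A=2, B=1, C=1); the variable name `row2` is only for illustration:

```python
# Multiplying a string by an integer repeats it.
repeated = 'A, ' * 2
print(repr(repeated))

# For the row A=2, B=1, C=1 the dot product amounts to this sum of products.
row2 = 'A, ' * 2 + 'B, ' * 1 + 'C, ' * 1

# .str[:-2] in the attempt strips the trailing ', '.
print(row2[:-2])  # A, A, B, C
```

So the count has to be turned into a prefix of the label rather than being used as a multiplier.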
One idea with a lambda function:
f = lambda x: ', '.join(f'{v}{k}' if v != 1 else k for k, v in x[x > 0].items())
df = df.apply(f, axis=1)
print (df)
0 A
1 C
2 2A, B, C
3 3A, B, 2C
4 4A, 2B, 3C
dtype: object
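The same logic may be easier to follow when the lambda is unpacked into a named function. This is only a sketch of the approach above; `label_row` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 0, 2, 3, 4],
    'B': [0, 0, 1, 1, 2],
    'C': [0, 1, 1, 2, 3],
})

def label_row(row):
    """Build the labels for the non-zero entries of one row."""
    parts = []
    # row[row > 0] keeps only the positive counts; items() yields (column, count).
    for name, count in row[row > 0].items():
        # A count of 1 is written as the bare column name, otherwise as count + name.
        parts.append(name if count == 1 else f'{count}{name}')
    return ', '.join(parts)

result = df.apply(label_row, axis=1)
print(result.tolist())
# ['A', 'C', '2A, B, C', '3A, B, 2C', '4A, 2B, 3C']
```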
Another idea with melting: remove the 0 rows, join the numbers with the column names, and finally join the labels per original row in groupby:
df = df.melt(ignore_index=False)
df = df[df['value'].ne(0)]
df['variable'] = df['value'].mask(df['value'].eq(1), '').astype(str) + df['variable']
df = df.groupby(level=0)['variable'].agg(', '.join)
print (df)
0 A
1 C
2 2A, B, C
3 3A, B, 2C
4 4A, 2B, 3C
Name: variable, dtype: object
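The melt steps can also be written with each intermediate named, which may help when debugging. This sketch is a variant, not the code above verbatim: it builds the prefix from string values with `where` instead of `mask`, so integers and strings are never mixed in one column, and the names `long`, `prefix` and `label` are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 0, 2, 3, 4],
    'B': [0, 0, 1, 1, 2],
    'C': [0, 1, 1, 2, 3],
})

# Long format: one row per (original index, column name, value).
long = df.melt(ignore_index=False)

# Drop the zero entries; copy so the assignment below acts on an independent frame.
long = long[long['value'].ne(0)].copy()

# Counts as strings, blanked out where the count is exactly 1.
prefix = long['value'].astype(str).where(long['value'].ne(1), '')
long['label'] = prefix + long['variable']

# Join the labels that share the same original row index.
result = long.groupby(level=0)['label'].agg(', '.join)
print(result.tolist())
# ['A', 'C', '2A, B, C', '3A, B, 2C', '4A, 2B, 3C']
```

Note that a row containing only zeros disappears from the result here, because no entries for it survive the filter.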
Another way of solving this using collections.Counter and a list comprehension:
In [416]: from collections import Counter
In [403]: y = df.dot(df.columns).tolist()
In [420]: ans = [' ,'.join({k: (str(v)+k if v > 1 else k) for k,v in Counter(i).items()}.values()) if len(i) > 1 else i for i in y]
In [421]: pd.DataFrame(ans)
Out[421]:
0
0 A
1 C
2 2A ,B ,C
3 3A ,B ,2C
4 4A ,2B ,3C
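Two things are worth noting about this approach: `Counter` counts characters of the joined string, so it relies on single-character column names, and the separator `' ,'` gives `2A ,B ,C` rather than the `2A, B, C` asked for. A minimal sketch of the counting step with the `', '` separator, assuming the list `y` holds the strings that `df.dot(df.columns).tolist()` produces for the sample frame; `compress` is a hypothetical helper name:

```python
from collections import Counter

# Assumed intermediate: the repeated-label strings for the sample frame.
y = ['A', 'C', 'AABC', 'AAABCC', 'AAAABBCCC']

def compress(s):
    """Collapse runs such as 'AABC' into '2A, B, C'."""
    counts = Counter(s)  # counts single characters, in first-seen order
    return ', '.join(f'{n}{name}' if n > 1 else name for name, n in counts.items())

ans = [compress(s) for s in y]
print(ans)
# ['A', 'C', '2A, B, C', '3A, B, 2C', '4A, 2B, 3C']
```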
Performance of solutions:
@jezrael's solutions:
In [427]: def j():
     ...:     f = lambda x: ', '.join(f'{v}{k}' if v != 1 else k for k, v in x[x > 0].items())
     ...:     df.apply(f, axis=1)
     ...:
In [428]: %timeit j()
1.22 ms ± 47.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [434]: def j1():
     ...:     x = df.melt(ignore_index=False)
     ...:     x = x[x['value'].ne(0)]
     ...:     x['variable'] = x['value'].mask(x['value'].eq(1), '').astype(str) + x['variable']
     ...:     x = x.groupby(level=0)['variable'].agg(', '.join)
     ...:
In [435]: %timeit j1()
3.19 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
My solution:
In [429]: def m():
     ...:     y = df.dot(df.columns).tolist()
     ...:     ans = [' ,'.join({k: (str(v)+k if v > 1 else k) for k,v in Counter(i).items()}.values()) if len(i) > 1 else i for i in y]
     ...:     pd.DataFrame(ans)
     ...:
In [430]: %timeit m()
213 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
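The timings above were taken on the 5-row sample, so the ranking may not hold for larger inputs. A rough sketch for re-running the comparison of the two pandas variants outside IPython with the standard `timeit` module; the frame size, seed and function names are arbitrary assumptions, and the melt variant uses `where` on string values rather than `mask`:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger input: 1000 rows of small counts (size and seed are arbitrary).
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 5, size=(1000, 3)), columns=list('ABC'))

def with_apply(frame):
    # Row-wise lambda, as in the first solution.
    f = lambda x: ', '.join(f'{v}{k}' if v != 1 else k for k, v in x[x > 0].items())
    return frame.apply(f, axis=1)

def with_melt(frame):
    # Melt, filter zeros, build labels, then join per original row.
    x = frame.melt(ignore_index=False)
    x = x[x['value'].ne(0)].copy()
    prefix = x['value'].astype(str).where(x['value'].ne(1), '')
    x['variable'] = prefix + x['variable']
    return x.groupby(level=0)['variable'].agg(', '.join)

for func in (with_apply, with_melt):
    seconds = timeit.timeit(lambda: func(big), number=10)
    print(f'{func.__name__}: {seconds / 10 * 1e3:.2f} ms per call')
```

The two functions are not interchangeable on rows that contain only zeros: the apply variant returns an empty string for such a row, while the melt variant drops the row from the result.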