How do I convert a list in a Pandas DF into a string?
I have a pandas data frame. One of the columns contains a list. I want that column to be a single string.
For example my list ['one','two','three']
should simply be 'one, two, three'
df['col'] = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))
gives me ['one, two, three],['four','five','six']
where the second list is from the next row. Needless to say, with millions of rows this concatenation across rows is not only incorrect, it also kills my memory.
You should certainly not convert to string before you transform the list. Try:
df['col'].apply(', '.join)
Also note that apply applies the function to the elements of the series, so using df['col'] in the lambda function is probably not what you want.
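To make that point concrete, here is a minimal sketch on a made-up two-row frame (the data and the names `wrong` and `right` are only for illustration). The lambda from the question never uses its argument `x`, so every row ends up with the join of the whole column.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the question's example.
df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# The lambda ignores x and joins the string form of the *whole* column,
# so every row receives the same cross-row string.
wrong = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))

# Passing the bound join method lets apply call it once per row,
# on that row's own list.
right = df['col'].apply(', '.join)

print(wrong.tolist())
print(right.tolist())
```

With millions of rows the first version builds one huge string per row, which would explain the memory blow-up described in the question.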
Or, there is a native .str.join method, but it is (surprisingly) a bit slower than apply.
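As a rough sketch of the .str.join variant, assuming every element of every list is already a string (the frame and the name `joined` are made up for illustration):

```python
import pandas as pd

# Hypothetical two-row frame mirroring the question's example.
df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# Series.str.join joins the elements of each row's list with the separator.
joined = df['col'].str.join(', ')

print(joined.tolist())
```

Which of the two is faster may depend on the pandas version and the data, so it is probably worth timing both on your own frame.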
When you cast col to str with astype, you get a string representation of a python list, brackets and all. You do not need to do that, just apply join directly:
import pandas as pd
df = pd.DataFrame({
'A': [['a', 'b', 'c'], ['A', 'B', 'C']]
})
# Out[8]:
# A
# 0 [a, b, c]
# 1 [A, B, C]
df['Joined'] = df.A.apply(', '.join)
# A Joined
# 0 [a, b, c] a, b, c
# 1 [A, B, C] A, B, C
You could convert your list to str with astype(str) and then remove the ', [ and ] characters. Using @Yakim's example:
In [114]: df
Out[114]:
A
0 [a, b, c]
1 [A, B, C]
In [115]: df.A.astype(str).str.replace(r"\[|\]|'", '', regex=True)
Out[115]:
0 a, b, c
1 A, B, C
Name: A, dtype: object
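One caveat with this approach, sketched below on made-up data: stripping brackets and quotes from the list's string form also strips them from the values themselves. The element 'x[0]' and the names `stripped` and `joined` are hypothetical, chosen only to show the difference from joining the list directly.

```python
import pandas as pd

# Hypothetical data where one element itself contains brackets.
df = pd.DataFrame({'A': [['x[0]', 'y'], ['A', 'B']]})

# Removing [, ] and ' from the list repr also removes them from the data.
stripped = df['A'].astype(str).str.replace(r"\[|\]|'", '', regex=True)

# Joining the list elements leaves the contents untouched.
joined = df['A'].apply(', '.join)

print(stripped.tolist())
print(joined.tolist())
```

If the values can contain quotes or brackets, joining the elements is likely the safer choice.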
Timing
import pandas as pd
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df]*1000)
In [2]: timeit df['A'].apply(', '.join)
292 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [3]: timeit df['A'].str.join(', ')
368 µs ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [4]: timeit df['A'].apply(lambda x: ', '.join(x))
505 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [5]: timeit df['A'].str.replace('\[|\]|\'', '')
2.43 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Pandas provides a method for this: Series.str.join.