How do I convert a list in a Pandas DF to a string?

Question

I have a pandas data frame. One of the columns contains a list. I want that column to be a single string.

For example, my list ['one','two','three'] should simply be 'one, two, three'.

df['col'] = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))

gives me ['one, two, three],['four','five','six'] where the second list is from the next row. Needless to say, with millions of rows this concatenation across rows is not only incorrect, it also kills my memory.

Answer 1

You should certainly not convert to string before you transform the list. Try:

df['col'].apply(', '.join)

Also note that apply calls the function on each element of the series, so using df['col'] inside the lambda function is probably not what you want.

Or, there is a native .str.join method, but it is (surprisingly) a bit slower than apply.
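To make the difference concrete, here is a minimal sketch using the two rows from the question. The variable names as_text and joined are only illustrative. It shows what astype(str) actually produces and what apply(', '.join) produces when the column still holds lists.

```python
import pandas as pd

df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# astype(str) turns each list into its printed form, brackets and quotes included.
as_text = df['col'].astype(str)

# apply calls ', '.join once per row, with that row's list as the argument.
joined = df['col'].apply(', '.join)

print(as_text.tolist())
print(joined.tolist())
```

Because each call only sees one row's list, nothing is concatenated across rows.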

Answer 2

When you cast col to str with astype, you get a string representation of a Python list, brackets and all. You do not need to do that; just apply join directly:

import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['A', 'B', 'C']]
    })

# Out[8]: 
#            A
# 0  [a, b, c]
# 1  [A, B, C]

df['Joined'] = df.A.apply(', '.join)

#            A   Joined
# 0  [a, b, c]  a, b, c
# 1  [A, B, C]  A, B, C
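The example above assumes every item in each list is already a string; ', '.join raises a TypeError otherwise. If the lists might hold numbers or None, one possible sketch is to convert each item first. The helper name join_items and the sample data are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [['a', 1, 2.5], ['B', None]]
    })

def join_items(items):
    # Hypothetical helper: str() each item so join accepts non-string values.
    return ', '.join(map(str, items))

df['Joined'] = df['A'].apply(join_items)

print(df['Joined'].tolist())
```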

Answer 3

You could convert your list to str with astype(str) and then remove the ', [ and ] characters. Using @Yakim's example:

In [114]: df
Out[114]:
           A
0  [a, b, c]
1  [A, B, C]

In [115]: df.A.astype(str).str.replace(r"\[|\]|'", '', regex=True)
Out[115]:
0    a, b, c
1    A, B, C
Name: A, dtype: object
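One thing to keep in mind with this approach is that it edits the printed form of the list, so any item that itself contains one of the removed characters is altered too. A small sketch with made-up data (the names s, stripped and joined are illustrative) compares it with joining the list directly:

```python
import pandas as pd

s = pd.Series([["it's", 'a'], ['b', 'c']])

# Strip brackets and single quotes from the printed form of each list.
stripped = s.astype(str).str.replace(r"\[|\]|'", '', regex=True)

# Join the actual list items instead.
joined = s.apply(', '.join)

print(stripped.tolist())
print(joined.tolist())
```

In the first row the apostrophe inside the item is removed and Python's double quotes around it are left behind, while the direct join keeps the item intact.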

Timing

import pandas as pd
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df]*1000)


In [2]: timeit df['A'].apply(', '.join)
292 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: timeit df['A'].str.join(', ')
368 µs ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [4]: timeit df['A'].apply(lambda x: ', '.join(x))
505 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: timeit df['A'].str.replace('\[|\]|\'', '')
2.43 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
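The figures above come from an IPython session. To repeat the comparison in a plain script, something like the following sketch with the standard timeit module should work. The candidates dict, its labels, and the repeat counts are my own choices, and the numbers will vary with machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df] * 1000)

# Each entry is a zero-argument callable so timeit can call it repeatedly.
candidates = {
    "apply(', '.join)": lambda: df['A'].apply(', '.join),
    "str.join(', ')": lambda: df['A'].str.join(', '),
    "apply(lambda)": lambda: df['A'].apply(lambda x: ', '.join(x)),
}

for label, func in candidates.items():
    # Best of 5 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(func, number=100, repeat=5)) / 100
    print(f"{label}: {best * 1e6:.0f} µs per call")
```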

Answer 4

Pandas provides a method for this, Series.str.join.
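A minimal sketch of that method on the data from the question, assuming each list contains only strings:

```python
import pandas as pd

df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# Join the items of each row's list with ', ' as the separator.
df['col'] = df['col'].str.join(', ')

print(df['col'].tolist())
```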

4 answers

Answer 1
40 Accepted 2016-05-20 13:22:24

Answer 2
12 2016-05-20 13:22:11

Answer 3
9 2016-05-20 13:34:39

Answer 4
1 2020-02-13 17:44:08
