How do I convert a list in a Pandas DF into a string?

I have a pandas DataFrame. One of the columns contains a list in each row. I want that column to hold a single string instead.

For example, my list ['one','two','three'] should simply become 'one, two, three'.

df['col'] = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))

This gives me ['one, two, three],['four','five','six'], where the second list comes from the next row. Needless to say, with millions of rows this concatenation across rows is not only incorrect, it also kills my memory.

You should certainly not convert to string before you transform the list. Try:

df['col'].apply(', '.join)

Also note that apply applies the function to each element of the Series, so using df['col'] inside the lambda function is probably not what you want.
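To make the difference concrete, here is a minimal sketch contrasting a per-row join with the whole-column join from the question. The two-row frame is made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the question's example.
df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# apply calls the function once per element, so each list is joined on its own.
per_row = df['col'].apply(', '.join)

# Referring to df['col'] inside the lambda ignores x and joins the string
# form of every row, repeating that same long string in each row.
whole_column = df['col'].apply(lambda x: ', '.join(df['col'].astype(str)))

print(per_row.tolist())
print(whole_column.iloc[0])
```

The second version builds a string as long as the whole column for every row, which is consistent with the memory blow-up described in the question.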


Alternatively, there is a native .str.join method, but it is (surprisingly) a bit slower than apply.
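As a rough sketch of the .str.join variant, assuming every row holds a list of strings (the frame below is made up for illustration):

```python
import pandas as pd

# Hypothetical frame; each cell is a list of strings.
df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# Series.str.join joins the elements of each list with the given separator.
joined = df['col'].str.join(', ')
print(joined.tolist())
```

Which of the two is faster may depend on the pandas version and the data, so it is worth measuring on your own frame.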

When you cast col to str with astype, you get the string representation of a Python list, brackets and all. You do not need to do that; just apply join directly:

import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['A', 'B', 'C']]
    })

# Out[8]: 
#            A
# 0  [a, b, c]
# 1  [A, B, C]

df['Joined'] = df.A.apply(', '.join)

#            A   Joined
# 0  [a, b, c]  a, b, c
# 1  [A, B, C]  A, B, C
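For comparison, a small sketch of what the astype(str) cast produces on the same kind of frame; the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})

# astype(str) calls str() on each list, so the brackets and quotes
# become part of the resulting text.
as_text = df['A'].astype(str)
print(repr(as_text.iloc[0]))

# Joining the original lists keeps only the items and the separator.
joined = df['A'].apply(', '.join)
print(repr(joined.iloc[0]))
```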

You could convert your list to str with astype(str) and then remove the ', [ and ] characters. Using @Yakim's example:

In [114]: df
Out[114]:
           A
0  [a, b, c]
1  [A, B, C]

In [115]: df.A.astype(str).str.replace(r"\[|\]|'", '', regex=True)
Out[115]:
0    a, b, c
1    A, B, C
Name: A, dtype: object
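One caveat with this character-stripping approach, sketched below with a made-up row: the pattern also removes brackets and quotes that belong to the data itself. The regex=True argument is passed explicitly here because the default of Series.str.replace differs between pandas versions.

```python
import pandas as pd

# Hypothetical data: one element contains an apostrophe.
df = pd.DataFrame({'A': [["it's", 'b']]})

# Strip brackets and single quotes from the string form of each list.
stripped = df['A'].astype(str).str.replace(r"\[|\]|'", '', regex=True)

# Join the original list items directly.
joined = df['A'].apply(', '.join)

print(stripped.iloc[0])  # apostrophe removed, repr's double quotes left behind
print(joined.iloc[0])    # it's, b
```

Joining the list items avoids this because it never goes through the list's string representation.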

Timing

import pandas as pd
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df]*1000)


In [2]: timeit df['A'].apply(', '.join)
292 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: timeit df['A'].str.join(', ')
368 µs ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [4]: timeit df['A'].apply(lambda x: ', '.join(x))
505 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: timeit df['A'].str.replace('\[|\]|\'', '')
2.43 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
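To reproduce a comparison like this outside IPython, a minimal sketch using the standard timeit module could look as follows. The candidates dictionary and the repeat count are arbitrary choices, and the numbers will vary with machine and pandas version.

```python
import timeit
import pandas as pd

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df] * 1000, ignore_index=True)

# Illustrative labels mapped to zero-argument callables to time.
candidates = {
    "apply(', '.join)": lambda: df['A'].apply(', '.join),
    "str.join(', ')": lambda: df['A'].str.join(', '),
    "apply(lambda)": lambda: df['A'].apply(lambda x: ', '.join(x)),
}

# Check that the candidates agree before timing them.
results = [func().tolist() for func in candidates.values()]
assert all(r == results[0] for r in results)

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=100) / 100
    print(f"{name}: {seconds * 1e6:.0f} µs per call")
```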

Pandas provides a method for this: Series.str.join.
