I have a pandas DataFrame. One of the columns contains lists, and I want each list in that column to become a single string.
For example, the list ['one','two','three'] should simply become 'one, two, three'.
df['col'] = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))
gives me ['one, two, three],['four','five','six']
where the second list is from the next row. Needless to say with millions of rows this concatenation across rows is not only incorrect, it kills my memory.
You should certainly not convert to string before you transform the list. Try:
df['col'].apply(', '.join)
Also note that apply applies the function to each element of the series, so using df['col'] in the lambda function is probably not what you want.
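To make the difference concrete, here is a minimal sketch that reproduces the behaviour from the question next to the per-row join. The two-row frame and the names as_text, broken and fixed are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the lists in the question.
df = pd.DataFrame({"col": [["one", "two", "three"], ["four", "five", "six"]]})

# astype(str) turns each list into its repr, brackets and quotes included.
as_text = df["col"].astype(str)

# The lambda ignores its argument x and joins the whole column for every row,
# so every row ends up holding the same column-wide string.
broken = as_text.apply(lambda x: ", ".join(as_text))

# Passing ', '.join directly joins each row's own list.
fixed = df["col"].apply(", ".join)

print(broken.iloc[0])
print(fixed.tolist())
```

Because the broken version builds a string the size of the whole column once per row, its memory use grows roughly with the square of the row count, which matches the memory problem described in the question.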
Or, there is a native .str.join method, but it is (surprisingly) a bit slower than apply.
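For reference, the .str.join variant might look like the sketch below. The frame and the name joined are illustrative, not taken from the question.

```python
import pandas as pd

# Hypothetical frame with one list per row.
df = pd.DataFrame({"col": [["one", "two", "three"], ["four", "five", "six"]]})

# Series.str.join takes the separator and joins the list stored in each row.
joined = df["col"].str.join(", ")

print(joined.tolist())
```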
When you cast col to str with astype, you get a string representation of a Python list, brackets and all. You do not need to do that; just apply join directly:
import pandas as pd
df = pd.DataFrame({
'A': [['a', 'b', 'c'], ['A', 'B', 'C']]
})
# Out[8]:
# A
# 0 [a, b, c]
# 1 [A, B, C]
df['Joined'] = df.A.apply(', '.join)
# A Joined
# 0 [a, b, c] a, b, c
# 1 [A, B, C] A, B, C
You could convert your list to str with astype(str) and then remove the ', [, and ] characters. Using @Yakim's example:
In [114]: df
Out[114]:
A
0 [a, b, c]
1 [A, B, C]
In [115]: df.A.astype(str).str.replace(r"[\[\]']", '', regex=True)
Out[115]:
0 a, b, c
1 A, B, C
Name: A, dtype: object
Timing
import pandas as pd
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df]*1000)
In [2]: timeit df['A'].apply(', '.join)
292 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [3]: timeit df['A'].str.join(', ')
368 µs ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [4]: timeit df['A'].apply(lambda x: ', '.join(x))
505 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [5]: timeit df['A'].str.replace('\[|\]|\'', '')
2.43 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Pandas provides a method for this: Series.str.join.
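One caveat worth checking before relying on it: as far as I know, Series.str.join returns NaN for a row whose list contains anything that is not a string. The sketch below assumes that behaviour and shows one possible workaround, converting the items with str first. The series s and the names native and converted are illustrative.

```python
import pandas as pd

# Hypothetical series: the second list holds integers rather than strings.
s = pd.Series([["a", "b", "c"], [1, 2, 3]])

# Rows containing non-string items are expected to come back as NaN.
native = s.str.join(", ")

# Converting every item to str before joining handles mixed content.
converted = s.apply(lambda items: ", ".join(map(str, items)))

print(native.tolist())
print(converted.tolist())
```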