How to combine multiple columns of string lists in Python?

Question

I have a Python pandas DataFrame.

I am trying to create a new column, total_str, that concatenates the lists in colA and colB into a single list.

This is the expected output:

       colA           colB              total_str
0  ['a','b','c'] ['a','b','c']   ['a','b','c','a','b','c']
1  ['a','b','c']      nan        ['a','b','c']
2  ['a','b','c']   ['d','e']     ['a','b','c','d','e']
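To try the answers below, the sample frame can be rebuilt as follows. This is a minimal sketch that assumes the missing colB cell is stored as np.nan (the question only prints it as nan).

```python
import numpy as np
import pandas as pd

# Sample data from the question; the missing colB cell is assumed to be np.nan
df = pd.DataFrame({
    "colA": [["a", "b", "c"]] * 3,
    "colB": [["a", "b", "c"], np.nan, ["d", "e"]],
})
print(df)
```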

Answer 1

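import numpy as np
import pandas as pd

# Replace NaN with an empty list, then concatenate colA and colB row by row using sum(row, []).
# (applymap still works but is deprecated since pandas 2.1 in favour of DataFrame.map.)
df['total_str'] = df.applymap(lambda x: [] if x is np.nan else x).apply(lambda x: sum(x, []), axis=1)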

df
Out[705]: 
        colA       colB           total_str
0  [a, b, c]  [a, b, c]  [a, b, c, a, b, c]
1  [a, b, c]        NaN           [a, b, c]
2  [a, b, c]     [d, e]     [a, b, c, d, e]
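The sum(x, []) step works because sum() accepts a start value: starting from an empty list, it applies + to each cell in turn, which concatenates the lists. A plain-Python sketch of a single row, assuming the missing cell is a float NaN:

```python
import math

# One row from the example: colA holds a list, colB is missing (a float NaN)
row = [["a", "b", "c"], float("nan")]

# Step 1: replace missing values with [] (what the applymap lambda does)
cleaned = [[] if isinstance(v, float) and math.isnan(v) else v for v in row]

# Step 2: sum() starts from [] and adds each list: [] + ['a', 'b', 'c'] + []
total_str = sum(cleaned, [])
print(total_str)  # ['a', 'b', 'c']
```

Each + copies the accumulated list, so with many columns itertools.chain scales better; for two columns the difference is negligible.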

If the DataFrame also has other columns, you can use:

df['total_str'] = df.applymap(lambda x: [] if x is np.nan else x).apply(lambda x: x.colA+x.colB, axis=1)
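Another way to ignore extra columns, and to avoid a row-wise apply, is a list comprehension over just the two columns. This is a swapped-in technique rather than part of the answer above; it assumes real cells are Python lists and treats anything else (NaN, None) as empty. The helper as_list and the column other are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "colA": [["a", "b", "c"]] * 3,
    "colB": [["a", "b", "c"], np.nan, ["d", "e"]],
    "other": [1, 2, 3],  # hypothetical extra column that must be ignored
})

def as_list(v):
    # Keep genuine lists; map NaN, None or anything else to an empty list
    return v if isinstance(v, list) else []

# Only colA and colB are read, so extra columns do not interfere
df["total_str"] = [as_list(a) + as_list(b) for a, b in zip(df["colA"], df["colB"])]
print(df["total_str"].tolist())
```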

Answer 2

itertools.chain does the trick for you.

itertools.chain(*filter(bool, [colA, colB]))

This returns an iterator; call list() on the result if you need a list, for example:

import itertools

def test(colA, colB):
    # filter(bool, ...) drops None (and empty lists) before chaining
    total_str = itertools.chain(*filter(bool, [colA, colB]))
    print(list(total_str))


test(['a', 'b'], ['c'])  # output: ['a', 'b', 'c']
test(['a', 'b', 'd'], None)  # output: ['a', 'b', 'd']
test(['a', 'b', 'd'], ['x', 'y', 'z'])  # output: ['a', 'b', 'd', 'x', 'y', 'z']
test(None, None)  # output: []
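In a DataFrame the missing cell is usually NaN rather than None, and bool(float("nan")) is True, so filter(bool, ...) would keep it and chain would then fail when it tries to iterate a float. A hedged sketch that filters missing values explicitly; the helper names is_missing and combine are hypothetical:

```python
import itertools
import math

def is_missing(v):
    # None or a float NaN; note that bool(float("nan")) is True
    return v is None or (isinstance(v, float) and math.isnan(v))

def combine(*cols):
    # Chain every non-missing list into one flat list
    return list(itertools.chain.from_iterable(c for c in cols if not is_missing(c)))

print(combine(["a", "b", "c"], float("nan")))  # ['a', 'b', 'c']
print(combine(["a", "b", "c"], ["d", "e"]))    # ['a', 'b', 'c', 'd', 'e']
print(combine(None, None))                     # []
```

Applied to the frame, this would be something like df['total_str'] = [combine(a, b) for a, b in zip(df['colA'], df['colB'])]. Because np.nan is a Python float, the isinstance check covers it as well.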

Answer 3

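I assume you want to handle both numpy.nan and None in the data. You can simply write a helper function that replaces them with an empty list when you create the new column. It isn't clean, but it works.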

import numpy as np

def helper(x):
    # Map np.nan and None to an empty list; leave real lists untouched
    return x if x is not np.nan and x is not None else []

dataframe['total_str'] = dataframe['colA'].map(helper) + dataframe['colB'].map(helper)
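The helper compares with `is`, so it only recognizes the np.nan object itself. A NaN produced some other way, for example float("nan"), may pass through unchanged, and the later + would leave that row as NaN. A sketch comparing the original helper with a hypothetical type-based variant:

```python
import numpy as np

def helper(x):
    # Original answer: identity check against the np.nan object and None
    return x if x is not np.nan and x is not None else []

def helper_by_type(x):
    # Hypothetical variant: keep real lists, turn everything else into []
    return x if isinstance(x, list) else []

other_nan = float("nan")  # a NaN, but not the same object as np.nan
print(other_nan is np.nan)        # False
print(helper(other_nan))          # nan (not replaced)
print(helper_by_type(other_nan))  # []
```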

Answer 4

For a faster solution, use combine_first to replace NaN with an empty list:

# One empty list per row, so the filler Series has the same length as df.index
df['total_str'] = df['colA'] + df['colB'].combine_first(pd.Series([[]] * len(df), index=df.index))
print (df)
        colA       colB           total_str
0  [a, b, c]  [a, b, c]  [a, b, c, a, b, c]
1  [a, b, c]        NaN           [a, b, c]
2  [a, b, c]     [d, e]     [a, b, c, d, e]

df['total_str'] = df['colA'].add(df['colB'].combine_first(pd.Series([[]] * len(df), index=df.index)))
print (df)
        colA       colB           total_str
0  [a, b, c]  [a, b, c]  [a, b, c, a, b, c]
1  [a, b, c]        NaN           [a, b, c]
2  [a, b, c]     [d, e]     [a, b, c, d, e]
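The same combine_first idea extends to more than two columns, which is what the title asks about. A minimal sketch, assuming every column holds lists or NaN; concat_list_columns and colC are hypothetical, and the filler Series is built with one empty list per row so its length matches df.index.

```python
import numpy as np
import pandas as pd

def concat_list_columns(df, cols):
    # One empty list per row, aligned on df.index, used to fill missing cells
    empties = pd.Series([[] for _ in range(len(df))], index=df.index)
    # Start from empty lists and append each column in order
    total = empties.copy()
    for col in cols:
        total = total + df[col].combine_first(empties)
    return total

df = pd.DataFrame({
    "colA": [["a", "b", "c"]] * 3,
    "colB": [["a", "b", "c"], np.nan, ["d", "e"]],
    "colC": [np.nan, ["x"], np.nan],
})
df["total_str"] = concat_list_columns(df, ["colA", "colB", "colC"])
print(df["total_str"].tolist())
```

Since + on object columns builds new lists, the shared empty lists in the filler are never mutated.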

Timings:

df = pd.DataFrame({'colA': [['a','b','c']] * 3,  'colB':[['a','b','c'], np.nan, ['d','e']]})
#[30000 rows x 2 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
#print (df)

In [62]: %timeit df['total_str'] = df['colA'].combine_first(pd.Series([[]], index=df.index)) + df['colB'].combine_first(pd.Series([[]], index=df.index))
100 loops, best of 3: 8.1 ms per loop

In [63]: %timeit df['total_str1'] = df['colA'].fillna(pd.Series([[]], index=df.index)) + df['colB'].fillna(pd.Series([[]], index=df.index))
100 loops, best of 3: 9.1 ms per loop

In [64]: %timeit df['total_str2'] = df.applymap(lambda x: [] if x is np.nan else x).apply(lambda x: x.colA+x.colB, axis=1)
1 loop, best of 3: 960 ms per loop

Answer 5

In pandas you can add the columns together like this:

dataframe['total_str'] = dataframe['colA'] + dataframe['colB']
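Plain + concatenates rows where both cells are lists, but a quick check with the question's data (a sketch, assuming the missing value is np.nan) shows that pandas leaves NaN wherever either side is missing, so row 1 does not get ['a', 'b', 'c']:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "colA": [["a", "b", "c"]] * 3,
    "colB": [["a", "b", "c"], np.nan, ["d", "e"]],
})

plain = df["colA"] + df["colB"]
# Rows with two lists are concatenated; the row with NaN in colB stays NaN
print(plain.tolist())
```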


5 answers

Answer 1: score 2, 2017-05-17 04:08:14
Answer 2: score 1, 2017-05-17 03:50:17
Answer 3: score 0, 2017-05-17 04:02:45
Answer 4: score 0, 2017-05-17 05:31:50
Answer 5: score -1, 2017-05-17 03:41:38
