Append a Pandas DataFrame column based on another column

Question

I have a Pandas DataFrame that looks like this:

| Index | Value        |
|-------|--------------|
| 1     | [1, 12, 123] |
| 2     | [12, 123, 1] |
| 3     | [123, 12, 1] |

and I want to append a third column that holds, for each row, a list with the length (number of digits) of every array element:

| Index | Value        | Expected_value |
|-------|--------------|----------------|
| 1     | [1, 12, 123] | [1, 2, 3]      |
| 2     | [12, 123, 1] | [2, 3, 1]      |
| 3     | [123, 12, 1] | [3, 2, 1]      |

I've tried using a Python lambda function with map, like this:

dataframe["Expected_value"] = dataframe.value.map(lambda x: len(str(x)))

but instead of a list I got the sum of those lengths:

| Index | Value        | Expected_value |
|-------|--------------|----------------|
| 1     | [1, 12, 123] | 6              |
| 2     | [12, 123, 1] | 6              |
| 3     | [123, 12, 1] | 6              |

Answer 1

You can use a list comprehension with map:

dataframe["Expected_value"] = dataframe.Value.map(lambda x: [len(str(y)) for y in x])

Or a nested list comprehension:

dataframe["Expected_value"] = [[len(str(y)) for y in x] for x in dataframe.Value]
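
The two variants above can be tried on a small frame. This is a minimal sketch that rebuilds the sample data from the question's tables, assuming the Value column stores Python lists of ints. It also shows why the original attempt returns a single number per row: len(str(x)) is applied to the whole list rather than to each element.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question (assumed to be lists of ints)
dataframe = pd.DataFrame({
    "Index": [1, 2, 3],
    "Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1]],
})

# Applied to a whole list, len(str(x)) measures the list's text form: one int per row
per_row = dataframe["Value"].map(lambda x: len(str(x)))

# Measuring each element separately yields one list per row
dataframe["Expected_value"] = dataframe["Value"].map(lambda x: [len(str(y)) for y in x])

# The nested list comprehension builds the same lists without map
nested = [[len(str(y)) for y in x] for x in dataframe["Value"]]
```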

For positive integers, an alternative way to get the lengths is math.log10:

import math
dataframe["Expected_value"] = [[int(math.log10(y))+1 for y in x] for x in dataframe.Value]

print(dataframe)
   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]
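
One caveat with the logarithm variant: math.log10 is only defined for positive numbers, so it raises ValueError for 0 or negative values, while the string-based count does not. A minimal sketch comparing the two. The helper names digit_count_log10 and digit_count_str are hypothetical, and taking abs() to ignore the minus sign is an assumption about what "length" should mean for negative numbers.

```python
import math

def digit_count_log10(n):
    """Digit count via log10; only valid for positive integers."""
    return int(math.log10(n)) + 1

def digit_count_str(n):
    """Digit count via the text form; abs() drops a leading minus sign."""
    return len(str(abs(n)))

values = [1, 12, 123, 1000]
log_counts = [digit_count_log10(v) for v in values]
str_counts = [digit_count_str(v) for v in values]

# The string-based helper also accepts inputs that log10 rejects
zero_digits = digit_count_str(0)
negative_digits = digit_count_str(-45)

try:
    digit_count_log10(0)
    log10_accepts_zero = True
except ValueError:
    # math.log10(0) is outside the function's domain
    log10_accepts_zero = False
```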

Answer 2

Use a list comprehension:

[[len(str(y)) for y in x] for x in df['Value'].tolist()]
# [[1, 2, 3], [2, 3, 1], [3, 2, 1]]

df['Expected_value'] = [[len(str(y)) for y in x] for x in df['Value'].tolist()]
df

   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]

If you need to handle missing data:

import numpy as np

def foo(x):
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        # x is not iterable (e.g. NaN), so keep it as missing
        return np.nan

df['Expected_value'] = [foo(x) for x in df['Value'].tolist()]
df

   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]
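
The output above comes from the original data, which has no gaps. To see the missing-data branch in action, here is a small sketch on a hypothetical frame whose second row is NaN instead of a list. The frame is invented for illustration and is not part of the question's data.

```python
import numpy as np
import pandas as pd

def foo(x):
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        # x is not iterable (for example a float NaN or None)
        return np.nan

# Hypothetical frame: the second row has no list
df = pd.DataFrame({"Value": [[1, 12, 123], np.nan, [123, 12, 1]]})
df["Expected_value"] = [foo(x) for x in df["Value"].tolist()]
```

Note that a plain string is iterable, so foo would not treat it as missing; it would return one length per character.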

This is probably the best option in terms of performance when dealing with object-dtype data. For more reading, see "For loops with pandas - When should I care?".

Another solution with pd.DataFrame, applymap and agg (applymap is deprecated since pandas 2.1 in favor of DataFrame.map):

pd.DataFrame(df['Value'].tolist()).astype(str).applymap(len).agg(list, axis=1)

0    [1, 2, 3]
1    [2, 3, 1]
2    [3, 2, 1]
dtype: object
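
Because DataFrame.applymap is deprecated from pandas 2.1 onward in favor of DataFrame.map, the same pipeline can be written so that it picks whichever method is available. This is a sketch rather than the answer's own code. It assumes every list in Value has the same length, since shorter lists would be padded with missing values, which are not handled here.

```python
import pandas as pd

df = pd.DataFrame({"Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1]]})

# Spread each list over columns and convert every cell to its text form
wide = pd.DataFrame(df["Value"].tolist()).astype(str)

# DataFrame.map was added in pandas 2.1; older versions only have applymap
elementwise = wide.map if hasattr(pd.DataFrame, "map") else wide.applymap

# Apply len to every cell, then gather each row back into a list
lengths = elementwise(len).agg(list, axis=1)
df["Expected_value"] = lengths
```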

Answer 1: score 3, accepted, 2019-04-13 19:15:02

Answer 2: score 1, 2019-04-13 19:16:55