How do I apply a function to every value in a column of a pandas DataFrame?

Question

I tried a somewhat manual approach using a loop, like this:

data = pd.read_csv('data/training.csv')
for idx,imageString in enumerate(data.iloc[:,-1]):
    # print(imageString[0:10])
    data[idx,-1] = imageString.split(" ")

But this errors out on the last line with:

ValueError: Length of values does not match length of index

So my questions are:

1. Can anyone explain why I am getting the above error, and how can I get around it?
2. Is this the proper way to apply a split to every value in the last column of my data frame?

Regarding #2: I saw some people using applymap, but I think this creates a new column. I really just want to replace each value in the existing column with a list.

Answer 1

I think you need str.split:

data = pd.read_csv('data/training.csv')
data.iloc[:,-1] = data.iloc[:,-1].str.split(expand=False)

Then select the first, or any other, element of each list with str[0] or str[n]:

data.iloc[:,-1] = data.iloc[:,-1].str.split(expand=False).str[0]
data.iloc[:,-1] = data.iloc[:,-1].str.split(expand=False).str[n]
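
To make the expand argument and the str[n] indexing above more concrete, here is a minimal sketch. The Series s and its values are made up for illustration and are not the asker's data; the intent is only to show that expand=False keeps one list per row, expand=True spreads the tokens over columns, and an out-of-range position yields missing values rather than an error.

```python
import pandas as pd

# Hypothetical values, not taken from the question's CSV file.
s = pd.Series(['aa bb', 'ss uu', 'ee tt'])

# expand=False (the default) keeps one list of tokens per row.
as_lists = s.str.split(expand=False)

# expand=True returns a DataFrame with one column per token position.
as_columns = s.str.split(expand=True)

# .str[n] picks the n-th element of each list.
first = as_lists.str[0]

# Asking for a position that no list has gives NaN for every row.
missing = as_lists.str[5]

print(as_lists)
print(as_columns)
print(first)
print(missing)
```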

Sample:

import pandas as pd

data = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':['aa aa','ss uu','ee tt']})

print (data)
   A  B  C  D  E      F
0  1  4  7  1  5  aa aa
1  2  5  8  3  3  ss uu
2  3  6  9  5  6  ee tt

print (data.iloc[:,-1].str.split(expand=False))
0    [aa, aa]
1    [ss, uu]
2    [ee, tt]
Name: F, dtype: object

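# Keep only the first token; work on a copy so data keeps the original strings
first = data.copy()
first.iloc[:,-1] = first.iloc[:,-1].str.split(expand=False).str[0]
print (first)
   A  B  C  D  E   F
0  1  4  7  1  5  aa
1  2  5  8  3  3  ss
2  3  6  9  5  6  ee

# Keep only the second token, again starting from the original strings
second = data.copy()
second.iloc[:,-1] = second.iloc[:,-1].str.split(expand=False).str[1]
print (second)
   A  B  C  D  E   F
0  1  4  7  1  5  aa
1  2  5  8  3  3  uu
2  3  6  9  5  6  tt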

Can anyone explain why I am getting the above error and how can I get around it?

The problem is that data[idx,-1] does not address a single cell. Plain square brackets on a DataFrame select or create a column, so pandas treats the tuple (idx, -1) as a column label and tries to fill that new column with the list returned by imageString.split(" "). That list has one element per token, not one per row, so its length does not match the length of the DataFrame's index.
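
The explanation above can be checked with a small sketch. The three-row frame below is a made-up stand-in for the asker's CSV file, and the second assignment is only there to show that the tuple really is used as a column label.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the question's training data.
data = pd.DataFrame({'A': [1, 2, 3], 'F': ['aa aa', 'ss uu', 'ee tt']})

error_message = None
try:
    # Interpreted as "create a column labelled (0, -1)", not "set row 0, last column".
    # The list has 2 elements but the frame has 3 rows, hence the ValueError.
    data[0, -1] = 'aa aa'.split(" ")
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# A scalar under the same kind of key is broadcast into a whole new column,
# which shows that the key is treated as a column label.
data[0, -1] = 'x'
print(data.columns.tolist())
```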

Is this the proper way to apply a split to every value in the last column of my data frame?

It is better to use the vectorized string methods; see the pandas documentation.
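
For the more general question in the title, a rough sketch of the two usual routes follows: the .str methods when the operation is a string operation, and Series.apply when an arbitrary Python function is needed. The frame, the count_tokens helper and the n_tokens column are all hypothetical names chosen for this illustration.

```python
import pandas as pd

# Hypothetical frame; the last column holds space-separated strings.
data = pd.DataFrame({'A': [1, 2, 3], 'F': ['aa aa', 'ss uu', 'ee tt']})
last = data.columns[-1]

# Route 1: vectorized string method, one list per row.
via_str = data[last].str.split()

# Route 2: Series.apply calls any Python function once per value.
via_apply = data[last].apply(lambda text: text.split())


def count_tokens(text):
    """Hypothetical helper: number of whitespace-separated tokens."""
    return len(text.split())


# apply also works for functions that have no .str equivalent.
data['n_tokens'] = data[last].apply(count_tokens)

# Replace the existing column in place instead of adding a new one.
data[last] = via_str

print(data)
```

Assigning the result back to the same column label overwrites that column, which is what the question asks for; no extra column is created unless a new label is used.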

Answer 2

You are not accessing the values correctly.

To correct your code, the last line should be (with df standing for your DataFrame, called data in the question):

df.iat[idx, -1] = imageString.split(" ")

iat is used for getting and setting a single value by integer position.

This is probably a simpler way to accomplish your objective:

df.iloc[:, -1] = df.iloc[:, -1].str.split()
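
One detail worth noting: str.split() with no argument is not quite the same as the split(" ") in the question. The sketch below uses made-up strings with irregular spacing to show the difference; whether it matters depends on how the real column is formatted, which is not known here.

```python
import pandas as pd

# Hypothetical values with a double space and a leading space.
s = pd.Series(['1 2  3', ' 4 5'])

# No argument: split on runs of whitespace, no empty tokens.
by_whitespace = s.str.split()

# Literal single space: every space is a separator, so empty strings appear.
by_single_space = s.str.split(" ")

print(by_whitespace.tolist())
print(by_single_space.tolist())
```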

Answer 1: score 2, accepted, 2016-07-25 04:58:30
Answer 2: score 0, 2016-07-25 05:27:22