How to apply a function to every value in a column in a pandas dataframe?
I tried a somewhat manual approach using a loop like the one below:
data = pd.read_csv('data/training.csv')
for idx, imageString in enumerate(data.iloc[:,-1]):
    # print(imageString[0:10])
    data[idx,-1] = imageString.split(" ")
But this errors out on the last line with:
ValueError: Length of values does not match length of index
So my questions are:
1. Can anyone explain why I am getting the above error and how can I get around it?
2. Is this the proper way to apply a split to every value in the last column of my data frame?
Regarding #2: I saw some people using applymap, but I think this creates a new column. I really just want to replace the value in the existing column with another list.
I think you need str.split:
data = pd.read_csv('data/training.csv')
data.iloc[:,-1] = data.iloc[:,-1].str.split(expand=False)
Then select the first, or any other, element of each list with str[0] or str[n]:
data.iloc[:,-1] = data.iloc[:,-1].str.split(expand=False).str[0]
data.iloc[:,-1] = data.iloc[:,-1].str.split(expand=False).str[n]
Sample:
import pandas as pd
data = pd.DataFrame({'A':[1,2,3],
                     'B':[4,5,6],
                     'C':[7,8,9],
                     'D':[1,3,5],
                     'E':[5,3,6],
                     'F':['aa aa','ss uu','ee tt']})
print (data)
A B C D E F
0 1 4 7 1 5 aa aa
1 2 5 8 3 3 ss uu
2 3 6 9 5 6 ee tt
print (data.iloc[:,-1].str.split(expand=False))
0 [aa, aa]
1 [ss, uu]
2 [ee, tt]
Name: F, dtype: object
data.iloc[:,-1] = data.iloc[:,-1].str.split(expand=False).str[0]
print (data)
A B C D E F
0 1 4 7 1 5 aa
1 2 5 8 3 3 ss
2 3 6 9 5 6 ee
# Alternatively, starting again from the original data, take the second element:
data.iloc[:,-1] = data.iloc[:,-1].str.split(expand=False).str[1]
print (data)
A B C D E F
0 1 4 7 1 5 aa
1 2 5 8 3 3 uu
2 3 6 9 5 6 tt
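For the more general case in the title (any function, not only a string method), here is a minimal sketch using Series.apply, which calls a Python function on each value of one column. The helper name split_pixels and the small frame are made up for illustration. Assigning back by column label replaces the existing column rather than adding a new one. By contrast, DataFrame.applymap works on every cell of the whole frame, not on a single column.

```python
import pandas as pd

# Small made-up frame; the last column holds space-separated strings.
data = pd.DataFrame({'A': [1, 2, 3],
                     'F': ['aa aa', 'ss uu', 'ee tt']})

def split_pixels(value):
    """Hypothetical helper: turn one string into a list of tokens."""
    return value.split(" ")

# Look up the label of the last column, then overwrite that column
# with the result of calling the function on every value.
last_col = data.columns[-1]
data[last_col] = data[last_col].apply(split_pixels)

print(data)
```

For plain whitespace splitting, the str.split approach above is usually the tidier choice. apply is mainly useful when the function has no equivalent among the string methods.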
Can anyone explain why I am getting the above error and how can I get around it?
The problem is that imageString.split(" ") returns a list, and data[idx,-1] = ... does not address a single cell. pandas reads data[idx,-1] as the column labelled (idx, -1), so it tries to assign the list as an entire column, and the length of the list does not match the length of the DataFrame.
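To see where the error comes from, here is a small reproduction on a made-up three-row frame (the frame and the variable error_message exist only for this illustration). The assignment is caught so that the message can be printed.

```python
import pandas as pd

# Made-up frame with three rows; the last column holds two-token strings.
data = pd.DataFrame({'A': [1, 2, 3],
                     'F': ['aa aa', 'ss uu', 'ee tt']})

try:
    # data[0, -1] is the same as data[(0, -1)]: a column assignment,
    # so pandas expects one value per row (3 here), not a 2-item list.
    data[0, -1] = data.iloc[0, -1].split(" ")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If the list happened to have exactly as many items as the frame has rows, the same statement would not fail. It would add a new column labelled (0, -1), which is not what the loop intended either.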
Is this the proper way to apply a split to every value in the last column of my data frame?
It is better to use the string methods; see the pandas documentation.
You are not accessing the values correctly.
To correct your code, the last line should be:
df.iat[idx, -1] = imageString.split(" ")
iat is used for getting and setting a single scalar value by integer position.
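A minimal sketch of the corrected loop on a made-up frame follows. It assumes the last column has object dtype (forced explicitly here), since a list can only be stored in a cell of an object column. The loop iterates over a plain list copy of the column so that it does not read from the column while writing to it.

```python
import pandas as pd

# Made-up frame; dtype=object is set explicitly so each cell can hold a list.
df = pd.DataFrame({'A': [1, 2, 3],
                   'F': pd.Series(['aa aa', 'ss uu', 'ee tt'], dtype=object)})

# Iterate over a copy of the values, then write each list into its cell
# by integer position (row idx, last column).
for idx, image_string in enumerate(df.iloc[:, -1].tolist()):
    df.iat[idx, -1] = image_string.split(" ")

print(df)
```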
This is probably a simpler way to accomplish your objective:
df.iloc[:, -1] = df.iloc[:, -1].str.split()