Updating pandas DataFrame via for loops
I have a bunch of URLs stored in a DataFrame, and I am cleaning them up with a URL-parsing module. The issue I am having is that the 'siteClean' field, which is supposed to be updated with the cleaned URL, is updating the entire column rather than the individual cell...
Here is the code:
```python
results = resultsX.copy(deep = True)
results = results.reset_index(drop = True)
results['siteClean'] = ''
from urlparse import urlsplit
import re
for row in results.iterrows():
    #print row[1]
    url = row[1][1]
    if not re.match(r'http(s?)\:', url):
        url = 'http://' + url
    parsed = urlsplit(url)
    host = parsed.netloc
    #print host
    #row[1][1] = host
    #results[row][1] = host
    results['siteClean'] = host  # assigns host to every row of the column
print results
```
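If you do want to keep the loop, the fix is to write each result into one cell, addressed by the row's index label, instead of assigning to the whole column. A minimal sketch, assuming Python 3 (where `urlsplit` lives in `urllib.parse` rather than Python 2's `urlparse`) and a hypothetical frame whose raw URLs sit in a column named `site`:

```python
import re
from urllib.parse import urlsplit

import pandas as pd

# Hypothetical stand-in for `resultsX`: the raw URLs are in a column named 'site'.
results = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'site': ['example.com/path', 'https://www.python.org/doc', 'http://pandas.pydata.org'],
})
results['siteClean'] = ''

for idx, row in results.iterrows():
    url = row['site']
    # Add a scheme when it is missing so urlsplit puts the host in netloc.
    if not re.match(r'https?:', url):
        url = 'http://' + url
    # Write to the single cell (this row's label, 'siteClean'),
    # not to the whole column as results['siteClean'] = host does.
    results.at[idx, 'siteClean'] = urlsplit(url).netloc

print(results)
```

`DataFrame.at` sets a single value by row label and column label, which is why each row keeps its own host.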
In general, it's better to avoid looping over your frame's rows if you can. If I understand your problem correctly, you want to look at a single column of your frame and apply a function to each element of that column. Then you want to put the results of all those function calls into a column of the original frame: maybe a new column, maybe in place of the old one. This sounds like a job for `pd.Series.map`.
```python
import pandas as pd
import numpy as np
np.random.seed(0)
n = 10
df = pd.DataFrame({'num': np.random.randn(n),
                   'lett': np.random.choice(
                       list('abcdefghijklmnopqrstuvwxyz'), n)
                   })
df
```
`df` has ten rows, with random floats in `num` and random lowercase letters in `lett`.
Set up a function to classify a single letter as either a consonant or a vowel:
```python
def classify_letter(char):
    if char in list('aeiou'):
        return 'vowel'
    else:
        return 'consonant'
```
Then you can use `map` to generate a new `Series` whose entries are those of the input transformed by the specified function. You can put that new output series wherever you like: it can be a new column (in your old `DataFrame` or elsewhere), or it can replace the old column. Note that `Series.map` transforms one column at a time, so be sure to select down to a single column before using it:
```python
df['new'] = df['lett'].map(classify_letter)
```
This adds a third column, `new`, holding `'vowel'` or `'consonant'` for each row's letter.
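Because `classify_letter` only tests membership in a fixed set of letters, the same classification can also be computed without calling a Python function for each element. A sketch of that swapped-in, vectorized alternative using `Series.isin` and `numpy.where`, rebuilding the same `df` and writing to a hypothetical `new_vec` column:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
n = 10
df = pd.DataFrame({'num': np.random.randn(n),
                   'lett': np.random.choice(list('abcdefghijklmnopqrstuvwxyz'), n)})

# Vectorized membership test: True where the letter is a vowel.
is_vowel = df['lett'].isin(list('aeiou'))
# np.where picks 'vowel' where the mask is True and 'consonant' where it is False.
df['new_vec'] = np.where(is_vowel, 'vowel', 'consonant')
print(df)
```

For a small frame like this the difference is negligible; `map` stays the more general tool when the per-element logic is not a simple membership test.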
If instead you started from the original setup and ran:
```python
df['lett'] = df['lett'].map(classify_letter)
```
then you would replace the old column with the new one: `lett` would hold `'vowel'` or `'consonant'` instead of the letters.
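Applied to the original question, the URL cleanup becomes one function mapped over the URL column. A minimal sketch, assuming Python 3 and a hypothetical `site` column (in the question the URLs are the second column of `results`); `clean_host` is an illustrative name:

```python
import re
from urllib.parse import urlsplit

import pandas as pd


def clean_host(url):
    """Return the host part of a URL, adding 'http://' if no scheme is given."""
    if not re.match(r'https?:', url):
        url = 'http://' + url
    return urlsplit(url).netloc


# Hypothetical frame standing in for `results`.
results = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'site': ['example.com/path', 'https://www.python.org/doc', 'http://pandas.pydata.org'],
})

# One call per element; the output Series is aligned on the index.
results['siteClean'] = results['site'].map(clean_host)
print(results)
```

Assigning the mapped `Series` to `results['siteClean']` aligns values on the index, so each row receives its own host and no loop is needed.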