Updating a pandas DataFrame through a for loop

Question

I have a bunch of URLs stored in a data frame, and I am cleaning them up with a URL-parsing module. The problem is that the 'siteClean' field, which is supposed to be updated with the cleaned URL, is updated for the entire column rather than for the individual cell.

Here is the code:

results = resultsX.copy(deep = True)
results = results.reset_index(drop = True)
results['siteClean'] = ''


from urlparse import urlsplit  
import re

for row in results.iterrows():
    #print row[1]
    url = row[1][1]
    if not re.match(r'http(s?)\:', url):
        url = 'http://' + url
    parsed = urlsplit(url)
    host = parsed.netloc
    #print host
    #row[1][1] = host
    #results[row][1] = host
    results['siteClean'] = host
    print results

Answer 1

In general, it's better to avoid looping over your frame's rows if you can. If I understand your problem correctly, you want to look at a single column from your frame and apply a function to each element of that column. Then you want to put the results of all those function calls into a column of the original frame: maybe a new column, maybe in place of the old one. This sounds like a job for pd.Series.map.

import pandas as pd
import numpy as np

np.random.seed(0)

n=10

df = pd.DataFrame({'num': np.random.randn(n),
                   'lett': np.random.choice(
                        list('abcdefghijklmnopqrstuvwxyz'),n)
                   })

df looks like this:

[image: the original df]

Set up a function to classify a single letter as either a consonant or a vowel:

def classify_letter(char):
    if char in list('aeiou'):
        return 'vowel'
    else:
        return 'consonant'

Then you can use map to generate a new Series whose entries are those of the input, transformed by the specified function. You can put that new output Series wherever you like: it can be a new column (in your old DataFrame or elsewhere), or it can replace the old column. Note that the map used here is a Series method, so be sure to select a single column before using it:

df['new'] = df['lett'].map(classify_letter)

gives:

[image: df with the new column added]

whereas if you started from the original setup and ran:

df['lett'] = df['lett'].map(classify_letter)

then you would replace the old column with the new one:

[image: df with the column replaced]
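Applied to the URL problem in the question, the same idea might look like the sketch below. It is not the asker's actual code: the sample frame, the column name 'site', and the helper extract_host are made up for illustration, and it uses the Python 3 import location (urllib.parse) rather than the Python 2 urlparse module from the question. The loop at the end is included only for comparison, to show that writing to one cell needs a row label as well as a column name.

```python
import re
from urllib.parse import urlsplit  # Python 3 home of urlsplit

import pandas as pd

# Hypothetical sample data; the question's real frame is not shown.
results = pd.DataFrame({'site': ['example.com/page',
                                 'http://www.python.org/doc/',
                                 'https://pandas.pydata.org/docs/']})


def extract_host(url):
    """Return the network location (host) part of a URL string."""
    # Prepend a scheme when it is missing, as the question's loop does,
    # so that urlsplit puts the host into netloc rather than path.
    if not re.match(r'https?:', url):
        url = 'http://' + url
    return urlsplit(url).netloc


# Element-wise version: map calls extract_host once per entry.
results['siteClean'] = results['site'].map(extract_host)

# Loop version for comparison: .at writes to a single cell, whereas
# results['siteClean'] = host assigns the scalar to the whole column.
results['siteCleanLoop'] = ''
for idx, row in results.iterrows():
    results.at[idx, 'siteCleanLoop'] = extract_host(row['site'])

print(results)
```

Both columns should end up identical. The map version is shorter and avoids the per-row overhead of iterrows, which is why it is usually preferred.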

Answer 1: 2 votes, accepted, 2014-01-09 22:03:23
