Updating a pandas DataFrame via for loops

I have a bunch of URLs stored in a data frame, and I am cleaning them up with a URL-parsing module. The problem is that the 'siteClean' field, which is supposed to receive the cleaned URL for its own row, is being updated for the entire column rather than for the individual cell.

Here is the code:

results = resultsX.copy(deep = True)
results = results.reset_index(drop = True)
results['siteClean'] = ''


from urlparse import urlsplit  
import re

for row in results.iterrows():
    #print row[1]
    url = row[1][1]
    if not re.match(r'http(s?)\:', url):
        url = 'http://' + url
    parsed = urlsplit(url)
    host = parsed.netloc
    #print host
    #row[1][1] = host
    #results[row][1] = host
    results['siteClean'] = host
    print results

In general, it's better to avoid looping over your frame's rows if you can. If I understand your problem correctly, you want to take a single column from your frame and apply a function to each element of that column. Then you want to put the results of all those function calls into a column of the original frame, either a new column or in place of the old one. This sounds like a job for pd.Series.map.

import pandas as pd
import numpy as np

np.random.seed(0)

n=10

df = pd.DataFrame({'num': np.random.randn(n),
                   'lett': np.random.choice(
                        list('abcdefghijklmnopqrstuvwxyz'),n)
                   })  

df looks like this:

[Image: the original df]

Set up a function to classify a single letter as either a consonant or a vowel:

def classify_letter(char):
    if char in list('aeiou'):
        return 'vowel'
    else:
        return 'consonant'

Then you can use map to generate a new Series whose entries are those of the input, transformed by the specified function. You can put that new Series wherever you like: it can be a new column (in your old DataFrame or elsewhere), or it can replace the old column. Note that map as used here is a Series method, so be sure to select a single column before calling it:

df['new'] = df['lett'].map(classify_letter)

gives:

[Image: df with the added column]
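
As a small reproducible illustration of the same call, here is a sketch on hand-picked letters rather than the random ones generated above. The frame name demo and its contents are made up for this example; classify_letter follows the same rule as the function defined earlier.

```python
import pandas as pd

def classify_letter(char):
    # Same rule as above: vowels vs. everything else
    return 'vowel' if char in 'aeiou' else 'consonant'

# Hypothetical hand-picked letters, so the result is reproducible
demo = pd.DataFrame({'lett': ['a', 'b', 'e', 'z']})

# map calls classify_letter once per element and returns a new Series
demo['new'] = demo['lett'].map(classify_letter)

print(demo['new'].tolist())
# ['vowel', 'consonant', 'vowel', 'consonant']
```

Each row gets its own result, and the original 'lett' column is left untouched.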

while if you started from the original setup and ran:

df['lett'] = df['lett'].map(classify_letter)

then you would replace the old column with the new one:

[Image: df with the column replaced]
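
Applied to the URL problem from the question, the same idea might look like the sketch below. In the question's loop, results['siteClean'] = host assigns one scalar to the whole column on every pass, so every row ends up holding the last host. The sketch makes several assumptions: it uses Python 3 (where urlsplit lives in urllib.parse rather than urlparse), the column name 'site' and the sample URLs are invented, and clean_host is a hypothetical helper wrapping the logic from the question.

```python
import re
from urllib.parse import urlsplit  # Python 3 location of urlsplit

import pandas as pd

def clean_host(url):
    # Prepend a scheme when it is missing so urlsplit can locate the host
    if not re.match(r'https?:', url):
        url = 'http://' + url
    return urlsplit(url).netloc

# Hypothetical stand-in for the asker's frame; 'site' is an assumed column name
results = pd.DataFrame({'site': ['example.com/page',
                                 'https://www.python.org/doc/']})

# One call to map gives every row its own cleaned host
results['siteClean'] = results['site'].map(clean_host)

# If a loop is really needed, write to one cell at a time with .at
results['looped'] = ''
for idx, url in results['site'].items():
    results.at[idx, 'looped'] = clean_host(url)

print(results['siteClean'].tolist())
# ['example.com', 'www.python.org']
```

Both columns end up identical; the map version is shorter and avoids the whole-column assignment that caused the original problem.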
