Pandas DataFrame speed

Question

I have a DataFrame (shown as a screenshot in the original post) to which I want to add a new column called "dload", which I do with df["dload"] = np.nan.

I then want to fill the NaN values with the return values of this function:

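def func_ret_value(soup, tables):
    # Default in case no cell is labelled "Short Percent of Float"
    value = None
    for td in tables[40].find_all("td"):
        if td.text == "Short Percent of Float":
            # Take the second following sibling and strip the "%" sign
            value = list(td.next_siblings)[1].text.strip("%")
    return value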

To do this I write the following code:

import requests
from bs4 import BeautifulSoup

for index in df.index:
    # url_pre and url_suf are the URL parts before and after the symbol
    r = requests.get(url_pre + df.loc[index, "Symbol"] + url_suf)
    soup = BeautifulSoup(r.text, "html.parser")
    tables = soup.find_all("table")
    df.loc[index, "dload"] = func_ret_value(soup, tables)

Is there a faster way to do this, for example with iterrows or apply?

Thank you.

Answer 1

You could use apply(), but I would guess that the most time-consuming part of your code is the HTTP requests (as mentioned by @Peter Leimbigler in his comment). Here is an example with your function:

def func_ret_value(x):
    # x is one row of df; fetch and parse the page for that row's symbol
    r = requests.get(url_pre + x['Symbol'] + url_suf)
    soup = BeautifulSoup(r.text, 'html.parser')
    tables = soup.find_all('table')
    for td in tables[40].find_all("td"):
        if td.text == "Short Percent of Float":
            return list(td.next_siblings)[1].text.strip("%")
    return None  # no matching cell found

df['dload'] = df.apply(func_ret_value, axis=1)

Note that axis=1 specifies that the function is applied row-wise.
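Keep in mind that apply() still issues the requests one after another, so it is unlikely to be much faster than the original loop. If the requests really are the bottleneck, one option is to run them concurrently. The following is only a minimal sketch using the standard library's thread pool; fetch_all, fetch_one and max_workers=8 are hypothetical names and values chosen for illustration, and fetch_one stands for a variant of func_ret_value() that takes a symbol string rather than a row.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(symbols, fetch_one, max_workers=8):
    """Call fetch_one(symbol) for every symbol using a pool of threads.

    fetch_one is assumed to do the HTTP request and parsing for one symbol.
    Results are returned in the same order as `symbols`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even if calls finish out of order
        return list(pool.map(fetch_one, symbols))
```

With such a helper the column could be filled with df["dload"] = fetch_all(df["Symbol"], fetch_one). Whether this helps, and how many workers are reasonable, depends on the site being queried, so it is worth timing both versions on a handful of symbols first.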

You may also consider adding some error handling here for the case where the if statement inside func_ret_value() is never triggered for a given row.
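One possible way to do that, sketched here under assumptions rather than taken from the original code, is to wrap the function so that a missing cell or an unexpected page layout produces NaN instead of None or an exception. The name with_default and the choice of exceptions to catch are hypothetical.

```python
import math


def with_default(func, default=math.nan, exceptions=(IndexError, AttributeError)):
    """Wrap func so that a failed or empty lookup yields `default` instead."""
    def wrapper(row):
        try:
            result = func(row)
        except exceptions:
            # The page layout differs from what the parser expects,
            # e.g. tables[40] does not exist
            return default
        # func returns None when no cell matched the label
        return default if result is None else result
    return wrapper
```

It would be used as df['dload'] = df.apply(with_default(func_ret_value), axis=1). Network failures are not covered by the default exceptions tuple; to treat them the same way, requests.RequestException could be added to it.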

Solution 1
0 2018-10-08 20:18:40
