
Web scraping into a dataframe. Getting error when adding a column

My error: "list indices must be integers or slices, not list"

I know there are seemingly endless posts about this error, but I've searched and can't figure it out. If there is an existing solution that I missed, please let me know.

Anyway...

I'm using pandas to scrape stock information from a web page into a dataframe, and then I want to add two calculated columns at the end:

  1. one to calculate the price
  2. another, unrelated calculation

The issue occurs when I try to add the first calculated column (the last bit of code):

# Dependencies
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'http://www.dividend.com/dividend-stocks/preferred-dividend-stocks.php#stocks&sort_name=Symbol&sort_order=ASC&page=1'
tables = pd.read_html(url)
tables

type(tables)
type(tables[0])
tables[0].head()

tables['Perp Value'] = (1/(tables['Dividend Yield']/100))*tables['Annual Dividend']
tables[0].head()

When I try to add the column 'Perp Value' I get the error. What do I need to change to be able to do my calculation?

For reference, the unformatted data looks like this:

   \nDividend Yield\n  \nCurrent Price\n  \nAnnual Dividend\n  \n52-Week High\n  \
0               8.04%             $25.65                 $2.06             25.98
1               7.61%             $25.47                 $1.94             25.95
2               6.66%             $25.82                 $1.72             28.80
3               7.47%             $25.95                 $1.94             26.87
4               5.78%             $25.72                 $1.49             28.99
5               8.06%             $26.20                 $2.11             26.00
6               7.72%             $23.87                 $1.84              0.00
7               7.75%             $23.80                 $1.84              0.00
8               7.80%             $24.05                 $1.88              0.00
9
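A note on where this kind of error comes from: `pd.read_html` returns a list of DataFrames, one per table found on the page, so indexing `tables` with a column name indexes the list rather than a DataFrame. The sketch below reproduces that with a hand-built list standing in for the scraped result. The stand-in `tables` and its two columns are assumptions for illustration (values copied from the rows above), not the real scrape, and the exact wording at the end of the message depends on what is passed as the index.

```python
import pandas as pd

# Stand-in for the list that pd.read_html(url) returns: one DataFrame per table.
# The values are copied from the first rows shown in the question.
tables = [pd.DataFrame({
    "Dividend Yield": ["8.04%", "7.61%"],
    "Annual Dividend": ["$2.06", "$1.94"],
})]

try:
    tables["Dividend Yield"]      # indexing the list itself with a column name
except TypeError as exc:
    error_message = str(exc)      # starts with "list indices must be integers or slices"

df = tables[0]                    # pick the DataFrame out of the list first
yields = df["Dividend Yield"]     # column access now works
```

Selecting `tables[0]` before doing any column work is the step the original snippet skips on the line that adds 'Perp Value'.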

Try this:

Find the column names:

list(tables[0])

['\nStock Symbol\n',
'\nCompany Name\n',
'\nDividend Yield\n',
'\nCurrent Price\n',
'\nAnnual Dividend\n',
'\n52-Week High\n',
'\n52-Week Low\n']
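Since every header carries a leading and trailing newline, one option is to strip the names once so the columns can be selected by their plain labels instead of by position. This is a minimal sketch on a hypothetical stand-in for `tables[0]` (only two of the columns, with values from the question), not the scraped frame itself.

```python
import pandas as pd

# Hypothetical stand-in for tables[0], with the newline-padded headers shown above.
df = pd.DataFrame({
    "\nDividend Yield\n": ["8.04%", "7.61%"],
    "\nAnnual Dividend\n": ["$2.06", "$1.94"],
})

# Remove leading/trailing whitespace (including "\n") from every column name.
df.columns = df.columns.str.strip()

cleaned_names = list(df.columns)
```

After this, `df["Dividend Yield"]` works directly, which is what the original code in the question expected.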

Clean the data and convert it to numeric:

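# Column 4 is '\nAnnual Dividend\n': strip '$' and ',' then convert to float
a = tables[0][list(tables[0])[4]]
a = a.replace(r'[$,]', '', regex=True).astype(float)
# Column 2 is '\nDividend Yield\n': strip '%' and ',' then convert to float
b = tables[0][list(tables[0])[2]]
b = b.replace(r'[%,]', '', regex=True).astype(float)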
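If several columns need the same treatment, the cleaning step could be wrapped in a small helper. The function name `to_number` and the sample values (including the made-up `"$1,025.47"` and `"n/a"`) are assumptions for illustration. Unlike `.astype(float)`, this sketch uses `pd.to_numeric(..., errors="coerce")`, so unparseable cells become NaN instead of raising.

```python
import pandas as pd

def to_number(series: pd.Series) -> pd.Series:
    """Strip '$', '%' and ',' from string values and convert them to float."""
    cleaned = series.astype(str).str.replace(r"[$%,]", "", regex=True)
    # errors="coerce" turns anything that still is not a number into NaN.
    return pd.to_numeric(cleaned, errors="coerce")

# Illustrative inputs; the second and third price values are invented.
prices = to_number(pd.Series(["$25.65", "$1,025.47", "n/a"]))
yields = to_number(pd.Series(["8.04%", "7.61%"]))
```

Whether silently coercing bad cells to NaN is preferable to failing loudly depends on how much the scraped page can be trusted.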

Output:

(1/a/100)*b

0     0.039029
1     0.039227
2     0.038721
3     0.038505
4     0.038792
5     0.038199
6     0.041957
7     0.042120
8     0.041489
9     0.041649
10    0.038594
11    0.039053
12    0.038795
13    0.037853
14    0.037546
15    0.039100
16    0.039320
17    0.039898
18    0.038431
19    0.040290
dtype: float64

Assign:

tables[0]["Perp Value"] = (1/a/100)*b
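One thing worth checking: with `a` as the annual dividend and `b` as the yield, `(1/a/100)*b` evaluates to yield / (100 * dividend), whereas the expression in the question, `(1/(yield/100)) * dividend`, is the dividend divided by the yield as a fraction. If the question's formula is the intended one, a sketch might look like the following. The frame here is a stand-in built from the first three rows of the question, not the scraped table.

```python
import pandas as pd

# Stand-in for tables[0], using the first three rows from the question.
df = pd.DataFrame({
    "\nDividend Yield\n": ["8.04%", "7.61%", "6.66%"],
    "\nAnnual Dividend\n": ["$2.06", "$1.94", "$1.72"],
})
df.columns = df.columns.str.strip()   # drop the "\n" padding from the headers

# Strip the currency/percent symbols and convert to float.
dividend = df["Annual Dividend"].replace(r"[$,]", "", regex=True).astype(float)
yield_pct = df["Dividend Yield"].replace(r"[%,]", "", regex=True).astype(float)

# The question's formula: annual dividend divided by the yield as a fraction.
df["Perp Value"] = dividend / (yield_pct / 100)
perp_values = df["Perp Value"].round(2).tolist()
```

With this version the first row comes out at roughly 25.62, close to the listed current price of $25.65, which is what a perpetuity value derived from the dividend yield should look like.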
