Web scraping into a dataframe: getting an error when adding a column
My error: "list indices must be integers or slices, not list"
I know there are seemingly endless posts about this error, but I've searched and can't figure it out. If there is a solution I missed that would help, please point me to it.
Anyway...
I'm using pandas to web-scrape stock information into a dataframe, then add two calculated columns to the end.
The issue appears when I try to add the first calculated column (the last bit of code):
# Dependencies
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = 'http://www.dividend.com/dividend-stocks/preferred-dividend-stocks.php#stocks&sort_name=Symbol&sort_order=ASC&page=1'
tables = pd.read_html(url)  # read_html returns a list of DataFrames, one per <table> on the page
tables
type(tables)
type(tables[0])
tables[0].head()
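# The error is raised here: tables is a list, so it must be indexed with an integer (tables[0]) before selecting columns
tables['Perp Value'] = (1/(tables['Dividend Yield']/100))*tables['Annual Dividend']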
tables[0].head()
When I try to add the column 'Perp Value', I get the error. What do I need to change to be able to do the calculation?
For reference, the unformatted data looks like:
   \nDividend Yield\n  \nCurrent Price\n  \nAnnual Dividend\n  \n52-Week High\n  \
0               8.04%            $25.65              $2.06            25.98
1               7.61%            $25.47              $1.94            25.95
2               6.66%            $25.82              $1.72            28.80
3               7.47%            $25.95              $1.94            26.87
4               5.78%            $25.72              $1.49            28.99
5               8.06%            $26.20              $2.11            26.00
6               7.72%            $23.87              $1.84             0.00
7               7.75%            $23.80              $1.84             0.00
8               7.80%            $24.05              $1.88             0.00
9
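pd.read_html returns a list of DataFrames, so tables itself is a plain Python list and indexing it with a string fails. The minimal sketch below reproduces this without network access: a hand-built list stands in for the scraped result (the three rows are copied from the sample data above, and the real page may differ). With a string key, Python reports "list indices must be integers or slices, not str"; selecting tables[0] first gives a DataFrame whose columns can be looked up.

```python
import pandas as pd

# Stand-in for pd.read_html(url), which returns a *list* of DataFrames.
# Values copied from the sample data above; the live page may differ.
tables = [
    pd.DataFrame({
        "\nDividend Yield\n": ["8.04%", "7.61%", "6.66%"],
        "\nAnnual Dividend\n": ["$2.06", "$1.94", "$1.72"],
    })
]

# Indexing the list with a string raises a TypeError, as in the question.
try:
    tables["Dividend Yield"]
except TypeError as exc:
    print("TypeError:", exc)

# Take the DataFrame out of the list first; column access then works,
# but note the headers still carry the '\n' padding from the HTML.
df = tables[0]
print(df.columns.tolist())
```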
Find the column names:
list(tables[0])
['\nStock Symbol\n',
'\nCompany Name\n',
'\nDividend Yield\n',
'\nCurrent Price\n',
'\nAnnual Dividend\n',
'\n52-Week High\n',
'\n52-Week Low\n']
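The headers scraped from the page are padded with newlines, which is why a lookup like tables[0]['Dividend Yield'] would not match even after selecting the DataFrame. A minimal sketch of stripping that padding with the Index.str accessor, assuming an empty frame with the same headers as a stand-in for tables[0]:

```python
import pandas as pd

# Hypothetical stand-in for tables[0]: same padded headers, no rows.
df = pd.DataFrame(columns=[
    "\nStock Symbol\n", "\nCompany Name\n", "\nDividend Yield\n",
    "\nCurrent Price\n", "\nAnnual Dividend\n", "\n52-Week High\n",
    "\n52-Week Low\n",
])

# Remove the surrounding whitespace/newlines so columns can be used by name.
df.columns = df.columns.str.strip()
print(list(df))
```

After this, columns can be referenced by name, e.g. df["Dividend Yield"], instead of by position in list(df).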
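Clean the data and convert it to numeric:
# Column 4 is '\nAnnual Dividend\n': drop '$' and ',' and convert to float
a = tables[0][list(tables[0])[4]]
a = a.replace(r'[\$,]', '', regex=True).astype(float)
# Column 2 is '\nDividend Yield\n': drop '%' and ',' and convert to float
b = tables[0][list(tables[0])[2]]
b = b.replace(r'[%,]', '', regex=True).astype(float)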
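The same cleaning can be wrapped in a small helper and applied by column name. This is a sketch, assuming the headers have already been stripped of their '\n' padding; the helper name to_number is hypothetical. It uses pd.to_numeric with errors="coerce", which turns unparseable cells into NaN instead of raising; whether that is the right way to treat bad cells is an assumption.

```python
import pandas as pd

# Sample values copied from the scraped table above (headers already stripped).
df = pd.DataFrame({
    "Dividend Yield": ["8.04%", "7.61%", "6.66%"],
    "Annual Dividend": ["$2.06", "$1.94", "$1.72"],
})

def to_number(col: pd.Series) -> pd.Series:
    """Hypothetical helper: drop '$', '%' and ',' then parse as floats.

    errors="coerce" maps anything still unparseable to NaN.
    """
    cleaned = col.astype(str).str.replace(r"[$%,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")

for name in ["Dividend Yield", "Annual Dividend"]:
    df[name] = to_number(df[name])

print(df.dtypes)
```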
Output:
(1/a/100)*b
0 0.039029
1 0.039227
2 0.038721
3 0.038505
4 0.038792
5 0.038199
6 0.041957
7 0.042120
8 0.041489
9 0.041649
10 0.038594
11 0.039053
12 0.038795
13 0.037853
14 0.037546
15 0.039100
16 0.039320
17 0.039898
18 0.038431
19 0.040290
dtype: float64
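Assign. Note that (1/a/100)*b works out to Dividend Yield / (100 * Annual Dividend), which is the reciprocal of the question's formula (1/(Dividend Yield/100)) * Annual Dividend. To match the question, divide the dividend by the yield expressed as a fraction:
tables[0]["Perp Value"] = a / (b / 100)  # a = Annual Dividend, b = Dividend Yield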
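A self-contained sketch of the full calculation on already-cleaned columns, using the question's formula Perp Value = Annual Dividend / (Dividend Yield / 100). The three rows are the first rows of the sample data; column names assume the '\n' padding has been stripped. For comparison it also evaluates (1/a/100)*b, whose first value reproduces the 0.039029 shown in the output above and which is the reciprocal of Perp Value.

```python
import pandas as pd

# Already-numeric columns; values from the first rows of the sample data.
df = pd.DataFrame({
    "Dividend Yield": [8.04, 7.61, 6.66],   # percent
    "Annual Dividend": [2.06, 1.94, 1.72],  # dollars per year
})

# Question's formula: (1 / (yield / 100)) * annual dividend,
# i.e. the annual dividend divided by the yield as a fraction.
df["Perp Value"] = df["Annual Dividend"] / (df["Dividend Yield"] / 100)

# The expression (1/a/100)*b, kept only for comparison: it is 1 / Perp Value.
ratio = (1 / df["Annual Dividend"] / 100) * df["Dividend Yield"]

print(df.round(2))
```

As a sanity check, the resulting values (about 25.62, 25.49 and 25.83) sit close to the current prices in the scraped table ($25.65, $25.47, $25.82), which is what a perpetuity-style valuation of these preferred stocks should roughly give.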