
Using Beautiful Soup to grab data from a table

I recently asked for help using Beautiful Soup to grab forex prices from a site. The data was hidden in a span. I was lucky enough to get help from two amazing people who helped me work through it. I have since found a different site that I want to scrape. This time there is no span; the text is in the tr and td elements of a table.

https://www.wsj.com/market-data/quotes/fx/AUDNZD/historical-prices

is the website. As you can see, the high and low prices in this table go back, I believe, 30 days.

I would like to grab the whole table so I can use the data as needed for different calculations.

When I attempt to grab the data, it still just comes back as an empty list, and I have tried a lot of different places to grab it from.

Can someone not only help me get what I want but also explain what I'm doing wrong, so I can learn to use Beautiful Soup myself and don't have to keep asking for help?

Last time, when I grabbed the data from the span, it was saved in a list of lists that I could use, store as variables for different days, and then do calculations with. This is what I am attempting to do again.

import requests
from bs4 import BeautifulSoup
import re

result = []

URL = "https://www.wsj.com/market-data/quotes/fx/AUDNZD/historical-prices"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

table = soup.select('cr_dataTable')
print(table)

I did not save all of the different approaches I tried. I got down to this very basic attempt just to get some response back from whatever I was selecting, so that I could then break it down to just the text. Everything I put in soup.select() came back as an empty list, so I reached the point where I decided I must not be doing any of this right. The soup is grabbing the HTML, though. find_all(), find(), and soup.select(): nothing seemed to work or get a response back.
Please advise on what I am doing wrong. This simple code should come back with lots of data for all the markup in the table, correct? Then I can go through it to grab the text and pick out what I want?

This is the location of the table data I want to capture:

import requests
from bs4 import BeautifulSoup
import re

result = []

URL = "https://www.wsj.com/market-data/quotes/fx/AUDNZD/historical-prices"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

table = soup.find('table', class_='cr_dataTable')
print(table)

This comes back as None!

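You didn't send any headers with the request, so the site returned the response it serves to bots rather than the page a browser receives. That response does not contain the table, which is why your lookups came back empty.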
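Before writing any parsing code, it can help to check whether the HTML you received contains the table at all. The following is a minimal sketch of that check, not part of the original answer: `contains_table` is a hypothetical helper, and the two HTML strings are made-up stand-ins for a browser-style response and a bot-style response. In real use you would pass `page.text` instead.

```python
from bs4 import BeautifulSoup


def contains_table(html: str, css_class: str = "cr_dataTable") -> bool:
    """Return True if the HTML contains a <table> with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("table", class_=css_class) is not None


# Made-up stand-ins for the two kinds of response (assumptions, not real WSJ markup).
browser_like = (
    '<div id="historical_data_table">'
    '<table class="cr_dataTable"><tr><td>12/02/22</td></tr></table>'
    "</div>"
)
bot_like = "<html><body><p>Access denied</p></body></html>"

print(contains_table(browser_like))  # True
print(contains_table(bot_like))      # False
```

If this check fails on the live response, the selector is probably not the problem; the markup you are looking for was never delivered.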

Full code:

import json

import requests
from bs4 import BeautifulSoup

result = []
# Send a browser-like User-Agent; without it the site serves the response meant for bots.
headers = {
    'user-agent':
    'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Mobile Safari/537.36',
}
URL = "https://www.wsj.com/market-data/quotes/fx/AUDNZD/historical-prices"
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
# Narrow down to the container first, then to the table inside it.
div = soup.find('div', {"id": "historical_data_table"})
table = div.find('table', {"class": "cr_dataTable"})
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) < 5:
        # Skip rows without five data cells (for example a header row made of <th>).
        continue
    output = {"DATE": cells[0].text, "OPEN": cells[1].text,
              "HIGH": cells[2].text, "LOW": cells[3].text, "CLOSE": cells[4].text}
    result.append(output)
# Mode "w" creates the file if it is missing and overwrites it otherwise.
with open("Data.json", "w") as f:
    json.dump(result, f, indent=4)
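As for why `soup.select('cr_dataTable')` returned an empty list even apart from the headers issue: `select()` takes a CSS selector, so a bare `cr_dataTable` looks for a tag named `<cr_dataTable>`, while a class needs a leading dot. Your `find('table', class_='cr_dataTable')` call was syntactically fine and most likely returned None only because the table was missing from the response. The sketch below shows the difference on a small, made-up HTML snippet that reuses the id and class names from the code above; it is an illustration, not the real page markup.

```python
from bs4 import BeautifulSoup

# Made-up snippet standing in for the page (assumption, not real WSJ markup).
html = """
<div id="historical_data_table">
  <table class="cr_dataTable">
    <tr><th>DATE</th><th>OPEN</th><th>HIGH</th><th>LOW</th><th>CLOSE</th></tr>
    <tr><td>12/02/22</td><td>1.0691</td><td>1.0709</td><td>1.0568</td><td>1.0602</td></tr>
  </table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

by_tag_name = soup.select("cr_dataTable")   # looks for <cr_dataTable> tags
by_class = soup.select(".cr_dataTable")     # looks for class="cr_dataTable"
by_find = soup.find("table", class_="cr_dataTable")

print(len(by_tag_name))     # 0
print(len(by_class))        # 1
print(by_find is not None)  # True

# A header row built from <th> has no <td> cells, so it yields an empty list
# and is filtered out here.
rows = [
    cells
    for cells in (
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in by_class[0].find_all("tr")
    )
    if cells
]
print(rows)
```

This gives you the list-of-lists shape you described using last time, with one inner list per table row.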

Output:

[
    {
        "DATE": "12/02/22",
        "OPEN": "1.0691",
        "HIGH": "1.0709",
        "LOW": "1.0568",
        "CLOSE": "1.0602"
    },
    {
        "DATE": "12/01/22",
        "OPEN": "1.0768",
        "HIGH": "1.0792",
        "LOW": "1.0669",
        "CLOSE": "1.0692"
    },
    {
        "DATE": "11/30/22",
        "OPEN": "1.0787",
        "HIGH": "1.0813",
        "LOW": "1.0737",
        "CLOSE": "1.0783"
    },
    {
        "DATE": "11/29/22",
        "OPEN": "1.0794",
        "HIGH": "1.0820",
        "LOW": "1.0773",
        "CLOSE": "1.0788"
    },
    {
        "DATE": "11/28/22",
        "OPEN": "1.0807",
        "HIGH": "1.0815",
        "LOW": "1.0752",
        "CLOSE": "1.0792"
    },
    {
        "DATE": "11/25/22",
        "OPEN": "1.0805",
        "HIGH": "1.0822",
        "LOW": "1.0782",
        "CLOSE": "1.0804"
    },
    {
        "DATE": "11/24/22",
        "OPEN": "1.0787",
        "HIGH": "1.0819",
        "LOW": "1.0765",
        "CLOSE": "1.0797"
    },
    {
        "DATE": "11/23/22",
        "OPEN": "1.0801",
        "HIGH": "1.0837",
        "LOW": "1.0747",
        "CLOSE": "1.0781"
    },
    {
        "DATE": "11/22/22",
        "OPEN": "1.0826",
        "HIGH": "1.0838",
        "LOW": "1.0781",
        "CLOSE": "1.0804"
    },
    {
        "DATE": "11/21/22",
        "OPEN": "1.0891",
        "HIGH": "1.0891",
        "LOW": "1.0799",
        "CLOSE": "1.0828"
    },
    {
        "DATE": "11/18/22",
        "OPEN": "1.0915",
        "HIGH": "1.0934",
        "LOW": "1.0833",
        "CLOSE": "1.0849"
    },
    {
        "DATE": "11/17/22",
        "OPEN": "1.0964",
        "HIGH": "1.0981",
        "LOW": "1.0912",
        "CLOSE": "1.0917"
    },
    {
        "DATE": "11/16/22",
        "OPEN": "1.0971",
        "HIGH": "1.0997",
        "LOW": "1.0941",
        "CLOSE": "1.0963"
    },
    {
        "DATE": "11/15/22",
        "OPEN": "1.0995",
        "HIGH": "1.1002",
        "LOW": "1.0946",
        "CLOSE": "1.0975"
    },
    {
        "DATE": "11/14/22",
        "OPEN": "1.0957",
        "HIGH": "1.1015",
        "LOW": "1.0953",
        "CLOSE": "1.0994"
    },
    {
        "DATE": "11/11/22",
        "OPEN": "1.0987",
        "HIGH": "1.1046",
        "LOW": "1.0949",
        "CLOSE": "1.0965"
    },
    {
        "DATE": "11/10/22",
        "OPEN": "1.0927",
        "HIGH": "1.0992",
        "LOW": "1.0913",
        "CLOSE": "1.0986"
    },
    {
        "DATE": "11/09/22",
        "OPEN": "1.0927",
        "HIGH": "1.0975",
        "LOW": "1.0901",
        "CLOSE": "1.0929"
    },
    {
        "DATE": "11/08/22",
        "OPEN": "1.0908",
        "HIGH": "1.0928",
        "LOW": "1.0882",
        "CLOSE": "1.0919"
    },
    {
        "DATE": "11/07/22",
        "OPEN": "1.0863",
        "HIGH": "1.0977",
        "LOW": "1.0863",
        "CLOSE": "1.0910"
    },
    {
        "DATE": "11/04/22",
        "OPEN": "1.0896",
        "HIGH": "1.0960",
        "LOW": "1.0877",
        "CLOSE": "1.0909"
    },
    {
        "DATE": "11/03/22",
        "OPEN": "1.0914",
        "HIGH": "1.0937",
        "LOW": "1.0883",
        "CLOSE": "1.0898"
    },
    {
        "DATE": "11/02/22",
        "OPEN": "1.0945",
        "HIGH": "1.0957",
        "LOW": "1.0902",
        "CLOSE": "1.0913"
    },
    {
        "DATE": "11/01/22",
        "OPEN": "1.1003",
        "HIGH": "1.1033",
        "LOW": "1.0930",
        "CLOSE": "1.0944"
    },
    {
        "DATE": "10/31/22",
        "OPEN": "1.1031",
        "HIGH": "1.1348",
        "LOW": "1.0989",
        "CLOSE": "1.1004"
    },
    {
        "DATE": "10/28/22",
        "OPEN": "1.1070",
        "HIGH": "1.1084",
        "LOW": "1.1012",
        "CLOSE": "1.1032"
    },
    {
        "DATE": "10/27/22",
        "OPEN": "1.1140",
        "HIGH": "1.1154",
        "LOW": "1.1058",
        "CLOSE": "1.1072"
    },
    {
        "DATE": "10/26/22",
        "OPEN": "1.1130",
        "HIGH": "1.1176",
        "LOW": "1.1092",
        "CLOSE": "1.1133"
    },
    {
        "DATE": "10/25/22",
        "OPEN": "1.1089",
        "HIGH": "1.1122",
        "LOW": "1.1065",
        "CLOSE": "1.1111"
    },
    {
        "DATE": "10/24/22",
        "OPEN": "1.1124",
        "HIGH": "1.1124",
        "LOW": "1.1020",
        "CLOSE": "1.1085"
    },
    {
        "DATE": "10/21/22",
        "OPEN": "1.1063",
        "HIGH": "1.1102",
        "LOW": "1.1044",
        "CLOSE": "1.1077"
    },
    {
        "DATE": "10/20/22",
        "OPEN": "1.1056",
        "HIGH": "1.1094",
        "LOW": "1.1023",
        "CLOSE": "1.1062"
    },
    {
        "DATE": "10/19/22",
        "OPEN": "1.1100",
        "HIGH": "1.1107",
        "LOW": "1.1052",
        "CLOSE": "1.1055"
    },
    {
        "DATE": "10/18/22",
        "OPEN": "1.1151",
        "HIGH": "1.1210",
        "LOW": "1.1071",
        "CLOSE": "1.1101"
    },
    {
        "DATE": "10/17/22",
        "OPEN": "1.1138",
        "HIGH": "1.1193",
        "LOW": "1.1137",
        "CLOSE": "1.1161"
    },
    {
        "DATE": "10/14/22",
        "OPEN": "1.1176",
        "HIGH": "1.1191",
        "LOW": "1.1121",
        "CLOSE": "1.1151"
    },
    {
        "DATE": "10/13/22",
        "OPEN": "1.1192",
        "HIGH": "1.1215",
        "LOW": "1.1157",
        "CLOSE": "1.1163"
    },
    {
        "DATE": "10/12/22",
        "OPEN": "1.1235",
        "HIGH": "1.1244",
        "LOW": "1.1172",
        "CLOSE": "1.1188"
    },
    {
        "DATE": "10/11/22",
        "OPEN": "1.1318",
        "HIGH": "1.1328",
        "LOW": "1.1195",
        "CLOSE": "1.1237"
    },
    {
        "DATE": "10/10/22",
        "OPEN": "1.1367",
        "HIGH": "1.1370",
        "LOW": "1.1266",
        "CLOSE": "1.1317"
    },
    {
        "DATE": "10/07/22",
        "OPEN": "1.1322",
        "HIGH": "1.1376",
        "LOW": "1.1301",
        "CLOSE": "1.1358"
    },
    {
        "DATE": "10/06/22",
        "OPEN": "1.1309",
        "HIGH": "1.1355",
        "LOW": "1.1244",
        "CLOSE": "1.1327"
    },
    {
        "DATE": "10/05/22",
        "OPEN": "1.1348",
        "HIGH": "1.1381",
        "LOW": "1.1242",
        "CLOSE": "1.1308"
    },
    {
        "DATE": "10/04/22",
        "OPEN": "1.1386",
        "HIGH": "1.1426",
        "LOW": "1.1306",
        "CLOSE": "1.1349"
    },
    {
        "DATE": "10/03/22",
        "OPEN": "1.1460",
        "HIGH": "1.1460",
        "LOW": "1.1362",
        "CLOSE": "1.1388"
    },
    {
        "DATE": "09/30/22",
        "OPEN": "1.1387",
        "HIGH": "1.1444",
        "LOW": "1.1320",
        "CLOSE": "1.1439"
    },
    {
        "DATE": "09/29/22",
        "OPEN": "1.1382",
        "HIGH": "1.1417",
        "LOW": "1.1346",
        "CLOSE": "1.1350"
    },
    {
        "DATE": "09/28/22",
        "OPEN": "1.1417",
        "HIGH": "1.1495",
        "LOW": "1.1290",
        "CLOSE": "1.1385"
    },
    {
        "DATE": "09/27/22",
        "OPEN": "1.1453",
        "HIGH": "1.1466",
        "LOW": "1.1370",
        "CLOSE": "1.1419"
    },
    {
        "DATE": "09/26/22",
        "OPEN": "1.1365",
        "HIGH": "1.1465",
        "LOW": "1.1328",
        "CLOSE": "1.1454"
    },
    {
        "DATE": "09/23/22",
        "OPEN": "1.1365",
        "HIGH": "1.1378",
        "LOW": "1.1323",
        "CLOSE": "1.1373"
    },
    {
        "DATE": "09/22/22",
        "OPEN": "1.1329",
        "HIGH": "1.1373",
        "LOW": "1.1303",
        "CLOSE": "1.1366"
    },
    {
        "DATE": "09/21/22",
        "OPEN": "1.1342",
        "HIGH": "1.1363",
        "LOW": "1.1315",
        "CLOSE": "1.1334"
    },
    {
        "DATE": "09/20/22",
        "OPEN": "1.1284",
        "HIGH": "1.1365",
        "LOW": "1.1273",
        "CLOSE": "1.1347"
    },
    {
        "DATE": "09/19/22",
        "OPEN": "1.1221",
        "HIGH": "1.1295",
        "LOW": "1.1206",
        "CLOSE": "1.1289"
    },
    {
        "DATE": "09/16/22",
        "OPEN": "1.1232",
        "HIGH": "1.1256",
        "LOW": "1.1197",
        "CLOSE": "1.1218"
    },
    {
        "DATE": "09/15/22",
        "OPEN": "1.1247",
        "HIGH": "1.1261",
        "LOW": "1.1212",
        "CLOSE": "1.1233"
    },
    {
        "DATE": "09/14/22",
        "OPEN": "1.1255",
        "HIGH": "1.1255",
        "LOW": "1.1201",
        "CLOSE": "1.1239"
    },
    {
        "DATE": "09/13/22",
        "OPEN": "1.1218",
        "HIGH": "1.1259",
        "LOW": "1.1194",
        "CLOSE": "1.1223"
    },
    {
        "DATE": "09/12/22",
        "OPEN": "1.1186",
        "HIGH": "1.1240",
        "LOW": "1.1181",
        "CLOSE": "1.1225"
    },
    {
        "DATE": "09/09/22",
        "OPEN": "1.1156",
        "HIGH": "1.1215",
        "LOW": "1.1139",
        "CLOSE": "1.1212"
    },
    {
        "DATE": "09/08/22",
        "OPEN": "1.1142",
        "HIGH": "1.1157",
        "LOW": "1.1115",
        "CLOSE": "1.1151"
    },
    {
        "DATE": "09/07/22",
        "OPEN": "1.1153",
        "HIGH": "1.1181",
        "LOW": "1.1134",
        "CLOSE": "1.1141"
    },
    {
        "DATE": "09/06/22",
        "OPEN": "1.1150",
        "HIGH": "1.1173",
        "LOW": "1.1127",
        "CLOSE": "1.1152"
    },
    {
        "DATE": "09/05/22",
        "OPEN": "1.1113",
        "HIGH": "1.1167",
        "LOW": "1.1113",
        "CLOSE": "1.1153"
    }
]
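Since the goal is to run calculations on these prices, note that every value in the output is a string. Below is a minimal sketch of converting the records to dates and floats first. It uses the first two records from the output above as inline sample data; in practice you would load `Data.json` with `json.load`. The helper name `to_numeric` and the MM/DD/YY date format are assumptions based on how the dates appear in the output.

```python
import json
from datetime import datetime

# First two records from the output above, inlined so the sketch is self-contained.
sample = json.loads("""
[
    {"DATE": "12/02/22", "OPEN": "1.0691", "HIGH": "1.0709", "LOW": "1.0568", "CLOSE": "1.0602"},
    {"DATE": "12/01/22", "OPEN": "1.0768", "HIGH": "1.0792", "LOW": "1.0669", "CLOSE": "1.0692"}
]
""")

PRICE_KEYS = ("OPEN", "HIGH", "LOW", "CLOSE")


def to_numeric(record: dict) -> dict:
    """Convert one scraped record: DATE to a date object, prices to floats."""
    converted = {"DATE": datetime.strptime(record["DATE"], "%m/%d/%y").date()}
    for key in PRICE_KEYS:
        converted[key] = float(record[key])
    return converted


rows = [to_numeric(r) for r in sample]

highest = max(r["HIGH"] for r in rows)  # highest high in the sample
lowest = min(r["LOW"] for r in rows)    # lowest low in the sample
# High-low range per day, rounded to the four decimals the prices are quoted in.
daily_range = [round(r["HIGH"] - r["LOW"], 4) for r in rows]

print(highest, lowest)
print(daily_range)
```

Converting once, right after loading, keeps the later calculations free of repeated `float()` calls and makes the dates sortable.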
