I am trying to scrape data from a website that uses JavaScript to load its data. I used the solution to this question, "Issue with html tags while scraping data using beautiful soup", to accomplish that. After getting the JSON data dictionary, I iterated over it and successfully got the device name and price data.
The code in that solution extracts data from the window that shows the device name and price, using the attribute referred to in the code as window.rates.
Problem: If you look at the structure of the website, it has 3 parts.
I want to extract data from the third part because I want all 4 fields (plan name, device name, price, monthly price). I am already able to scrape data from the 1st and 2nd parts using the solution to the above-mentioned question.
However, I cannot find the JavaScript that loads the data in the 3rd part, or the attribute (e.g. window.rates for the 2nd part) that I would need to get the JSON dictionary of data for the 3rd part.
Also, the data in the 3rd part of the website changes as we scroll the windows in the 2nd part.
PS: I tried printing all the scripts running on the page to find the one that loads the data in the 3rd part, but it did not help.
Please help me in solving this issue.
You provided a link to your previous question that mentions the site you're interested in:
http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html
You just have to look at the code.
Say you select "Red M" as the plan and "Samsung Galaxy SIII Blau (Blue) / 16 GB" as the device. The bottom section will display:
Einmalzahlung (Onetime Payment) Smartphone: 9.90
Red M 59.99
24 x 5 Euro Smartphone-Rabatt -5.00
Also, one of three 10.00/month discounts is available for being a student, young, or handicapped.
You need to parse (maybe using Python's json module) these JavaScript assignments:
window.phones
window.rates
window.discounts
window.goodies
window.promotions
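As a starting point, here is a minimal sketch of how one of these assignments might be pulled out of the page's script text and turned into a Python dict. The object literals shown in this answer use bare keys (for example `sku1224225:` and `27:`), so they are not strict JSON, and `json.loads` would reject them as-is. The helper names (`extract_object_literal`, `quote_keys`, `parse_window_object`) are hypothetical, and the sketch assumes the real page uses double-quoted strings, has no comments inside the literals, and has no trailing commas.

```python
import json
import re


def extract_object_literal(script_text, name):
    """Return the {...} text assigned to window.<name>, or None if absent."""
    match = re.search(r"window\." + re.escape(name) + r"\s*=\s*\{", script_text)
    if match is None:
        return None
    start = match.end() - 1  # index of the opening brace
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(script_text)):
        ch = script_text[i]
        if in_string:
            # Skip everything inside a double-quoted string, honoring escapes.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return script_text[start:i + 1]
    return None  # unbalanced braces


# Matches either a whole string literal (left untouched) or a bare key.
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|([{,]\s*)([A-Za-z_$][\w$]*|\d+)(\s*:)')


def quote_keys(js_object):
    """Wrap bare object keys in double quotes so json.loads accepts the text."""
    def repl(m):
        if m.group(2) is None:  # a string literal: keep as-is
            return m.group(0)
        return f'{m.group(1)}"{m.group(2)}"{m.group(3)}'
    return _TOKEN.sub(repl, js_object)


def parse_window_object(script_text, name):
    """Parse window.<name> = {...} from script text into a dict, or None."""
    literal = extract_object_literal(script_text, name)
    if literal is None:
        return None
    return json.loads(quote_keys(literal))
```

Calling `parse_window_object(script_text, "rates")` on the text of the right `<script>` element should then give a nested dict. Note that numeric keys such as `27` come back as the string `"27"`.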
I'll walk you through the data structures. You'll have to write the code yourself.
window.phones contains this entry (keeping with our example):
window.phones = {
    sku1224225:{
        name:"Samsung Galaxy SIII Blau 16 GB",
        image:"/images/m1057472_300599.jpg",
        deliveryTime:"Lieferbar innerhalb 48 Stunden",
        sku1444275:{p:"prod1334441",e:"49.90"}, // "Vodafone Red S"
        sku1444283:{p:"prod1334441",e:"9.90"}, // "Vodafone Red M"
        sku1444291:{p:"prod1334441",e:"9.90"}, // "Vodafone Red Premium"
        sku1444286:{p:"prod1334441",e:"9.90"}, // "Vodafone Red L"
        sku1104261:{p:"prod1334441",e:"99.90"} // "Vodafone Basic 100"
    },
    // . . .
}
I've added comments to show the plan names.
Here we see Detail Item 2.
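Once this object is a Python dict, the one-time payment is a two-level lookup: device SKU first, then plan sub-SKU. The sketch below uses a trimmed copy of the entry above. Reading `e` as the one-time payment is my assumption, based on it matching the 9.90 shown for "Red M" in the example. The function names are hypothetical.

```python
# Trimmed copy of the window.phones entry above, already parsed into a dict.
phones = {
    "sku1224225": {
        "name": "Samsung Galaxy SIII Blau 16 GB",
        "image": "/images/m1057472_300599.jpg",
        "deliveryTime": "Lieferbar innerhalb 48 Stunden",
        "sku1444275": {"p": "prod1334441", "e": "49.90"},  # "Vodafone Red S"
        "sku1444283": {"p": "prod1334441", "e": "9.90"},   # "Vodafone Red M"
    },
}


def one_time_payment(phones, device_sku, plan_subsku):
    """Return the device's one-time payment under a plan sub-SKU, or None."""
    entry = phones.get(device_sku, {}).get(plan_subsku)
    # Assumption: "e" holds the one-time payment as a decimal string.
    return float(entry["e"]) if entry else None


def plan_subskus(phone):
    """Keys of a phone entry that hold per-plan prices rather than metadata."""
    return [key for key, value in phone.items() if isinstance(value, dict)]
```

The metadata fields (`name`, `image`, `deliveryTime`) are plain strings while the per-plan entries are nested objects, which is what `plan_subskus` relies on to tell them apart.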
The SKUs listed here are plan sub-SKUs defined in window.rates. For "Red M" we have:
window.rates = {
    sku1444279:{
        label:"Vodafone Red M",
        propId:"prod1564453",
        subsku:{
            sku1444283:{ // "Samsung Galaxy SIII Blau 16 GB", etc.
                monthlyChargest:"59.99",
                activationCharge:"29.99",
                discounts:[
                    "sku140988", // "Ich bin 18-25 Jahre jung" (-10)
                    "sku140989", // "Ich habe einen Schwerbehindertenausweis" (-10)
                    "sku140990" // "Ich bin Student und jünger als 30" (-10)
                ],
                promotions:["27"], // "24 x 5 Euro Smartphone-Rabatt" (-5)
                Goodies:[
                    "prod1674486" // "24 x 10 % Rabatt" (-6)
                ]
            },
            // more subskus here . . .
        }
    },
    // . . .
}
Again, I've added comments for the linked data. Note that many devices can link to the same subsku.
We see Detail Items 1 & 3 and links to Items 4, 5, and 6.
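Since a phone entry only gives you the sub-SKU, you need to go the other way to find which plan owns it. A rough sketch of that reverse lookup, using the same data as above in parsed form, might look like this. `find_plan` is a hypothetical name, and the key spellings (`monthlyChargest`, `Goodies`) are copied as they appear in the listing above.

```python
# Trimmed copy of the window.rates entry above, already parsed into a dict.
rates = {
    "sku1444279": {
        "label": "Vodafone Red M",
        "propId": "prod1564453",
        "subsku": {
            "sku1444283": {
                "monthlyChargest": "59.99",
                "activationCharge": "29.99",
                "discounts": ["sku140988", "sku140989", "sku140990"],
                "promotions": ["27"],
                "Goodies": ["prod1674486"],
            },
        },
    },
}


def find_plan(rates, subsku):
    """Return (plan_sku, plan, subsku_entry) for the plan owning a sub-SKU.

    Returns None when no plan lists the sub-SKU.
    """
    for plan_sku, plan in rates.items():
        entry = plan.get("subsku", {}).get(subsku)
        if entry is not None:
            return plan_sku, plan, entry
    return None
```

With the sample data, `find_plan(rates, "sku1444283")` yields the major SKU `sku1444279`, the plan labelled "Vodafone Red M", and the sub-SKU entry holding the monthly charge and the linked discount, promotion, and goodie IDs.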
Goodies links to window.goodies via prod number:
window.goodies = {
    prod1674486:{
        SkuId:"prod1674486",
        Name:"24 x 10 % Rabatt",
        Value:"-6",
        Type:"absolute",
        DurationInMonth:"24"
    },
    // . . .
}
Which gives us Detail Item 4.
window.rates also links to window.promotions via the subsku's promotions list:
window.promotions = {
    27:{
        promotionId:"27",
        promotionName:"24 x 5 Euro Smartphone-Rabatt",
        promotionValue:"-5",
        Type:"absolute",
        duration_in_months:"24",
        deeplinkParameter:""
    },
    // . . .
}
Which gives us Detail Item 5.
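Resolving the goodie and promotion IDs of a sub-SKU into readable lines is then just two dictionary lookups per ID. The sketch below assumes the parsed structures shown above and that numeric keys such as `27` end up as strings after parsing. `resolve_adjustments` is a hypothetical helper. Note that the two objects use different field names for the same idea (`Name`/`Value` versus `promotionName`/`promotionValue`).

```python
# Parsed copies of the window.goodies and window.promotions entries above.
goodies = {
    "prod1674486": {
        "SkuId": "prod1674486",
        "Name": "24 x 10 % Rabatt",
        "Value": "-6",
        "Type": "absolute",
        "DurationInMonth": "24",
    },
}
promotions = {
    "27": {
        "promotionId": "27",
        "promotionName": "24 x 5 Euro Smartphone-Rabatt",
        "promotionValue": "-5",
        "Type": "absolute",
        "duration_in_months": "24",
        "deeplinkParameter": "",
    },
}


def resolve_adjustments(subsku_entry, goodies, promotions):
    """Return (name, value) pairs for a sub-SKU's goodies and promotions.

    IDs that are missing from the lookup tables are skipped.
    """
    lines = []
    for prod_id in subsku_entry.get("Goodies", []):
        goodie = goodies.get(prod_id)
        if goodie is not None:
            lines.append((goodie["Name"], float(goodie["Value"])))
    for promo_id in subsku_entry.get("promotions", []):
        promo = promotions.get(promo_id)
        if promo is not None:
            lines.append((promo["promotionName"], float(promo["promotionValue"])))
    return lines
```

Pass it the sub-SKU entry from window.rates (the object holding the `Goodies` and `promotions` lists) together with the two parsed tables.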
window.discounts contains the special discounts for Detail Item 6:
window.discounts = {
    sku140988:{
        SkuId:"sku140988",
        Name:"Ich bin 18-25 Jahre jung",
        Type:"absolute",
        DurationInMonth:"24",
        Value:{
            sku1444295:"-10", // "Vodafone Red Premium"
            sku1444279:"-10", // "Vodafone Red M"
            sku1444290:"-20"} // "Vodafone Red L"
    },
    sku140989:{
        SkuId:"sku140989",
        Name:"Ich habe einen Schwerbehindertenausweis",
        Type:"absolute",
        DurationInMonth:"24",
        Value:{
            sku1444295:"-10", // "Vodafone Red Premium"
            sku1444279:"-10", // "Vodafone Red M"
            sku1444290:"-20"} // "Vodafone Red L"
    },
    sku140990:{
        SkuId:"sku140990",
        Name:"Ich bin Student und jünger als 30",
        Type:"absolute",
        DurationInMonth:"24",
        Value:{
            sku1444295:"-10", // "Vodafone Red Premium"
            sku1444279:"-10", // "Vodafone Red M"
            sku1444290:"-20"} // "Vodafone Red L"
    }
};
The proper discount amount is selected by plan major SKU (via the SKUs listed under Value).
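In Python terms, that selection could be sketched as below: the discount SKU picks the discount, and the plan's major SKU (the top-level key in window.rates, not the sub-SKU) picks the amount. `discount_amount` is a hypothetical helper, and the data is a trimmed copy of the first entry above.

```python
# Trimmed, parsed copy of one window.discounts entry from above.
discounts = {
    "sku140988": {
        "SkuId": "sku140988",
        "Name": "Ich bin 18-25 Jahre jung",
        "Type": "absolute",
        "DurationInMonth": "24",
        "Value": {
            "sku1444295": "-10",  # "Vodafone Red Premium"
            "sku1444279": "-10",  # "Vodafone Red M"
            "sku1444290": "-20",  # "Vodafone Red L"
        },
    },
}


def discount_amount(discounts, discount_sku, plan_sku):
    """Return the discount for a plan major SKU, or None if it does not apply."""
    discount = discounts.get(discount_sku)
    if discount is None:
        return None
    value = discount["Value"].get(plan_sku)
    return float(value) if value is not None else None
```

A plan that is missing from `Value` gets `None`, which I am treating here as "this discount is not offered for that plan". That interpretation is an assumption.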
And that's it. Just parse these 5 objects into Python objects and you'll have all the data you need.
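To get the four fields from the question (plan name, device name, price, monthly price), one possible approach is to index the sub-SKUs from window.rates and then walk window.phones. This is only a sketch under the assumptions already noted: `e` is taken to be the one-time price, `monthlyChargest` is used as spelled in the listing above, and `build_rows` is a hypothetical name. It expects the two objects already parsed into dicts.

```python
def build_rows(phones, rates):
    """Join phones and rates into one row per (device, plan) combination."""
    # Map each plan sub-SKU to its plan label and sub-SKU entry.
    subsku_to_plan = {}
    for plan in rates.values():
        for subsku, entry in plan.get("subsku", {}).items():
            subsku_to_plan[subsku] = (plan["label"], entry)

    rows = []
    for phone in phones.values():
        for key, value in phone.items():
            # Skip metadata such as name/image and sub-SKUs with no known plan.
            if not isinstance(value, dict) or key not in subsku_to_plan:
                continue
            label, entry = subsku_to_plan[key]
            rows.append({
                "plan": label,
                "device": phone["name"],
                "price": float(value["e"]),
                "monthly_price": float(entry["monthlyChargest"]),
            })
    return rows
```

The monthly price here is the undiscounted charge. Any promotions, goodies, or special discounts would still have to be added on top if you want the net figure.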