I have the DataFrame below, which I saved to Excel using the pandas library:
Report No. Score Specifications
26-013RN42 >=1000 WaterSense certified
26-013RN42 >=1000 Single-Flush HET
26-013RN42 >=1000 Floor Mounted
26-013RN42 >=1000 2 Piece Unit
26-013RN42 >=1000 Round
26-013RN42 >=1000 Standard
26-013RN42 >=1000 Gravity
26-013RN42 >=1000 Floor Outlet
26-013RN42 >=1000 Flapper size 3in
26-013RN42 >=1000 Rough-in: 10"
26-013RN42 >=1000 Insulated: No
As you can see, the "Report No." and "Score" columns hold the same value in every row, but the values in the "Specifications" column are all different.
What I was hoping to do was combine all of the values under the "Specifications" column into one row as seen below:
Report No. Score Specifications
26-013RN42 >=1000 WaterSense certified, Single-Flush HET, Floor Mounted, 2 Piece Unit, Round, Standard, Gravity, Floor Outlet, Flapper size 3in, Rough-in: 10", Insulated: No
EDIT:
Here is my input code. The purpose of this code is to go to a website, scrape data and organize it into a table. Didn't post it before as it is a tad messy and I know there are ways for it to be more efficient. Please let me know if you have any suggestions on how to improve the code!
python:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url2 = 'https://www.map-testing.com/map-search/?start=3&searchOptions=AllResults'
urlh2 = requests.get(url2)
info2 = urlh2.text
soup = BeautifulSoup(info2, 'html.parser')
toilets = soup.find_all('div', attrs={'class': 'search-result'})

testlist = []
datalist = []
# Collect every text fragment of the first search result
for s in toilets[0].stripped_strings:
    datalist.append(s)

dict = {}
count = 0
# Items at even positions are used as keys; the item after each one is its value
for info in datalist[:9]:
    if count == 0:
        dict[info] = datalist[count + 1]
        count += 1
    elif (count % 2) == 1:
        count += 1
        continue
    elif (count % 2) == 0:
        dict[info] = datalist[count + 1]
        count += 1

# The specifications are stored as a list of separate strings
specs = datalist[11:22]
dict['Specifications'] = specs
df = pd.DataFrame(dict)
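The repeated rows appear to come from how pd.DataFrame handles a dictionary that mixes scalar values with a list: the scalars are repeated to match the list length. Here is a minimal sketch of that behaviour, using made-up sample values in place of the scraped data:

```python
import pandas as pd

# Illustrative values only; the real code gets these from the scraped page.
specs = ["Round", "Standard", "Gravity"]

# Scalars are repeated to match the list length: one row per specification.
broadcast_df = pd.DataFrame({"MaP Report No.": "26-013RN42", "Specifications": specs})

# A list holding one dict with a pre-joined string gives a single row.
single_df = pd.DataFrame(
    [{"MaP Report No.": "26-013RN42", "Specifications": ", ".join(specs)}]
)

print(broadcast_df.shape)
print(single_df.shape)
```

Joining the list into one string before building the DataFrame is what keeps everything on one row.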
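Use BeautifulSoup to scrape the HTML page and build one dictionary per search result, then let pandas convert the list of dictionaries into a DataFrame. Joining the specification items into a single string keeps each result on one row.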
from bs4 import BeautifulSoup
import requests
import pandas as pd

url2 = 'https://www.map-testing.com/map-search/?start=3&searchOptions=AllResults'
urlh2 = requests.get(url2)
soup = BeautifulSoup(urlh2.text, 'html.parser')
results = soup.find_all('div', attrs={'class': 'search-result'})

jsonData = []
for row_obj in results:
    data = {}
    row = row_obj.find("div")

    # scrape Manufacturer
    manufacturer = row.find("div", string="Manufacturer")
    data['Manufacturer'] = manufacturer.find_next('div').text.strip()

    # scrape Model Name
    modelName = row.find("div", string="Model Name")
    data['Model Name'] = modelName.find_next('div').text.strip()

    # scrape Model Number
    modelNumber = row.find("div", string="Model Number")
    data['Model Number'] = modelNumber.find_next('div').text.strip()

    # scrape MaP Report No.
    maPReportNo = row.find("div", string="MaP Report No.")
    data['MaP Report No.'] = maPReportNo.find_next('div').text.strip()

    # scrape MaP Flush Score
    maPFlushScore = row.find("div", string="MaP Flush Score")
    data['MaP Flush Score'] = maPFlushScore.find_next('div').text.strip()

    # scrape Specifications and join them into one comma-separated string
    specifications = row.find_all("li")
    data['Specifications'] = ", ".join(i.text.strip() for i in specifications)

    jsonData.append(data)

df = pd.DataFrame(jsonData)
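If the DataFrame already exists in the long form shown in the question, it may be simpler to collapse it afterwards instead of re-scraping. Below is a rough sketch using groupby. The sample rows are copied from the question's table (shortened to three specifications), and the names long_df and combined are only illustrative:

```python
import pandas as pd

# Sample rows taken from the question's table, shortened to three specs.
long_df = pd.DataFrame({
    "Report No.": ["26-013RN42"] * 3,
    "Score": [">=1000"] * 3,
    "Specifications": ["WaterSense certified", "Single-Flush HET", "Floor Mounted"],
})

# Group on the repeated columns and join each group's specifications
# into one comma-separated string.
combined = (
    long_df.groupby(["Report No.", "Score"], as_index=False, sort=False)["Specifications"]
    .agg(", ".join)
)

print(combined)
```

This assumes every value in "Specifications" is a string. Missing values would need to be dropped or converted before joining.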