Pandas: How to take values in a DataFrame column and put them all in the same row

I have the DataFrame below, which I saved to Excel using the pandas library:

Report No.   Score      Specifications
26-013RN42  >=1000      WaterSense certified
26-013RN42  >=1000      Single-Flush HET
26-013RN42  >=1000      Floor Mounted
26-013RN42  >=1000      2 Piece Unit
26-013RN42  >=1000      Round
26-013RN42  >=1000      Standard
26-013RN42  >=1000      Gravity
26-013RN42  >=1000      Floor Outlet
26-013RN42  >=1000      Flapper size 3in
26-013RN42  >=1000      Rough-in: 10"
26-013RN42  >=1000      Insulated: No

As you can see, the "Report No." and "Score" columns hold the same value in every row, but the "Specifications" column holds a different value in each row.

What I would like to do is combine all of the values in the "Specifications" column into a single row, as shown below:

Report No.   Score      Specifications
26-013RN42    >=1000     WaterSense certified, Single-Flush HET, Floor Mounted, 2 Piece Unit, Round, Standard, Gravity, Floor Outlet, Flapper size 3in, Rough-in: 10", Insulated: No

EDIT:

Here is my input code. It goes to a website, scrapes the data, and organizes it into a table. I didn't post it earlier because it is a bit messy, and I know it could be more efficient. Please let me know if you have any suggestions for improving it!

python:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url2 = 'https://www.map-testing.com/map-search/?start=3&searchOptions=AllResults'
urlh2 = requests.get(url2)
info2 = urlh2.text

soup = BeautifulSoup(info2, 'html.parser')
toilets = soup.find_all('div', attrs={'class': 'search-result'})

# Flatten the first search result into a list of its text fragments.
datalist = list(toilets[0].stripped_strings)

# The first ten strings alternate label, value; pair each label with its value.
# (Named `record` rather than `dict` to avoid shadowing the built-in.)
record = {}
for i in range(0, 9, 2):
    record[datalist[i]] = datalist[i + 1]

# The eleven specification strings sit at positions 11 through 21.
specs = datalist[11:22]
record['Specifications'] = specs

# The scalar values are repeated for every item in `specs`,
# which gives one row per specification.
df = pd.DataFrame(record)
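
For context on why the code above yields one row per specification: when pd.DataFrame receives a dictionary that mixes scalar values with a list, it repeats the scalars once for each list element. Below is a minimal sketch of that behavior and of one possible workaround, joining the list into a single string before building the frame. The names `record`, `specs`, `long_df` and `wide_df` and the sample values are hypothetical stand-ins, not real scrape output.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped values (not real scrape output).
record = {"MaP Report No.": "26-013RN42", "MaP Flush Score": ">=1000"}
specs = ["WaterSense certified", "Single-Flush HET", "Floor Mounted"]

# A list-valued column makes pandas repeat the scalars once per list element.
long_df = pd.DataFrame({**record, "Specifications": specs})

# Joining first leaves only scalar values; wrapping the dict in a list
# tells pandas to build exactly one row from it.
wide_df = pd.DataFrame([{**record, "Specifications": ", ".join(specs)}])

print(long_df.shape)
print(wide_df.shape)
```

The dictionary is wrapped in a list in the second call because a dictionary containing only scalars cannot be passed to pd.DataFrame directly without an index.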

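Use BeautifulSoup to scrape the HTML page and build one dictionary per search result, joining that result's specifications into a single string. Then pass the list of dictionaries to pandas to create the DataFrame, which gives one row per result.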

from bs4 import BeautifulSoup
import requests
import pandas as pd

url2 = 'https://www.map-testing.com/map-search/?start=3&searchOptions=AllResults'
urlh2 = requests.get(url2)

soup = BeautifulSoup(urlh2.text, 'html.parser')
results = soup.find_all('div', attrs= {'class' : 'search-result'})

jsonData = []

for row_obj in results:
    data = {}
    row = row_obj.find("div")

    # scrape Manufacturer
    manufacturer = row.find("div", string="Manufacturer")
    data['Manufacturer']  = manufacturer.find_next('div').text.strip()

    # scrape Model Name
    modelName = row.find("div", string="Model Name")
    data['Model Name'] = modelName.find_next('div').text.strip()

    # scrape Model Number
    modelNumber = row.find("div", string="Model Number")
    data['Model Number'] = modelNumber.find_next('div').text.strip()

    # scrape MaP Report No.
    maPReportNo = row.find("div", string="MaP Report No.")
    data['MaP Report No.'] = maPReportNo.find_next('div').text.strip()

    # scrape MaP Flush Score
    maPFlushScore = row.find("div", string="MaP Flush Score")
    data['MaP Flush Score'] = maPFlushScore.find_next('div').text.strip()

    # scrape Specifications
    specifications = row.find_all("li")
    data['Specifications'] = ", ".join(i.text.strip() for i in specifications)

    jsonData.append(data)

df = pd.DataFrame(jsonData)
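
If the DataFrame has already been built in the long form shown in the question (one row per specification), it may be simpler to collapse it with a groupby than to re-scrape. The following is a minimal sketch, assuming the column names from the question's table; the sample frame is hypothetical and lists only three of the specifications.

```python
import pandas as pd

# Hypothetical sample mirroring the table in the question (three specs shown).
df = pd.DataFrame({
    "Report No.": ["26-013RN42"] * 3,
    "Score": [">=1000"] * 3,
    "Specifications": ["WaterSense certified", "Single-Flush HET", "Floor Mounted"],
})

# Group on the columns that repeat, then join the varying column
# into one comma-separated string per group.
combined = (
    df.groupby(["Report No.", "Score"], sort=False)["Specifications"]
      .agg(", ".join)
      .reset_index()
)

print(combined)
```

Grouping on both "Report No." and "Score" keeps both columns in the result. With several reports in the same frame, each report would get its own combined row.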
