
Appending list to pandas column

I need to append the list idlist to the column called EventID in my table. The list needs to be appended in order, since I grabbed the IDs in order from the original HTML file.

Right now my output looks like this:

     EventID                   EventDate                                          EventName  AmntTickets              PriceRange
0  103577924  Thu, 10/11/2018  8:20 p.m.  Philadelphia Eagles at New York Giants  MetLif...         6655  $134.50  to  $2,222.50
1  103577924  Thu, 10/11/2018  8:21 p.m.  PARKING PASSES ONLY Philadelphia Eagles at New...          929   $20.39  to  $3,602.50
     EventID                   EventDate                                          EventName  AmntTickets              PriceRange
0  103577925  Thu, 10/11/2018  8:20 p.m.  Philadelphia Eagles at New York Giants  MetLif...         6655  $134.50  to  $2,222.50
1  103577925  Thu, 10/11/2018  8:21 p.m.  PARKING PASSES ONLY Philadelphia Eagles at New...          929   $20.39  to  $3,602.50

I need it to look like this:

     EventID                   EventDate                                          EventName  AmntTickets              PriceRange
0  103577924  Thu, 10/11/2018  8:20 p.m.  Philadelphia Eagles at New York Giants  MetLif...         6655  $134.50  to  $2,222.50
1  103577925  Thu, 10/11/2018  8:21 p.m.  PARKING PASSES ONLY Philadelphia Eagles at New...          929   $20.39  to  $3,602.50

My code:

import pandas as pd
from bs4 import BeautifulSoup
import requests
import lxml.html as lh
import pprint
import re

with open("htmltabletest.html", encoding="utf-8") as f:
    data = f.read()
    soup = BeautifulSoup(data, 'lxml')
    pd.set_option('display.max_rows', 500)
    pd.set_option('display.max_columns', 500)
    pd.set_option('display.width', 1000)
    dfs = pd.read_html(soup.prettify())
    df = dfs[0]
    dfz=df.rename(columns = {'Event date  Time (local)':'EventDate'}).rename(columns = {'Event name  Venue':'EventName'}).rename(columns = {'Tickets  listed':'AmntTickets'}).rename(columns = {'Price  range':'PriceRange'}).rename(columns = {'Unnamed: 0':'EventID'})
    idlist = []
    for se in soup.find_all('span', id=re.compile(r'min')):
        se = (str(se))
        seeme1 = se.replace('<span id="se-','')
        seeme, sep, tail = seeme1.partition('-')
        idlist.append(seeme)
    for p in idlist:
        dfz = dfz.assign(EventID=p)
        print(dfz)

My HTML file (htmltabletest.html):

<table class="dataTable st-alternateRows" id="eventSearchTable">
<thead>
<tr>
<th id="th-es-rb"><div class="dt-th"> </div></th>
<th id="th-es-ed"><div class="dt-th"><span class="th-divider"> </span>Event date<br/>Time (local)</div></th>
<th id="th-es-en"><div class="dt-th"><span class="th-divider"> </span>Event name<br/>Venue</div></th>
<th id="th-es-ti"><div class="dt-th"><span class="th-divider"> </span>Tickets<br/>listed</div></th>
<th id="th-es-pr"><div class="dt-th es-lastCell"><span class="th-divider"> </span>Price<br/>range</div></th>
</tr>
</thead>
<tbody class="" id="eventSearchTbody"><tr class="even" id="r-se-103577924">
<td class="nowrap"><input class="es-selectedEvent" id="se-103577924-check" name="selectEvent" type="radio"/></td>
<td class="nowrap" id="se-103577924-eventDateTime">Thu, 10/11/2018<br/>8:20 p.m.</td>
<td><div><a class="ellip" href="services/priceanalysis?eventId=103577924&amp;sectionId=0" id="se-103577924-eventName" target="_blank">Philadelphia Eagles at New York Giants</a></div><div id="se-103577924-venue">MetLife Stadium, East Rutherford, NJ</div></td>
<td id="se-103577924-nrTickets">6655</td>
<td class="es-lastCell nowrap" id="se-103577924-priceRange"><span id="se-103577924-minPrice">$134.50</span>  to<br/><span id="se-103577924-maxPrice">$2,222.50</span></td>
</tr><tr class="odd" id="r-se-103577925">
<td class="nowrap"><input class="es-selectedEvent" id="se-103577925-check" name="selectEvent" type="radio"/></td>
<td class="nowrap" id="se-103577925-eventDateTime">Thu, 10/11/2018<br/>8:21 p.m.</td>
<td><div><a class="ellip" href="services/priceanalysis?eventId=103577925&amp;sectionId=0" id="se-103577925-eventName" target="_blank">PARKING PASSES ONLY Philadelphia Eagles at New York Giants</a></div><div id="se-103577925-venue">MetLife Stadium Parking Lots, East Rutherford, NJ</div></td>
<td id="se-103577925-nrTickets">929</td>
<td class="es-lastCell nowrap" id="se-103577925-priceRange"><span id="se-103577925-minPrice">$20.39</span>  to<br/><span id="se-103577925-maxPrice">$3,602.50</span></td>
</tr></tbody>
</table>

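If the length of the dfz dataframe equals the length of the list idlist, you can remove the last for loop completely. Each pass of that loop calls dfz.assign(EventID=p), which writes the single value p into every row, so every printed table repeats one ID. Instead, assign the whole list to the column in one step:

dfz["EventID"] = idlist

The full script then becomes: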

import pandas as pd
from bs4 import BeautifulSoup
import requests
import lxml.html as lh
import pprint
import re

with open("testfile.html") as f:
    data = f.read()
    soup = BeautifulSoup(data, 'lxml')
    pd.set_option('display.max_rows', 500)
    pd.set_option('display.max_columns', 500)
    pd.set_option('display.width', 1000)
    dfs = pd.read_html(soup.prettify())
    df = dfs[0]
    dfz=df.rename(columns = {'Event date  Time (local)':'EventDate'}).rename(columns = {'Event name  Venue':'EventName'}).rename(columns = {'Tickets  listed':'AmntTickets'}).rename(columns = {'Price  range':'PriceRange'}).rename(columns = {'Unnamed: 0':'EventID'})
    idlist = []
    for se in soup.find_all('span', id=re.compile(r'min')):
        se = (str(se))
        seeme1 = se.replace('<span id="se-','')
        seeme, sep, tail = seeme1.partition('-')
        idlist.append(seeme)
    dfz["EventID"] = idlist
    print(dfz)

This gives you the dataframe you asked for:

     EventID                   EventDate                                          EventName  AmntTickets              PriceRange
0  103577924  Thu, 10/11/2018  8:20 p.m.  Philadelphia Eagles at New York Giants  MetLif...         6655  $134.50  to  $2,222.50
1  103577925  Thu, 10/11/2018  8:21 p.m.  PARKING PASSES ONLY Philadelphia Eagles at New...          929   $20.39  to  $3,602.50
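
To see the behaviour of direct assignment in isolation, here is a minimal sketch on a small hand-built frame. The two-row dataframe and the extra ID 103577926 are assumptions made up for illustration; they stand in for the table parsed from the HTML file.

```python
import pandas as pd

# Hypothetical stand-in for dfz: two rows with an empty EventID column.
dfz = pd.DataFrame({
    "EventID": [None, None],
    "AmntTickets": [6655, 929],
})
idlist = ["103577924", "103577925"]

# Equal lengths: each list element lands in the row at the same position.
dfz["EventID"] = idlist
print(dfz["EventID"].tolist())

# Unequal lengths: pandas refuses the assignment instead of guessing.
mismatch_error = None
try:
    dfz["EventID"] = idlist + ["103577926"]  # made-up third ID
except ValueError as exc:
    mismatch_error = exc
    print("length mismatch:", exc)
```

Because the assignment is positional, the order of idlist is preserved as long as the IDs were collected in the same order as the table rows.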

If the dataframe dfz and the list idlist are of unequal length, direct assignment raises a ValueError. In that case you can use the code below, which fills the rows one at a time and leaves alone any row that has no matching ID:

import pandas as pd
from bs4 import BeautifulSoup
import requests
import lxml.html as lh
import pprint
import re

with open("testfile.html") as f:
    data = f.read()
    soup = BeautifulSoup(data, 'lxml')
    pd.set_option('display.max_rows', 500)
    pd.set_option('display.max_columns', 500)
    pd.set_option('display.width', 1000)
    dfs = pd.read_html(soup.prettify())
    df = dfs[0]
    dfz=df.rename(columns = {'Event date  Time (local)':'EventDate'}).rename(columns = {'Event name  Venue':'EventName'}).rename(columns = {'Tickets  listed':'AmntTickets'}).rename(columns = {'Price  range':'PriceRange'}).rename(columns = {'Unnamed: 0':'EventID'})
    idlist = []
    for se in soup.find_all('span', id=re.compile(r'min')):
        se = (str(se))
        seeme1 = se.replace('<span id="se-','')
        seeme, sep, tail = seeme1.partition('-')
        idlist.append(seeme)

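    # Make the column able to hold the string IDs, then fill it row by row.
    # zip() stops at the shorter of the two, so a length mismatch is harmless.
    dfz["EventID"] = dfz["EventID"].astype(object)
    for label, event_id in zip(dfz.index, idlist):
        dfz.at[label, "EventID"] = event_id
    print(dfz)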
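
If a loop is not wanted, the unequal-length case can probably also be handled in one assignment by padding or truncating the list first. The sketch below is one possible way to do that, not part of the original answer; pad_to_length is a hypothetical helper name, and the three-row frame and the short IDs "1" to "4" are made up for illustration.

```python
import pandas as pd

def pad_to_length(values, n):
    """Hypothetical helper: return exactly n items, padded with NaN or truncated."""
    # reindex keeps positions 0..n-1, inserting NaN where the list ran out.
    return pd.Series(values, dtype=object).reindex(range(n)).to_numpy()

# Made-up three-row frame standing in for dfz.
dfz = pd.DataFrame({"AmntTickets": [6655, 929, 100]})

# Fewer IDs than rows: the last row gets NaN.
dfz["EventID"] = pad_to_length(["103577924", "103577925"], len(dfz))
short_result = dfz["EventID"].tolist()

# More IDs than rows: the surplus IDs are dropped.
dfz["EventID"] = pad_to_length(["1", "2", "3", "4"], len(dfz))
long_result = dfz["EventID"].tolist()

print(short_result)
print(long_result)
```

Note that this version overwrites the whole column, so rows without a matching ID end up as NaN rather than keeping a previous value.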
