Appending list to pandas column
I need to append the list idlist to the column in my table called EventID. The list needs to be appended in order, since I grabbed the IDs in order from the original HTML file.
Right now my output looks like this:
EventID EventDate EventName AmntTickets PriceRange
0 103577924 Thu, 10/11/2018 8:20 p.m. Philadelphia Eagles at New York Giants MetLif... 6655 $134.50 to $2,222.50
1 103577924 Thu, 10/11/2018 8:21 p.m. PARKING PASSES ONLY Philadelphia Eagles at New... 929 $20.39 to $3,602.50
EventID EventDate EventName AmntTickets PriceRange
0 103577925 Thu, 10/11/2018 8:20 p.m. Philadelphia Eagles at New York Giants MetLif... 6655 $134.50 to $2,222.50
1 103577925 Thu, 10/11/2018 8:21 p.m. PARKING PASSES ONLY Philadelphia Eagles at New... 929 $20.39 to $3,602.50
I need it to look like this:
EventID EventDate EventName AmntTickets PriceRange
0 103577924 Thu, 10/11/2018 8:20 p.m. Philadelphia Eagles at New York Giants MetLif... 6655 $134.50 to $2,222.50
1 103577925 Thu, 10/11/2018 8:21 p.m. PARKING PASSES ONLY Philadelphia Eagles at New... 929 $20.39 to $3,602.50
My code:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import lxml.html as lh
import pprint
import re
with open("htmltabletest.html", encoding="utf-8") as f:
    data = f.read()
soup = BeautifulSoup(data, 'lxml')
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
dfs = pd.read_html(soup.prettify())
df = dfs[0]
dfz=df.rename(columns = {'Event date Time (local)':'EventDate'}).rename(columns = {'Event name Venue':'EventName'}).rename(columns = {'Tickets listed':'AmntTickets'}).rename(columns = {'Price range':'PriceRange'}).rename(columns = {'Unnamed: 0':'EventID'})
idlist = []
for se in soup.find_all('span', id=re.compile(r'min')):
    se = (str(se))
    seeme1 = se.replace('<span id="se-','')
    seeme, sep, tail = seeme1.partition('-')
    idlist.append(seeme)
for p in idlist:
    dfz = dfz.assign(EventID=p)
    print(dfz)
My HTML file (htmltabletest.html):
<table class="dataTable st-alternateRows" id="eventSearchTable">
<thead>
<tr>
<th id="th-es-rb"><div class="dt-th"> </div></th>
<th id="th-es-ed"><div class="dt-th"><span class="th-divider"> </span>Event date<br/>Time (local)</div></th>
<th id="th-es-en"><div class="dt-th"><span class="th-divider"> </span>Event name<br/>Venue</div></th>
<th id="th-es-ti"><div class="dt-th"><span class="th-divider"> </span>Tickets<br/>listed</div></th>
<th id="th-es-pr"><div class="dt-th es-lastCell"><span class="th-divider"> </span>Price<br/>range</div></th>
</tr>
</thead>
<tbody class="" id="eventSearchTbody"><tr class="even" id="r-se-103577924">
<td class="nowrap"><input class="es-selectedEvent" id="se-103577924-check" name="selectEvent" type="radio"/></td>
<td class="nowrap" id="se-103577924-eventDateTime">Thu, 10/11/2018<br/>8:20 p.m.</td>
<td><div><a class="ellip" href="services/priceanalysis?eventId=103577924&sectionId=0" id="se-103577924-eventName" target="_blank">Philadelphia Eagles at New York Giants</a></div><div id="se-103577924-venue">MetLife Stadium, East Rutherford, NJ</div></td>
<td id="se-103577924-nrTickets">6655</td>
<td class="es-lastCell nowrap" id="se-103577924-priceRange"><span id="se-103577924-minPrice">$134.50</span> to<br/><span id="se-103577924-maxPrice">$2,222.50</span></td>
</tr><tr class="odd" id="r-se-103577925">
<td class="nowrap"><input class="es-selectedEvent" id="se-103577925-check" name="selectEvent" type="radio"/></td>
<td class="nowrap" id="se-103577925-eventDateTime">Thu, 10/11/2018<br/>8:21 p.m.</td>
<td><div><a class="ellip" href="services/priceanalysis?eventId=103577925&sectionId=0" id="se-103577925-eventName" target="_blank">PARKING PASSES ONLY Philadelphia Eagles at New York Giants</a></div><div id="se-103577925-venue">MetLife Stadium Parking Lots, East Rutherford, NJ</div></td>
<td id="se-103577925-nrTickets">929</td>
<td class="es-lastCell nowrap" id="se-103577925-priceRange"><span id="se-103577925-minPrice">$20.39</span> to<br/><span id="se-103577925-maxPrice">$3,602.50</span></td>
</tr></tbody>
</table>
If the length of the dfz dataframe is equal to the length of the list idlist, you can remove the last for loop completely. Instead, you can use
dfz["EventID"] = idlist
import pandas as pd
from bs4 import BeautifulSoup
import requests
import lxml.html as lh
import pprint
import re
with open("testfile.html") as f:
    data = f.read()
soup = BeautifulSoup(data, 'lxml')
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
dfs = pd.read_html(soup.prettify())
df = dfs[0]
dfz=df.rename(columns = {'Event date Time (local)':'EventDate'}).rename(columns = {'Event name Venue':'EventName'}).rename(columns = {'Tickets listed':'AmntTickets'}).rename(columns = {'Price range':'PriceRange'}).rename(columns = {'Unnamed: 0':'EventID'})
idlist = []
for se in soup.find_all('span', id=re.compile(r'min')):
    se = (str(se))
    seeme1 = se.replace('<span id="se-','')
    seeme, sep, tail = seeme1.partition('-')
    idlist.append(seeme)
dfz["EventID"] = idlist
print(dfz)
Then you will get the dataframe you requested.
EventID EventDate EventName AmntTickets PriceRange
0 103577924 Thu, 10/11/2018 8:20 p.m. Philadelphia Eagles at New York Giants MetLif... 6655 $134.50 to $2,222.50
1 103577925 Thu, 10/11/2018 8:21 p.m. PARKING PASSES ONLY Philadelphia Eagles at New... 929 $20.39 to $3,602.50
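The difference between the original loop and the direct assignment can be seen on a small stand-in table. This is only a sketch: the two-row dataframe below is a hypothetical substitute for the dfz parsed from the HTML file, and the variable looped is introduced purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for dfz: two rows with an empty EventID column.
dfz = pd.DataFrame({
    "EventID": [None, None],
    "AmntTickets": [6655, 929],
})
idlist = ["103577924", "103577925"]

# assign() with a scalar broadcasts that one value to every row,
# so looping over idlist leaves only the last ID in the column.
looped = dfz
for p in idlist:
    looped = looped.assign(EventID=p)
print(looped["EventID"].tolist())  # ['103577925', '103577925']

# Assigning the whole list matches values to rows by position.
dfz["EventID"] = idlist
print(dfz["EventID"].tolist())  # ['103577924', '103577925']
```

Because the list is matched to rows by position, the order in which the IDs were collected from the HTML is preserved.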
If the dataframe dfz and the list idlist are of unequal length, you can use the code below to append the data.
import pandas as pd
from bs4 import BeautifulSoup
import requests
import lxml.html as lh
import pprint
import re
with open("testfile.html") as f:
    data = f.read()
soup = BeautifulSoup(data, 'lxml')
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
dfs = pd.read_html(soup.prettify())
df = dfs[0]
dfz=df.rename(columns = {'Event date Time (local)':'EventDate'}).rename(columns = {'Event name Venue':'EventName'}).rename(columns = {'Tickets listed':'AmntTickets'}).rename(columns = {'Price range':'PriceRange'}).rename(columns = {'Unnamed: 0':'EventID'})
idlist = []
for se in soup.find_all('span', id=re.compile(r'min')):
    se = (str(se))
    seeme1 = se.replace('<span id="se-','')
    seeme, sep, tail = seeme1.partition('-')
    idlist.append(seeme)
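# Cast to object so string IDs can be written into the (empty) EventID column
dfz["EventID"] = dfz["EventID"].astype(object)
for ind, row in dfz.iterrows():
    try:
        # .loc writes to dfz directly; chained dfz.EventID.iloc[ind] = ... may modify a copy
        dfz.loc[ind, "EventID"] = idlist[ind]
    except IndexError:
        # idlist is shorter than dfz: leave the remaining rows as they are
        pass
print(dfz)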
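As a possible alternative to the row-by-row loop, the list can be wrapped in a Series and aligned to the dataframe's index. This is a sketch under two assumptions: dfz has the default 0..n-1 index (as it would coming straight from pd.read_html), and the three-row dataframe below is a hypothetical stand-in for the real table. Unlike the loop, rows without a matching ID end up with a missing value rather than keeping their previous content.

```python
import pandas as pd

# Hypothetical stand-in for dfz: three rows, but only two IDs were scraped.
dfz = pd.DataFrame({
    "EventID": [None, None, None],
    "AmntTickets": [6655, 929, 12],
})
idlist = ["103577924", "103577925"]

# The Series gets index 0..len(idlist)-1; reindex() aligns it to dfz's index,
# filling rows without an ID with NaN and dropping any surplus IDs.
ids = pd.Series(idlist, dtype=object).reindex(dfz.index)
dfz["EventID"] = ids
print(dfz)
```

If idlist is longer than dfz, the extra IDs are silently discarded, so it may be worth comparing len(idlist) and len(dfz) first to find out why they differ.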