Pandas DataFrame skip rows
I am working on a weather web-scraping project. I have scraped a site using Selenium and exported the data to Excel using pandas. However, I can't figure out how to make the dates appear only in every fourth row, so that each date lines up with its times.

Excel sheet: https://i.stack.imgur.com/27w0f.jpg

Full code:
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
from datetime import datetime
url = "https://pent.no/60.19401,11.09936"
# CSS class names of the elements to scrape
dates = "forecast-day-view-date-bar__date"
times = "forecast-hour-view-hour-label"
temps = "forecast-hour-view-weather-widget__temperature"
winder = "forecast-hour-view-weather-widget__wind-speed"
rainfalls = "forecast-hour-view-weather-widget__precipitation"
driver = webdriver.Chrome()
driver.get(url)
# find_elements_by_class_name() was removed in Selenium 4.3; use find_elements(By.CLASS_NAME, ...)
date = driver.find_elements(By.CLASS_NAME, dates)
time = driver.find_elements(By.CLASS_NAME, times)
temp = driver.find_elements(By.CLASS_NAME, temps)
temp2 = temp[::2]   # even positions -> "Temp Yr"
temp3 = temp[1::2]  # odd positions -> "Temp Storm"
wind = driver.find_elements(By.CLASS_NAME, winder)
wind2 = wind[::2]
wind3 = wind[1::2]
rainfall = driver.find_elements(By.CLASS_NAME, rainfalls)
rainfall2 = rainfall[::2]
rainfall3 = rainfall[1::2]
a = []
b = []
c = []
d = []
e = []
f = []
g = []
h = []
for dates in date:
    print(dates.text)
    a.append(dates.text)
    df1 = pd.DataFrame(a, columns=["Date"])
#
for times in time:
    print(times.text)
    b.append(times.text)
    df2 = pd.DataFrame(b, columns=["Time"])
#
for tempyr in temp2:
    print(tempyr.text)
    c.append(tempyr.text)
    df3 = pd.DataFrame(c, columns=["Temp Yr"])
for tempstorm in temp3:
    print(tempstorm.text)
    d.append(tempstorm.text)
    df4 = pd.DataFrame(d, columns=["Temp Storm"])
#
for windyr in wind2:
    print(windyr.text)
    e.append(windyr.text)
    df5 = pd.DataFrame(e, columns=["Wind Yr"])
for windstorm in wind3:
    print(windstorm.text)
    f.append(windstorm.text)
    df6 = pd.DataFrame(f, columns=["Wind Storm"])
#
for rainfallyr in rainfall2:
    print(rainfallyr.text)
    g.append(rainfallyr.text)
    df7 = pd.DataFrame(g, columns=["Rainfall Yr"])
for rainfallstorm in rainfall3:
    print(rainfallstorm.text)
    h.append(rainfallstorm.text)
    df8 = pd.DataFrame(h, columns=["Rainfall Storm"])
#
tabell = [df1, df2, df3, df4, df5, df6, df7, df8]
result = pd.concat(tabell, axis=1)
result.to_excel("weather" + str(int(datetime.now().day)) + ".xlsx")
driver.quit()
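The `temp2`/`temp3` (and wind and rainfall) split above appears to assume that the page lists the Yr and Storm values alternately, which the column names suggest but the post does not state outright. A minimal illustration of that slicing, using hypothetical plain strings instead of Selenium elements:

```python
# Hypothetical flat list standing in for the scraped temperature texts,
# assuming the page alternates Yr and Storm values: Yr, Storm, Yr, Storm, ...
scraped = ["5", "6", "3", "4", "1", "2"]

yr = scraped[::2]      # start at index 0, take every second item
storm = scraped[1::2]  # start at index 1, take every second item

print(yr)     # ['5', '3', '1']
print(storm)  # ['6', '4', '2']
```

If the page ever changes that order, the two columns silently swap, so it is worth checking a few values against the site.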
Try keeping a row counter in the loop so that the date is written only on every fourth iteration.
Check the snippet below.
row = 0
for dates in date:
    print(dates.text)
    # keep the date on rows 0, 4, 8, ... and write an empty cell otherwise
    a.append(dates.text if row % 4 == 0 else "")
    row = row + 1
df1 = pd.DataFrame(a, columns=["Date"])
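An equivalent variant lets `enumerate` supply the row index instead of a manually incremented `row` variable. A minimal sketch, with hypothetical strings standing in for `dates.text`:

```python
import pandas as pd

# Hypothetical strings standing in for dates.text from the Selenium elements
texts = ["A", "B", "C", "D", "E", "F"]

# enumerate() yields (index, item) pairs, so no separate counter is needed;
# keep the text on rows 0, 4, 8, ... and leave the other rows blank
a = [text if row % 4 == 0 else "" for row, text in enumerate(texts)]

df1 = pd.DataFrame(a, columns=["Date"])
print(a)  # ['A', '', '', '', 'E', '']
```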
**Quick check:**
import pandas as pd

a = []
row = 0
for i in range(10):
    a.append(i if row % 4 == 0 else "")
    row += 1
df = pd.DataFrame(a, columns=["Date"])
print(df)

Output:
Date
0 0
1
2
3
4 4
5
6
7
8 8
9
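Note that the counter approach keeps only elements 0, 4, 8, ... of its input and blanks the rest; in the quick check, the values 1, 2, 3, 5, 6, 7 and 9 are dropped. If the page returns one date element per day rather than one per time slot (an assumption; the screenshot is not reproduced here), each date has to be followed by blank cells instead. A minimal sketch, assuming a hypothetical fixed `SLOTS_PER_DAY = 4` and made-up day and time strings:

```python
import pandas as pd

SLOTS_PER_DAY = 4  # hypothetical: number of time rows under each date

# Hypothetical strings standing in for the scraped date and time texts
day_texts = ["Monday", "Tuesday"]
time_texts = ["00-06", "06-12", "12-18", "18-24"] * 2

# Emit each date once, followed by SLOTS_PER_DAY - 1 blank cells,
# so no date is lost and each one sits on the first row of its day
a = []
for day in day_texts:
    a.append(day)
    a.extend([""] * (SLOTS_PER_DAY - 1))

result = pd.concat(
    [pd.Series(a, name="Date"), pd.Series(time_texts, name="Time")],
    axis=1,
)
print(result)
```

If the number of slots per day varies (for example, if the first day starts partway through), a fixed count no longer lines up, and the dates would need to be matched to the times some other way.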
One more suggestion: create each DataFrame once, after its loop has finished, rather than rebuilding it on every iteration inside the loop.
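Following that suggestion, the lists can be filled in the loops and turned into columns in a single step at the end. A minimal sketch with hypothetical strings standing in for the lists `a`, `b` and `c`. It uses `pd.concat(..., axis=1)` on named Series, much as the question already concatenates single-column DataFrames:

```python
import pandas as pd

# Hypothetical texts standing in for the lists a, b, c filled by the loops
a = ["Monday", "", "", ""]
b = ["00-06", "06-12", "12-18", "18-24"]
c = ["5", "7", "9"]  # deliberately one value short

# Build the columns once, after the loops; concat(axis=1) aligns the Series
# on their default 0..n-1 index and fills NaN where a list is shorter
result = pd.concat(
    [
        pd.Series(a, name="Date"),
        pd.Series(b, name="Time"),
        pd.Series(c, name="Temp Yr"),
    ],
    axis=1,
)
print(result)
```

Unlike `pd.DataFrame({...})` built from a dict of lists, which requires all lists to have the same length, this tolerates columns of different lengths.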