Pandas DataFrame skip rows

I am working on a weather web-scraping project. I scraped a site using Selenium and exported the data to Excel using pandas. However, I can't figure out how to make each date appear only in every fourth row, so that the dates line up with the times. Excel sheet: https://i.stack.imgur.com/27w0f.jpg

Full code:

from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
from datetime import datetime

url="https://pent.no/60.19401,11.09936"

dates = "forecast-day-view-date-bar__date"
times = "forecast-hour-view-hour-label"
temps = "forecast-hour-view-weather-widget__temperature"
winder = "forecast-hour-view-weather-widget__wind-speed"
rainfalls = "forecast-hour-view-weather-widget__precipitation"

driver = webdriver.Chrome()
driver.get(url)

# The find_elements_by_* helpers were removed in Selenium 4; use By locators.
date = driver.find_elements(By.CLASS_NAME, dates)

time = driver.find_elements(By.CLASS_NAME, times)

temp = driver.find_elements(By.CLASS_NAME, temps)
temp2 = temp[::2]
temp3 = temp[1::2]

wind = driver.find_elements(By.CLASS_NAME, winder)
wind2 = wind[::2]
wind3 = wind[1::2]

rainfall = driver.find_elements(By.CLASS_NAME, rainfalls)
rainfall2 = rainfall[::2]
rainfall3 = rainfall[1::2]

a = []
b = []
c = []
d = []
e = []
f = []
g = []
h = []

for dates in date:
    print(dates.text)
    a.append(dates.text)
    df1 = pd.DataFrame(a, columns= ["Date"])
    
#
for times in time:
    print(times.text)
    b.append(times.text)
    df2 = pd.DataFrame(b, columns= ["Time"])
#  
for tempyr in temp2:
    print(tempyr.text)
    c.append(tempyr.text)
    df3 = pd.DataFrame(c, columns= ["Temp Yr"])

for tempstorm in temp3:
    print(tempstorm.text)
    d.append(tempstorm.text)
    df4 = pd.DataFrame(d, columns= ["Temp Storm"])
#   
for windyr in wind2:
    print(windyr.text)
    e.append(windyr.text)
    df5 = pd.DataFrame(e, columns= ["Wind Yr"])

for windstorm in wind3:
    print(windstorm.text)
    f.append(windstorm.text)
    df6 = pd.DataFrame(f, columns= ["Wind Storm"])
#   
for rainfallyr in rainfall2:
    print(rainfallyr.text)
    g.append(rainfallyr.text)
    df7 = pd.DataFrame(g, columns= ["Rainfall Yr"])
  
for rainfallstorm in rainfall3:
    print(rainfallstorm.text)
    h.append(rainfallstorm.text)
    df8 = pd.DataFrame(h, columns= ["Rainfall Storm"])
#
tabell = [df1, df2, df3, df4, df5, df6, df7, df8]
result = pd.concat(tabell, axis=1)

result.to_excel("weather" + str(int(datetime.now().day)) + ".xlsx")

        
driver.quit()

Try adding a counter variable to the loop so that the date is written only on every fourth iteration and the other rows are left blank.

See the snippet below.

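row = 0
for dates in date:
    print(dates.text)
    # Keep the text on every fourth iteration; pad the other rows with ""
    a.append(dates.text if row % 4 == 0 else "")
    row += 1
# Build the DataFrame once, after the loop has finished
df1 = pd.DataFrame(a, columns=["Date"])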

**Quick check:**

import pandas as pd

a = []
row = 0

for i in range(10):
    # Keep the value on every fourth iteration; pad the other rows with ""
    a.append(i if row % 4 == 0 else "")
    row += 1

df = pd.DataFrame(a, columns=["Date"])

print(df)


    Date
0   0
1   
2   
3   
4   4
5   
6   
7   
8   8
9   
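
Note that the counter approach blanks out three of every four scraped dates rather than spreading them apart. If every scraped date should be kept and moved onto every fourth row, one possible sketch is to follow each date with three empty strings. This assumes each date corresponds to exactly four time rows, as the question suggests; `scraped_dates` and `ROWS_PER_DATE` are hypothetical names, with the list standing in for the `.text` values collected from `date`.

```python
import pandas as pd

# Hypothetical stand-in for [d.text for d in date] from the scraper.
scraped_dates = ["Mon 1", "Tue 2", "Wed 3"]
ROWS_PER_DATE = 4  # assumption: four time slots per date

padded = []
for text in scraped_dates:
    padded.append(text)                         # the date itself
    padded.extend([""] * (ROWS_PER_DATE - 1))   # three blank rows after it

df1 = pd.DataFrame(padded, columns=["Date"])
print(df1)
```

If the number of time slots per date varies (for example, a partial first day), a fixed step like this would drift out of alignment and the padding would need to be computed per date.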

One more suggestion: create each DataFrame after its loop rather than inside it, so that it is built once from the complete list instead of being rebuilt on every iteration.
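
A minimal sketch of that suggestion, using plain strings in place of the Selenium elements so that it runs on its own. The sample values and the `columns` dict are illustrative assumptions; in the real script each list could be built with something like `[el.text for el in time]`.

```python
import pandas as pd

# Hypothetical sample values standing in for the scraped .text strings.
columns = {
    "Date": ["Mon 1", "", "", ""],
    "Time": ["00-06", "06-12", "12-18", "18-24"],
    "Temp Yr": ["3", "5", "8"],  # deliberately one value short
}

# Build the table once, after all lists are complete. Wrapping each list in
# pd.Series lets columns differ in length: shorter ones are padded with NaN,
# which matches the default behaviour of pd.concat(..., axis=1).
result = pd.DataFrame({name: pd.Series(values) for name, values in columns.items()})
print(result)
```

This replaces the eight per-column DataFrames and the final `pd.concat` with a single constructor call, while keeping the same column order.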
