Python : Pandas - ONLY remove NaN rows and move up data, do not move up data in rows with partial NaNs
Here is the code I'm currently drafting to pull fielding stats for all National League players. It works fine, but I'd like to know how to drop ONLY the rows that are entirely NaN from a dataframe without disturbing any of the other data:
# import libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
# create a url object
url = r'https://www.baseball-reference.com/leagues/NL/2022-standard-fielding.shtml'
# create list of the stats that we care about
standardFieldingStats = [
    'player',
    'team_ID',
    'G',
    'GS',
    'CG',
    'Inn_def',
    'chances',
    'PO',
    'A',
    'E_def',
    'DP_def',
    'fielding_perc',
    'tz_runs_total',
    'tz_runs_total_per_season',
    'bis_runs_total',
    'bis_runs_total_per_season',
    'bis_runs_good_plays',
    'range_factor_per_nine',
    'range_factor_per_game',
    'pos_summary'
]
# Create object page
page = requests.get(url)
# parser-lxml = Change html to Python friendly format
# Obtain page's information
soup = BeautifulSoup(page.text, 'lxml')
# grab the NL fielding table so it can be turned into a dataframe
tableNLFielding = soup.find('table', id='players_players_standard_fielding_fielding')
# grab player UID
puidList = []
rows = tableNLFielding.select('tr')
for row in rows:
    playerUID = row.select_one('td[data-append-csv]')
    playerUID = playerUID.get('data-append-csv') if playerUID else None
    if playerUID is None:
        continue
    else:
        puidList.append(playerUID)
# grab players position
compList = []
for row in rows:
    thingList = []
    for stat in range(len(standardFieldingStats)):
        thing = row.find("td", attrs={"data-stat" : standardFieldingStats[stat]})
        if thing is None:
            continue
        elif row.find("td", attrs={"data-stat" : 'player'}).text == 'Team Totals':
            continue
        elif row.find("td", attrs={"data-stat" : 'player'}).text == 'Rank in 15 NL teams':
            continue
        elif row.find("td", attrs={"data-stat" : 'player'}).text == 'Rank in 15 AL teams':
            continue
        elif thing.text == '':
            continue
        elif thing.text == 'NaN':
            continue
        else:
            thingList.append(thing.text)
    compList.append(thingList)
# build the dataframe, using the fielding stat names as column headers
NLFieldingDf = pd.DataFrame(data=compList, columns=standardFieldingStats)
#NLFieldingDf = NLFieldingDf.apply(lambda x: pd.Series(x.dropna().values))
#NLFieldingDf = NLFieldingDf.apply(lambda x: pd.Series(x.fillna('').values))
# make all NaNs blanks for aesthetic reasons
#NLFieldingDf = NLFieldingDf.fillna('')
#NLFieldingDf.insert(loc=0, column='pUID', value=puidList)
For example, this is the dataframe I want to remove NaNs from:
player            team  pos_summary
NaN               NaN   NaN
Brandon Woodruff  NaN   P
William Woods     ATL   NaN
Kyle Wright       ATL   P
When I try, my dataframe ends up looking like this, with the data moved out of place:
player            team  pos_summary
Brandon Woodruff  ATL   P
William Woods     ATL   P
Kyle Wright
Ideally, I want this: no all-NaN rows, while rows with partial NaNs stay as they are:
player            team  pos_summary
Brandon Woodruff        P
William Woods     ATL
Kyle Wright       ATL   P
Refer to the end of the complete code to see my attempts.
Try this to remove only the rows where every value is NaN (dropna returns a new dataframe, so assign the result):
df = df.dropna(how="all")
Further, if you need to replace the remaining NaN values with '', use:
df.fillna('', inplace=True)
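As a minimal sketch of those two calls, the frame below recreates the question's example by hand (the sample values and the name cleaned are only for illustration). With how="all", only the row in which every column is NaN is dropped, so partially filled rows keep their values where they are:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the example in the question
df = pd.DataFrame(
    {
        "player": [np.nan, "Brandon Woodruff", "William Woods", "Kyle Wright"],
        "team": [np.nan, np.nan, "ATL", "ATL"],
        "pos_summary": [np.nan, "P", np.nan, "P"],
    }
)

# Drop only rows where every column is NaN; partial rows are left alone
cleaned = df.dropna(how="all")

# Replace the remaining NaNs with empty strings for display
cleaned = cleaned.fillna("")

# Optional: renumber the index after the drop
cleaned = cleaned.reset_index(drop=True)

print(cleaned)
```

By contrast, dropping NaNs column by column (as in the commented-out apply attempt) compacts each column independently, which is likely what shifts values into the wrong rows.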
You could do it that way; however, your data isn't accurate. You shouldn't be getting nulls in player position or team.
Secondly, if you need to parse <table> tags (and you don't need to pull out any attributes like an href), let pandas parse that table for you. It uses BeautifulSoup under the hood.
import pandas as pd
url = r'https://www.baseball-reference.com/leagues/NL/2022-standard-fielding.shtml'
df = pd.read_html(url)[-1]
df = df[df['Rk'].ne('Rk')]
Output:
print(df[['Name', 'Tm', 'Pos Summary']])
                 Name   Tm  Pos Summary
0         C.J. Abrams  SDP     SS-2B-OF
1    Ronald Acuna Jr.  ATL           OF
2        Willy Adames  MIL           SS
3        Austin Adams  SDP            P
4         Riley Adams  WSN         C-1B
..                ...  ...          ...
509     Miguel Yajure  PIT            P
510  Mike Yastrzemski  SFG           OF
511  Christian Yelich  MIL           OF
512        Juan Yepez  STL           OF
513      Huascar Ynoa  ATL            P
[495 rows x 3 columns]
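The df['Rk'].ne('Rk') line is presumably there because the table on that page repeats its header row partway through, and read_html returns those repeats as ordinary data rows. A small sketch on a hand-built frame (the values are made up, and the column names are assumed to match the page) shows the filter without needing a network request:

```python
import pandas as pd

# Hypothetical stand-in for the read_html result, with one repeated header row
df = pd.DataFrame(
    {
        "Rk": ["1", "2", "Rk", "3"],
        "Name": ["C.J. Abrams", "Willy Adames", "Name", "Riley Adams"],
        "Tm": ["SDP", "MIL", "Tm", "WSN"],
    }
)

# Keep only rows whose Rk value is not the literal header text "Rk"
df = df[df["Rk"].ne("Rk")]

print(df[["Name", "Tm"]])
```

The filter leaves the original index labels in place, which would explain why the output above ends at index 513 while reporting 495 rows; add .reset_index(drop=True) if a contiguous index is wanted.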