Python: Pandas - Drop only all-NaN rows and shift the data up, without shifting data in rows that have partial NaNs

Question

Here is the code I'm currently drafting to pull fielding stats for all National League players. It works fine, but I would like to know how to drop ONLY the rows that consist entirely of NaNs from a dataframe, without disturbing any of the other data:

# import libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd

# create a url object
url = r'https://www.baseball-reference.com/leagues/NL/2022-standard-fielding.shtml'

# create list of the stats that we care about
standardFieldingStats = [
    'player',
    'team_ID',
    'G',
    'GS',
    'CG',
    'Inn_def',
    'chances',
    'PO',
    'A',
    'E_def',
    'DP_def',
    'fielding_perc',
    'tz_runs_total',
    'tz_runs_total_per_season',
    'bis_runs_total',
    'bis_runs_total_per_season',
    'bis_runs_good_plays',
    'range_factor_per_nine',
    'range_factor_per_game',
    'pos_summary'
]

# Create object page
page = requests.get(url)

# parser-lxml = Change html to Python friendly format
# Obtain page's information
soup = BeautifulSoup(page.text, 'lxml')

# grab the league's current-year fielding stats table
tableNLFielding = soup.find('table', id='players_players_standard_fielding_fielding')

# grab player UID
puidList = []
rows = tableNLFielding.select('tr')
for row in rows:
    playerUID = row.select_one('td[data-append-csv]')
    playerUID = playerUID.get('data-append-csv') if playerUID else None
    if playerUID is None:
        continue
    else:
        puidList.append(playerUID)

# grab players position
compList = []
for row in rows:
    thingList = []
    for stat in range(len(standardFieldingStats)):
        thing = row.find("td", attrs={"data-stat" : standardFieldingStats[stat]})
        if thing == None:
            continue
        elif row.find("td", attrs={"data-stat" : 'player'}).text == 'Team Totals':
            continue
        elif row.find("td", attrs={"data-stat" : 'player'}).text == 'Rank in 15 NL teams':
            continue
        elif row.find("td", attrs={"data-stat" : 'player'}).text == 'Rank in 15 AL teams':
            continue
        elif thing.text == '':
            continue
        elif thing.text == 'NaN':
            continue
        else:
            thingList.append(thing.text)
    compList.append(thingList)

# build a dataframe using the fielding stat names as column headers
NLFieldingDf = pd.DataFrame(data=compList, columns=standardFieldingStats)

#NLFieldingDf = NLFieldingDf.apply(lambda x: pd.Series(x.dropna().values))

#NLFieldingDf = NLFieldingDf.apply(lambda x: pd.Series(x.fillna('').values))

# make all NaNs blanks for aesthetic reasons
#NLFieldingDf = NLFieldingDf.fillna('')

#NLFieldingDf.insert(loc=0, column='pUID', value=puidList)

For example, this is the dataframe I want to remove NaNs from:

player             team   pos_summary
NaN                NaN    NaN
Brandon Woodruff   NaN    P   
William Woods      ATL    NaN
Kyle Wright        ATL    P

When I try, my dataframe ends up looking like this, with the data moved out of place:

player             team   pos_summary
Brandon Woodruff   ATL    P   
William Woods      ATL    P
Kyle Wright

Ideally, I want this: no all-NaN rows, while rows with partial NaNs keep their values in the original columns:

player             team   pos_summary
Brandon Woodruff          P   
William Woods      ATL    
Kyle Wright        ATL    P

See the commented-out lines at the end of the complete code for my attempts.

Answer 1

Try this to remove the rows in which every value is NaN:

df = df.dropna(how="all")

Further, if you need to replace the remaining NaN values with '', use:

df.fillna('', inplace=True)
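As a minimal sketch of the two calls above, the following rebuilds the four-row example from the question as a toy dataframe (the values are copied from the question; the variable names `cleaned` and `display_df` are made up for illustration). It assumes the missing cells are real NaN/None values rather than the string 'NaN'.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the example in the question
df = pd.DataFrame(
    {
        "player": [np.nan, "Brandon Woodruff", "William Woods", "Kyle Wright"],
        "team": [np.nan, np.nan, "ATL", "ATL"],
        "pos_summary": [np.nan, "P", np.nan, "P"],
    }
)

# Drop only the rows where every column is missing;
# rows with partial NaNs are kept and their values stay in place
cleaned = df.dropna(how="all").reset_index(drop=True)

# Optional: replace the remaining NaNs with empty strings for display
display_df = cleaned.fillna("")

print(display_df)
```

Note that `dropna` returns a new dataframe, so the result has to be assigned (or `inplace=True` passed) for the change to stick. The `reset_index(drop=True)` call is optional and only renumbers the rows after the drop.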

Answer 2

You could do it that way; however, your data isn't accurate. You shouldn't be getting nulls in player position or team.
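One plausible cause of the inaccurate data, offered here as an assumption rather than a confirmed diagnosis, is that the scraping loop in the question uses `continue` to skip empty cells. Each skipped cell makes the row list shorter, so later values slide left into the wrong column and the padding ends up at the end of the row. The sketch below uses two made-up rows (hypothetical cell text, not real scraped output) to contrast skipping with keeping a `None` placeholder.

```python
import pandas as pd

columns = ["player", "team_ID", "pos_summary"]

# Hypothetical cell text for two scraped rows; '' marks an empty cell
scraped_rows = [
    {"player": "Brandon Woodruff", "team_ID": "", "pos_summary": "P"},
    {"player": "Kyle Wright", "team_ID": "ATL", "pos_summary": "P"},
]

# Skipping empty cells shortens the first row to two items,
# so "P" lands in the team_ID column and pos_summary is padded with None
shifted = [[row[c] for c in columns if row[c] != ""] for row in scraped_rows]
shifted_df = pd.DataFrame(shifted, columns=columns)

# Keeping a None placeholder preserves the column alignment
aligned = [[row[c] or None for c in columns] for row in scraped_rows]
aligned_df = pd.DataFrame(aligned, columns=columns)

print(shifted_df)
print(aligned_df)
```

With the placeholder in place, `dropna(how="all")` can remove the fully empty rows without any value changing columns.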

Secondly, if you need to parse <table> tags (and you don't need to pull out attributes such as an href), let pandas parse the table for you. It uses an HTML parser such as lxml or BeautifulSoup under the hood.

import pandas as pd

url = r'https://www.baseball-reference.com/leagues/NL/2022-standard-fielding.shtml'
df = pd.read_html(url)[-1]
df = df[df['Rk'].ne('Rk')]

Output:

print(df[['Name', 'Tm', 'Pos Summary']])
                 Name   Tm Pos Summary
0         C.J. Abrams  SDP    SS-2B-OF
1    Ronald Acuna Jr.  ATL          OF
2        Willy Adames  MIL          SS
3        Austin Adams  SDP           P
4         Riley Adams  WSN        C-1B
..                ...  ...         ...
509     Miguel Yajure  PIT           P
510  Mike Yastrzemski  SFG          OF
511  Christian Yelich  MIL          OF
512        Juan Yepez  STL          OF
513      Huascar Ynoa  ATL           P

[495 rows x 3 columns]
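The index in the output runs to 513 while only 495 rows remain, presumably because the page repeats its header row inside the table body and `df['Rk'].ne('Rk')` filters those repeats out. The following is a small offline sketch of that filter; the `raw` dataframe is a hypothetical stand-in for what `read_html` returns, not actual scraped data.

```python
import pandas as pd

# Hypothetical stand-in for a parsed table in which the header text
# appears again as a data row partway through
raw = pd.DataFrame(
    {
        "Rk": ["1", "2", "Rk", "3"],
        "Name": ["C.J. Abrams", "Ronald Acuna Jr.", "Name", "Willy Adames"],
        "Tm": ["SDP", "ATL", "Tm", "MIL"],
    }
)

# Keep only the rows whose 'Rk' cell is not the literal header text
players = raw[raw["Rk"].ne("Rk")]

print(players)
```

The surviving rows keep their original index labels, which is why gaps appear in the index; `reset_index(drop=True)` would renumber them if that matters.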


Answer 1: score 1, accepted, 2022-05-03 02:06:32
Answer 2: score 1, 2022-05-05 08:23:33
