
Drop Rows with Non-Numeric Entries in a Column (Python)

I am trying to download data from a website. The resulting table includes some rows that are not part of the data; they are easy to spot because their first column is not a number.

So I'm getting something like this:

GM_Num     Date               Tm
1          Monday, Apr 3      LAA
2          Tuesday, Apr 4     LAA
...        ...                ...
Gm#        May                Tm

where the last row is one that I want to drop. In the actual table, several rows like this are scattered throughout.

Here is the code that I have tried so far to drop those rows:

import requests
import pandas as pd

url = 'https://www.baseball-reference.com/teams/LAA/2017-schedule-scores.shtml'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
df.rename(columns={"Gm#": "GM_Num"}, inplace = True)

#Attempts that didn't work:
df[df['GM_Num'].str.isdigit().isnull()]
#df[df.GM_Num.apply(lambda x: x.isnumeric())].set_index('GM_Num', inplace = True)

#df.set_index('GM_Num', inplace = True)
df

Thank you in advance for any help!
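
A note on why the two attempts above likely do nothing, shown on a small stand-in table (the sample rows below are made up for illustration, not scraped from the site). `.str.isdigit()` already returns a True/False mask, so calling `.isnull()` on it turns every entry into False and the selection comes back empty. In the second attempt, `set_index(..., inplace=True)` acts on the temporary filtered copy and returns None, so `df` itself never changes. A minimal sketch of the mask-based fix, assuming the column holds strings:

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: one repeated header row mixed in.
df = pd.DataFrame({
    "GM_Num": ["1", "2", "Gm#", "3"],
    "Date": ["Monday, Apr 3", "Tuesday, Apr 4", "May", "Wednesday, Apr 5"],
    "Tm": ["LAA", "LAA", "Tm", "LAA"],
})

# astype(str) guards against non-string entries, for which .str methods yield NaN.
is_digit = df["GM_Num"].astype(str).str.isdigit()

# What the first attempt computes: a mask that is False everywhere.
empty = df[is_digit.isnull()]

# Use the mask directly and assign the result back to keep it.
cleaned = df[is_digit]
```

Here `empty` has no rows, while `cleaned` keeps only the rows whose `GM_Num` is made of digits. Note that `GM_Num` still holds strings after this filter.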

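Let's convert your 'Gm#' column to numeric, turning anything that can't be parsed into NaN, and then drop those rows (use 'GM_Num' instead if you have already renamed the column):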

df['Gm#'] = pd.to_numeric(df['Gm#'], errors='coerce')
df = df.dropna(subset=['Gm#'])

df

Output:

       Gm#               Date Unnamed: 2   Tm Unnamed: 4  Opp   W/L  R RA  \
0      1.0      Monday, Apr 3   boxscore  LAA          @  OAK     L  2  4   
1      2.0     Tuesday, Apr 4   boxscore  LAA          @  OAK     W  7  6   
2      3.0   Wednesday, Apr 5   boxscore  LAA          @  OAK     W  5  0   
3      4.0    Thursday, Apr 6   boxscore  LAA          @  OAK     L  1  5   
4      5.0      Friday, Apr 7   boxscore  LAA        NaN  SEA     W  5  1   
..     ...                ...        ...  ...        ...  ...   ... .. ..   
162  158.0  Wednesday, Sep 27   boxscore  LAA          @  CHW  L-wo  4  6   
163  159.0   Thursday, Sep 28   boxscore  LAA          @  CHW     L  4  5   
164  160.0     Friday, Sep 29   boxscore  LAA        NaN  SEA     W  6  5   
165  161.0   Saturday, Sep 30   boxscore  LAA        NaN  SEA     L  4  6   
167  162.0      Sunday, Oct 1   boxscore  LAA        NaN  SEA     W  6  2   

     Inn  ... Rank    GB       Win         Loss       Save  Time D/N  \
0    NaN  ...    3   1.0  Graveman      Nolasco    Casilla  2:56   N   
1    NaN  ...    2   1.0    Bailey         Dull  Bedrosian  3:17   N   
2    NaN  ...    2   1.0   Ramirez       Cotton        NaN  3:15   N   
3    NaN  ...    2   1.0    Triggs       Skaggs        NaN  2:44   D   
4    NaN  ...    1  Tied    Chavez     Gallardo        NaN  2:56   N   
..   ...  ...  ...   ...       ...          ...        ...   ...  ..   
162   10  ...    2  20.0  Farquhar       Parker        NaN  3:58   N   
163  NaN  ...    2  21.0   Infante       Chavez     Minaya  3:04   N   
164  NaN  ...    2  21.0      Wood  Rzepczynski     Parker  3:01   N   
165  NaN  ...    2  21.0  Lawrence    Bedrosian       Diaz  3:32   N   
167  NaN  ...    2  21.0  Bridwell      Simmons        NaN  2:38   D   

    Attendance Streak Orig. Scheduled  
0        36067      -             NaN  
1        11225      +             NaN  
2        13405     ++             NaN  
3        13292      -             NaN  
4        43911      +             NaN  
..         ...    ...             ...  
162      17012      -             NaN  
163      19596     --             NaN  
164      35106      +             NaN  
165      38075      -             NaN  
167      34940      +             NaN  

[162 rows x 21 columns]
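
In the output above, 'Gm#' comes back as floats (1.0, 2.0, ...) because the coerced NaN values force a float dtype, and the index has gaps where rows were dropped. If that matters, a possible follow-up is to cast back to integers and reset the index. This is only a sketch on a tiny made-up frame, not the real scraped table:

```python
import pandas as pd

# Hypothetical sample standing in for the scraped schedule table.
df = pd.DataFrame({
    "Gm#": ["1", "2", "Gm#", "3"],
    "Tm": ["LAA", "LAA", "Tm", "LAA"],
})

# Non-numeric entries become NaN, then the rows holding them are dropped.
df["Gm#"] = pd.to_numeric(df["Gm#"], errors="coerce")
df = df.dropna(subset=["Gm#"]).reset_index(drop=True)

# With the NaN rows gone, the column can safely be cast back to integers.
df["Gm#"] = df["Gm#"].astype(int)
```

The cast to int is only safe after the NaN rows are removed; doing it before `dropna` would raise an error.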
