简体   繁体   中英

How do I correctly remove all text from column in Pandas?

I have a dataframe as:

df:

+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|     | country                 | league                       | home_odds   |   draw_odds |   away_odds |   home_score | away_score   | home_team                | away_team                    | datetime            |
+=====+=========================+==============================+=============+=============+=============+==============+==============+==========================+==============================+=====================+
|  63 | Chile                   | Primera Division             | 2.80        |        3.05 |        2.63 |            3 | 1            | Melipilla                | O'Higgins                    | 2021-06-07 00:30:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  64 | North & Central America | CONCACAF Nations League      | 2.95        |        3.07 |        2.49 |            3 | 2 ET         | USA                      | Mexico                       | 2021-06-07 01:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  66 | World                   | World Cup 2022               | 1.04        |       13.43 |       28.04 |            0 | 1            | Kyrgyzstan               | Mongolia                     | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  65 | World                   | Friendly International       | 1.52        |        3.91 |        7.01 |            1 | 1            | Serbia                   | Jamaica                      | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+

I want the columns home_score and away_score to be just integers and I am trying regex as:

df[['home_score', 'away_score']] = re.sub('\D', '', '.*')

however all the columns are coming in blank.

How do I correctly do it?

You can try via extract() and astype() method:

df['away_score']=df['away_score'].str.extract('^(\d+)').astype(int)
df['home_score']=df['home_score'].str.extract('^(\d+)').astype(int)

OR

df['away_score']=df['away_score'].str.extract('([0-9]+)').astype(int)
df['home_score']=df['home_score'].str.extract('([0-9]+)').astype(int)

output:

+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|     | country                 | league                       | home_odds   |   draw_odds |   away_odds |   home_score | away_score   | home_team                | away_team                    | datetime            |
+=====+=========================+==============================+=============+=============+=============+==============+==============+==========================+==============================+=====================+
|  63 | Chile                   | Primera Division             | 2.80        |        3.05 |        2.63 |            3 | 1            | Melipilla                | O'Higgins                    | 2021-06-07 00:30:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  64 | North & Central America | CONCACAF Nations League      | 2.95        |        3.07 |        2.49 |            3 | 2            | USA                      | Mexico                       | 2021-06-07 01:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  66 | World                   | World Cup 2022               | 1.04        |       13.43 |       28.04 |            0 | 1            | Kyrgyzstan               | Mongolia                     | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  65 | World                   | Friendly International       | 1.52        |        3.91 |        7.01 |            1 | 1            | Serbia                   | Jamaica                      | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+

You can do df[['home_score', 'away_score']] = df[['home_score', 'away_score']].applymap(lambda x: int(float(x)))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM