
Python: Slice a column string in Pandas Dataframe

I am trying to modify a string column. I want to remove the unwanted digits from each value and save the result as a new column.

Here's an example in SQL:

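Using cast(substring(EMP_NM, 0, CHARINDEX(' ', EMP_NM)) as int), which takes the text before the first space and converts it to an integer (dropping the leading zeros), I get the result below.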

[Image: result of the SQL query]

How can I do this in Python so that only the "4253332" part of the number ends up in a new column?

I tried df['EMP_NM'] = df['EMP_NM'].str.slice(0, 9), but it does not give the result I want, because the first number in the column does not always have the same length, as in these values:

009201135 0000000000 0000000000 0000000000 0000000000

0006892203 0000000000 0000000000 0000000000 0000000000
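To illustrate why a fixed-width slice fails, here is a small sketch that builds a DataFrame from the two sample values above (the DataFrame construction is only for illustration) and applies the same slice(0, 9). The first group has 9 characters in one row and 10 in the other, so the cut lands in a different place relative to the number.

```python
import pandas as pd

# The two sample values from the question: the leading group has 9 and 10 characters
df = pd.DataFrame({"EMP_NM": [
    "009201135 0000000000 0000000000 0000000000 0000000000",
    "0006892203 0000000000 0000000000 0000000000 0000000000",
]})

# A fixed-width slice ignores where the first group actually ends
sliced = df["EMP_NM"].str.slice(0, 9)
print(sliced.tolist())  # ['009201135', '000689220'], the second row loses its last digit
```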

Any help would be appreciated.

Try this:


df['EMP_NM'] = df['EMP_NM'].astype(str).str[0:7]

If this field is read into Python as an integer, the leading zeros are removed automatically, so you can simply slice the string to keep only the first 7 characters. This assumes every ID has exactly 7 significant digits.

  • .astype(str) converts the field to strings so it can be sliced; the .str accessor works on string values, not on integers or floats.
  • .str[0:7] slices each string, keeping the first 7 characters.
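A minimal sketch of the scenario this answer assumes. It uses a hypothetical CSV in which EMP_NM holds only the leading number of each sample value (an assumption: the full values in the question contain spaces and would not parse as integers). read_csv infers an integer column, which drops the leading zeros, and the answer's astype(str).str[0:7] then keeps the first 7 digits.

```python
import io

import pandas as pd

# Hypothetical CSV: only the leading group of each sample value from the question
csv_text = "EMP_NM\n009201135\n0006892203\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df["EMP_NM"].dtype)  # int64, so the leading zeros are already gone

# Convert back to strings and keep the first 7 characters, as the answer suggests
df["EMP_NM"] = df["EMP_NM"].astype(str).str[0:7]
print(df["EMP_NM"].tolist())  # ['9201135', '6892203']
```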

How about this:

df['EMP_NM'] = df['EMP_NM'].str.replace('0','')
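A quick check of this approach on the first sample value from the question (a sketch only; regex=False is passed so the pattern is treated as plain text). It removes every '0', not only the leading ones, so the zero inside the ID is lost and the separating spaces remain.

```python
import pandas as pd

df = pd.DataFrame({"EMP_NM": ["009201135 0000000000 0000000000 0000000000 0000000000"]})

# Remove every literal '0' character from each value
result = df["EMP_NM"].str.replace("0", "", regex=False)

# 9201135 becomes 921135, followed by the four leftover spaces
print(repr(result[0]))  # '921135    '
```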

Or, if a value can also contain non-zero digits in the other groups, such as 009201135 0000000000 32331 0000000000 0000000000, this should work better:

df['EMP_fNM'] = df['EMP_NM'].str.split().str[0].str.lstrip('0')  # first group of each row, leading zeros removed
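A sketch running this split-based approach on the sample values from the question, including the variant with 32331 in another group. The .str[0] step takes the first whitespace-separated group of every row (rather than only the first row), and lstrip('0') removes only leading zeros. The EMP_ID column is a hypothetical name for a variant closer to the SQL cast(... as int), here done with pd.to_numeric.

```python
import pandas as pd

df = pd.DataFrame({"EMP_NM": [
    "009201135 0000000000 0000000000 0000000000 0000000000",
    "0006892203 0000000000 0000000000 0000000000 0000000000",
    "009201135 0000000000 32331 0000000000 0000000000",
]})

# First whitespace-separated group of each row, with leading zeros stripped (string result)
df["EMP_fNM"] = df["EMP_NM"].str.split().str[0].str.lstrip("0")

# Hypothetical integer column, mirroring the SQL cast(... as int)
df["EMP_ID"] = pd.to_numeric(df["EMP_NM"].str.split().str[0])

print(df[["EMP_fNM", "EMP_ID"]])
```

Keeping the result as a string preserves the ID exactly as written, while the integer version is convenient for numeric joins or comparisons.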
