
pandas Series.str.extract does not extract from values that are only a digit

I need to extract the digits from a column of strings, but str.extract('(\d+)') returns NaN for values that consist only of a number.

df['extract'] = df['original'].str.extract(r'(\d+)')

Please see the dataframe as dictionary:

{'original': {0: 'NO RATING',
  1: 4,
  2: '3-',
  3: 3,
  4: '4-',
  5: '2-',
  6: '2+',
  7: '4+',
  8: '5-',
  9: 5,
  10: '5+',
  11: 2,
  12: '3+',
  13: '6+',
  14: '6-',
  15: 6,
  16: 7},
 'extract': {0: nan,
  1: nan,
  2: '3',
  3: nan,
  4: '4',
  5: '2',
  6: '2',
  7: '4',
  8: '5',
  9: nan,
  10: '5',
  11: nan,
  12: '3',
  13: '6',
  14: '6',
  15: nan,
  16: nan}}

df is a pandas DataFrame with 2 columns. df['original'] contains values like 2+, 2-, 2, 3-, 3, 3+, NO RATING.

The code generates a new column df['extract'], which is correct for values like 2- (gives 2), 3+ (gives 3) and NO RATING (gives NaN). But it is wrong for values like 2 (gives NaN, but I'm expecting 2) and 3 (gives NaN, but I'm expecting 3).


Make sure all values are strings before using extract:

df['extract'] = df['original'].astype(str).str.extract(r'(\d+)')
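
To see why the cast matters, here is a minimal sketch that reproduces the behaviour on a shortened version of the question's data. The sample values and the names `before` and `after` are chosen for illustration only. It assumes the column has object dtype with a mix of str and int elements, as in the question.

```python
import pandas as pd

# Illustrative sample: a mix of str and int values, as in the question.
df = pd.DataFrame({"original": ["NO RATING", 4, "3-", 3, "4+", 5, "6-", 7]})

# Without the cast, elements that are not strings come back as NaN.
# With one capture group, extract() returns a one-column DataFrame; [0] selects it.
before = df["original"].str.extract(r"(\d+)")[0]

# Casting to str first lets the regex see every value.
after = df["original"].astype(str).str.extract(r"(\d+)")[0]

print(pd.DataFrame({"original": df["original"], "before": before, "after": after}))
```

The extracted values are still strings. If numbers are needed, something like pd.to_numeric on the result would be a reasonable next step.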

The problem is that some of the values are integers while others are strings. str.extract does not raise an error on such a column, but it returns NaN for every element that is not a string. You can handle this case with a lambda and re.findall, converting each value with str() first. The + in \d+ is the "one or more" quantifier, so all digits are captured when the value is > 9.

import re

df['extract'] = df['original'].map(lambda x: re.findall(r'(\d+)', str(x))) \
                           .map(lambda i: i[0] if len(i)>0 else None)

Result:

   original extract
0   5         5
1   13+      13
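
The same idea can be written as a small named function, which may be easier to read and test than two chained lambdas. This is only a sketch: `first_digits` is a hypothetical helper name, and it uses re.search instead of re.findall because only the first run of digits is needed.

```python
import re

import pandas as pd


def first_digits(value):
    """Return the first run of digits in str(value), or None if there is none."""
    match = re.search(r"\d+", str(value))
    return match.group(0) if match else None


# Illustrative data: an int, a multi-digit string, a non-match and a signed rating.
df = pd.DataFrame({"original": [5, "13+", "NO RATING", "2-"]})
df["extract"] = df["original"].map(first_digits)
print(df)
```

Because str() is applied to each element inside the helper, it does not matter whether a given value is stored as an int or as a string.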
