Pandas.series str extract does not get string of one digit

Question

I need to extract the digits from a column of strings, but str.extract('(\d+)') does not work for values that are purely numeric.

df['extract'] = df['original'].str.extract(r'(\d+)')

Please see the dataframe as dictionary:

{'original': {0: 'NO RATING',
  1: 4,
  2: '3-',
  3: 3,
  4: '4-',
  5: '2-',
  6: '2+',
  7: '4+',
  8: '5-',
  9: 5,
  10: '5+',
  11: 2,
  12: '3+',
  13: '6+',
  14: '6-',
  15: 6,
  16: 7},
 'extract': {0: nan,
  1: nan,
  2: '3',
  3: nan,
  4: '4',
  5: '2',
  6: '2',
  7: '4',
  8: '5',
  9: nan,
  10: '5',
  11: nan,
  12: '3',
  13: '6',
  14: '6',
  15: nan,
  16: nan}}

df is a pandas DataFrame with two columns. df['original'] contains values like 2+, 2-, 2, 3-, 3, 3+ and NO RATING.

The code generates a new column df['extract'], which is correct for values like 2- (gives 2), 3+ (gives 3) and NO RATING (gives NaN). But it is wrong for values like 2 (gives NaN, but I expect 2) and 3 (gives NaN, but I expect 3).

Answer 1

Before using extract, make sure all of the values are strings.

df['extract'] = df['original'].astype(str).str.extract(r'(\d+)')
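
To see why the cast matters, here is a minimal sketch on a few rows modeled on the question's data. The names without_cast and with_cast are made up for illustration, and expand=False is used only so that each result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Mixed-type object column, modeled on a few rows from the question.
df = pd.DataFrame({"original": ["NO RATING", 4, "3-", 3, "13+"]})

# Without the cast, the .str accessor returns NaN for elements
# that are not strings (here the ints 4 and 3).
without_cast = df["original"].str.extract(r"(\d+)", expand=False)

# Casting first turns the int 4 into the string '4', so the regex can match.
with_cast = df["original"].astype(str).str.extract(r"(\d+)", expand=False)

print(without_cast.tolist())
print(with_cast.tolist())
```

Note that the extracted values are still strings. If numbers are needed, something like pd.to_numeric on the result would be a reasonable next step.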

Answer 2

The problem is that some of the values are integers while others are strings. str.extract does not raise an error, but it returns NaN for the integer values instead of extracting them. You can use lambda functions with re.findall to handle this case, converting each value with str() first. The + quantifier in the pattern matches one or more digits, so values greater than 9 are captured whole.

import re

df['extract'] = df['original'].map(lambda x: re.findall(r'(\d+)', str(x))) \
                           .map(lambda i: i[0] if len(i)>0 else None)

Result:

   original extract
0   5         5
1   13+      13
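
The two chained map calls can also be folded into a single helper. This is only a sketch of the same idea using re.search instead of re.findall; first_digits is a hypothetical name, not part of pandas or the original answer.

```python
import re

def first_digits(value):
    """Return the first run of digits in value as a string, or None."""
    # str() makes the regex work for ints as well as strings.
    match = re.search(r"\d+", str(value))
    return match.group(0) if match else None

# Mixed ints and strings, similar to the question's column.
ratings = ["NO RATING", 4, "3-", 5, "13+"]
extracted = [first_digits(v) for v in ratings]
print(extracted)
```

On a DataFrame this could be applied as df['original'].map(first_digits).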

solution1
1 ACCEPTED 2019-04-24 18:26:31

solution2
0 2019-04-24 18:02:21
