I need to extract the digits from a column of strings. But str.extract('(\d+)') does not work for values that are purely numeric.
df['extract'] = df['original'].str.extract(r'(\d+)')
Here is the DataFrame as a dictionary:
{'original': {0: 'NO RATING',
1: 4,
2: '3-',
3: 3,
4: '4-',
5: '2-',
6: '2+',
7: '4+',
8: '5-',
9: 5,
10: '5+',
11: 2,
12: '3+',
13: '6+',
14: '6-',
15: 6,
16: 7},
'extract': {0: nan,
1: nan,
2: '3',
3: nan,
4: '4',
5: '2',
6: '2',
7: '4',
8: '5',
9: nan,
10: '5',
11: nan,
12: '3',
13: '6',
14: '6',
15: nan,
16: nan}}
df is a pandas DataFrame with two columns; df['original'] contains values like 2+, 2-, 2, 3-, 3, 3+ and NO RATING.
The code generates a new column df['extract'], which is correct for values like 2- (gives 2), 3+ (gives 3) and NO RATING (gives NaN). But it is wrong for values like 2 (gives NaN, but I expect 2) and 3 (gives NaN, but I expect 3).
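A minimal sketch that reproduces the problem, assuming a recent pandas and using a subset of the values from the dictionary above. expand=False is not in the original code; it is used here only so that extract returns a Series instead of a one-column DataFrame. The NaN values come out the same either way.

```python
import pandas as pd

# A subset of the 'original' column from the question: a mix of str and int values
df = pd.DataFrame({'original': ['NO RATING', 4, '3-', 3, '5+', 7]})

# The .str accessor treats non-string elements (the ints here) as missing,
# so they come back as NaN instead of raising an error
df['extract'] = df['original'].str.extract(r'(\d+)', expand=False)
print(df)
```

Only the string rows ('3-' and '5+') get a value; the integers 4, 3 and 7 all end up as NaN, just like in the question.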
Make sure all the values are strings before using extract:
df['extract'] = df['original'].astype(str).str.extract(r'(\d+)')
The problem is that some of the values are integers while others are strings. str.extract does not raise an error, but it returns NaN for any value that is not a string, so the integers are never searched. You can handle this with a lambda that converts each value to str and calls re.findall. The + quantifier in \d+ matches one or more digits, so values greater than 9 (such as 13+) are extracted whole rather than only their first digit.
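import re
# str(x) makes integers searchable; keep the first run of digits, or None if there is none
df['extract'] = df['original'].map(lambda x: re.findall(r'(\d+)', str(x))) \
    .map(lambda i: i[0] if len(i) > 0 else None)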
Result:
  original extract
0        5       5
1      13+      13
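To see why the + quantifier matters for values above 9, here is a small sketch using only the standard re module. The variable names are illustrative, and the two input values are the ones from the Result above.

```python
import re

# \d matches exactly one digit, so '13+' yields two separate matches
single = re.findall(r'\d', '13+')
# The + quantifier means "one or more", so \d+ captures the whole run of digits
run = re.findall(r'\d+', '13+')
print(single, run)  # ['1', '3'] ['13']

# str(v) is what lets the integer 5 be searched the same way as the string '13+'
values = [5, '13+']
firsts = [m[0] if m else None for m in (re.findall(r'\d+', str(v)) for v in values)]
print(firsts)  # ['5', '13']
```

Taking m[0] mirrors the i[0] in the answer's lambda: if a value contained several separate digit runs, only the first would be kept.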