I am trying to understand why str.extract(r"(\d+%)") returns NaN while str.count(r"(\d+%)") returns the correct answer when parsing text within a column of a DataFrame.
For example,
import re
import pandas as pd

df = pd.DataFrame({'Subject':['3 hrs only! 35% off', 'Secret Savings!', 'Sale: 40% off']})
pattern = re.compile(r"(\d+%)")
df['Discount'] = df['Subject'].str.count(pattern)
...yields a Discount column with 1 in rows 1 and 3 (and 0 in row 2), as you would expect. However,
df['Discount'] = df['Subject'].str.extract(pattern)
...returns NaNs instead. I cannot understand why count can parse the percentages but extract cannot. This is driving me a little crazy, as it seems like it should be straightforward.
This was a bug, and it was fixed in a subsequent pandas version.
With pandas 0.24.2, the same extract call works as expected:
>>> df.index = ['a', 'b', 'c']
>>> df
               Subject Discount
a  3 hrs only! 35% off      35%
b      Secret Savings!      NaN
c        Sale: 40% off      40%
>>> df['Subject'].str.extract(pattern)
     0
a  35%
b  NaN
c  40%
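To check the behaviour on your own installation, a minimal self-contained sketch along these lines may help. It rebuilds the DataFrame from the question and runs both calls. Two choices here are assumptions of this sketch rather than part of the original posts: the pattern is passed as a plain raw string instead of a compiled regex, and expand=False is used so that extract returns a Series that can be assigned directly to a column.

```python
import pandas as pd

df = pd.DataFrame({'Subject': ['3 hrs only! 35% off',
                               'Secret Savings!',
                               'Sale: 40% off']})

# Assumption of this sketch: a plain raw-string pattern rather than re.compile(...)
pattern = r"(\d+%)"

# count() reports how many times the pattern matches in each row
counts = df['Subject'].str.count(pattern)

# extract() returns the text captured by the first group, or NaN when there is no match.
# expand=False (an assumption of this sketch) yields a Series for a single capture group.
df['Discount'] = df['Subject'].str.extract(pattern, expand=False)

print(counts.tolist())
print(df)
```

If the Discount column still shows only missing values on your setup, comparing pd.__version__ against the version mentioned in the answer is a reasonable first step.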