I was wondering whether a pandas DataFrame allows interpolation for strings as well. (I have it working for numeric values, but not for strings.)
import pandas as pd
import numpy as np
s = pd.Series(["Blue", "Blue", np.nan, "Blue", "Blue", "Red"])
s = s.interpolate()
print(s)
Desired Output: Blue, Blue, Blue, Blue, Blue, Red
Just use a forward fill.
s = s.ffill()
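For reference, here is a minimal self-contained sketch of that forward fill applied to the series from the question. The variable name `filled` is only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

s = pd.Series(["Blue", "Blue", np.nan, "Blue", "Blue", "Red"])

# Forward fill: each missing value takes the last valid value before it.
filled = s.ffill()
print(filled.tolist())
# ['Blue', 'Blue', 'Blue', 'Blue', 'Blue', 'Red']
```

Note that a forward fill leaves leading missing values untouched, because there is no earlier value to copy.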
No, you can't interpolate strings, but you can convert the strings to categories and then interpolate on those.
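# factorize() returns integer codes (-1 for NaN) and the unique labels
arr, cat = s.factorize()
# interpolate the codes, then map them back to the original labels
s2 = pd.Series(arr).replace(-1, np.nan).interpolate()\
    .astype('category').cat.rename_categories(cat)\
    .astype('str')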
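One caveat with this approach: linear interpolation between two different codes can produce a fractional value that does not correspond to any label. The sketch below is one possible variant, not the answer's own method. It assumes that rounding to the nearest code is acceptable and that the series has no leading missing values. The names `codes`, `labels`, `nearest` and `result` are illustrative. Keep in mind that the codes follow order of first appearance, so "between" two labels has no real meaning.

```python
import numpy as np
import pandas as pd

s = pd.Series(["Blue", "Blue", np.nan, "Blue", "Blue", "Red"])

# factorize() returns integer codes (-1 marks missing values) and the unique labels.
codes, labels = s.factorize()

# Treat -1 as missing and interpolate linearly on the numeric codes.
filled_codes = pd.Series(codes, index=s.index).replace(-1, np.nan).interpolate()

# Assumption: round any fractional result to the nearest code before mapping back.
nearest = filled_codes.round().astype(int)
result = pd.Series(labels.take(nearest.to_numpy()), index=s.index)
print(result.tolist())
# ['Blue', 'Blue', 'Blue', 'Blue', 'Blue', 'Red']
```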
In your case, s.interpolate(method='pad') or s.ffill() will do just fine, but you can compare the output of the different techniques below:
import pandas as pd
s = pd.Series([None, None, 'red', 'red', None, 'blue', None, None])
print(s.to_list())
print(s.bfill().tolist())
print(s.ffill().tolist())
print(s.bfill().ffill().tolist())
print(s.ffill().bfill().tolist())
print(s.interpolate(method='pad').tolist())
Output:
[None, None, 'red', 'red', None, 'blue', None, None]
['red', 'red', 'red', 'red', 'blue', 'blue', None, None]
[None, None, 'red', 'red', 'red', 'blue', 'blue', 'blue']
['red', 'red', 'red', 'red', 'blue', 'blue', 'blue', 'blue']
['red', 'red', 'red', 'red', 'red', 'blue', 'blue', 'blue']
[None, None, 'red', 'red', 'red', 'blue', 'blue', 'blue']
I believe that the following will also work for strings:
s = s.interpolate(method='pad')
See the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.interpolate.html
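Since the question mentions a DataFrame with both numeric and string columns, here is a rough sketch that combines the suggestions above: numeric columns are interpolated and the remaining columns are forward filled. The frame, its column names and its values are made up for illustration. As far as I know, recent pandas releases (2.1 and later) deprecate `interpolate(method='pad')` in favor of `ffill()`, so the sketch uses `ffill()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column names and values are made up for illustration.
df = pd.DataFrame({
    "color": ["Blue", "Blue", np.nan, "Blue", "Blue", "Red"],
    "value": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
})

# Split the columns into numeric ones and everything else.
numeric_cols = df.select_dtypes(include="number").columns
other_cols = df.columns.difference(numeric_cols)

# Linear interpolation for numbers, forward fill for strings.
df[numeric_cols] = df[numeric_cols].interpolate()
df[other_cols] = df[other_cols].ffill()
print(df)
```

Whether a forward fill is the right choice for the string columns depends on the data; `bfill()` or a combination of both, as compared in the answer above, may fit better.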