
Python Pandas Dataframe interpolation with strings

I was wondering whether a pandas DataFrame can interpolate strings as well. (Interpolation works for my numeric values, but not for strings.)

import pandas as pd
import numpy as np

s = pd.Series(["Blue", "Blue", np.nan, "Blue", "Blue", "Red"])
s = s.interpolate()
print(s)

Output: Blue, Blue, NaN, Blue, Blue, Red

Desired Output: Blue, Blue, Blue, Blue, Blue, Red

Just use a forward fill.

s = s.ffill()
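
Since the question mentions a DataFrame with numeric values that already interpolate fine, here is a minimal sketch of how the two could be combined: interpolate the numeric columns and forward-fill everything else. The column names `value` and `color` and the sample data are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: one numeric column, one string column.
df = pd.DataFrame({
    "value": [1.0, np.nan, 3.0, np.nan, 5.0],
    "color": ["Blue", np.nan, "Blue", np.nan, "Red"],
})

# Split the columns by dtype.
num_cols = df.select_dtypes(include="number").columns
other_cols = df.columns.difference(num_cols)

# Linear interpolation only makes sense for numbers ...
df[num_cols] = df[num_cols].interpolate()
# ... so the string columns get the last known value carried forward.
df[other_cols] = df[other_cols].ffill()

print(df)
```

Note that a forward fill leaves leading NaN values untouched, so chaining `.bfill()` afterwards may be needed if the first rows can be missing.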

No, you can't interpolate strings directly. It is possible, however, to convert the strings to category codes and then interpolate on those.

arr, cat = s.factorize()
s2 = pd.Series(arr).replace(-1, np.nan).interpolate()\
         .astype('category').cat.rename_categories(cat)\
         .astype('str')
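
One thing to watch with the snippet above: linear interpolation between two different codes can produce fractional values (for example 0.5), and then the number of categories no longer matches `cat`. A possible variation, sketched here under the assumption that rounding to the nearest code is acceptable, maps the rounded codes back to labels explicitly. The helper name `interpolate_labels` is hypothetical. The codes are just the order of first appearance, so a value interpolated between codes 0 and 2 can land on an unrelated label 1; this is only meaningful when the gaps sit between equal or adjacent codes.

```python
import numpy as np
import pandas as pd

def interpolate_labels(s):
    """Hypothetical helper: interpolate string labels via their integer codes."""
    codes, uniques = pd.factorize(s)  # missing values get code -1
    numeric = pd.Series(codes, index=s.index, dtype="float64").replace(-1, np.nan)
    # Default linear interpolation, then snap back to the nearest valid code.
    numeric = numeric.interpolate().round()

    out = pd.Series(np.nan, index=s.index, dtype="object")
    known = numeric.notna()  # leading NaN values stay NaN
    labels = uniques.to_numpy()
    out[known] = labels[numeric[known].astype(int).to_numpy()]
    return out

s = pd.Series(["Blue", "Blue", np.nan, "Blue", "Blue", "Red"])
print(interpolate_labels(s).tolist())
```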

In your case, s.ffill() will do just fine. s.interpolate(method='pad') gives the same result, but recent pandas versions deprecate that spelling in favor of ffill(). You can compare the outputs of the different techniques below:

import pandas as pd

s = pd.Series([None, None, 'red', 'red', None, 'blue', None, None])

print(s.to_list())
print(s.bfill().tolist())
print(s.ffill().tolist())
print(s.bfill().ffill().tolist())
print(s.ffill().bfill().tolist())
print(s.interpolate(method='pad').tolist())

Output:

[None, None, 'red', 'red', None, 'blue', None, None]
['red', 'red', 'red', 'red', 'blue', 'blue', None, None]
[None, None, 'red', 'red', 'red', 'blue', 'blue', 'blue']
['red', 'red', 'red', 'red', 'blue', 'blue', 'blue', 'blue']
['red', 'red', 'red', 'red', 'red', 'blue', 'blue', 'blue']
[None, None, 'red', 'red', 'red', 'blue', 'blue', 'blue']

I believe that the following will also work for strings:

s = s.interpolate(method='pad')

See the documentation at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.interpolate.html.
