
How to combine strings in one DataFrame

I am processing inbound user data. I receive a DataFrame h that is supposed to contain only floats but has some strings:

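>>> import numpy as np
>>> import pandas as pd
>>> # object dtype so the columns can hold both floats and strings
>>> h = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b']).astype(object)
>>> h.loc[0, 'a'] = 'bad'
>>> h.loc[1, 'b'] = 'robot'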
>>> h
           a           b
0        bad    0.747314
1   0.921919       robot
2   0.754256    0.664455

I process and set the strings to np.nan (I realize np.nan is a float but this is to illustrate):

>>> hh = h.copy()
>>> hh.loc[0, 'a'] = np.nan
>>> hh.loc[1, 'b'] = np.nan
>>> hh
           a           b
0        NaN    0.747314
1   0.921919         NaN
2   0.754256    0.664455
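
The manual assignments above stand in for whatever cleaning step is actually used. As a minimal sketch, assuming that any cell which cannot be parsed as a number counts as bad, pd.to_numeric with errors='coerce' can produce hh in one step. The fixed sample values are made up so the example is reproducible.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question; the numbers are illustrative only.
h = pd.DataFrame({'a': ['bad', 0.921919, 0.754256],
                  'b': [0.747314, 'robot', 0.664455]}, dtype=object)

# Coerce each column to numeric; cells that cannot be parsed become NaN.
hh = h.apply(pd.to_numeric, errors='coerce')

print(hh)
```

Note that hh built this way has float columns, whereas the hh in the question keeps the object dtype of h.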

I have a DataFrame with the expected values (or a dict):

>>> g = pd.DataFrame({'a': 'foo', 'b': 'bar'}, index=h.index)
>>> g
      a       b
0   foo     bar
1   foo     bar
2   foo     bar

I use it to fill in the cells where the bad data is:

>>> hh.fillna(g)
          a           b
0        foo    0.747314
1   0.921919         bar
2   0.754256    0.664455
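
For the dict case mentioned above, a minimal sketch: fillna also accepts a dict that maps column names to fill values. The name expected and the sample numbers are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# hh as in the question: bad cells already replaced by NaN (values are illustrative).
hh = pd.DataFrame({'a': [np.nan, 0.921919, 0.754256],
                   'b': [0.747314, np.nan, 0.664455]}, dtype=object)

# Hypothetical per-column expected values.
expected = {'a': 'foo', 'b': 'bar'}

# Each NaN is replaced by the value registered for its column.
filled = hh.fillna(expected)
print(filled)
```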

I need each filled cell to show both the received and the expected value. So the result should be:

>>> magic(hh, g)
                   a                     b
0   rec=bad; exp=foo              0.747314
1           0.921919    rec=robot; exp=bar
2           0.754256              0.664455

How can I create such a result?

You can convert the values you do not need to NaNs with DataFrame.where, concatenate what remains with the strings, and finally restore the original values:

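# True where the received value was bad (NaN after cleaning)
m = hh.isna()
# build the combined strings only in the bad cells, then restore the rest from h
df = ('rec=' + h.where(m) + '; exp=' + g.where(m)).fillna(h)
print(df)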
                  a                   b
0  rec=bad; exp=foo            0.440508
1          0.525949  rec=robot; exp=bar
2          0.337586            0.414336
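
For reference, here is a self-contained sketch of the same idea wrapped in a function. It is a variant, not the code above: it builds the labels for every cell and then applies them with DataFrame.mask instead of where/fillna, so no string is ever added to a NaN. The function reuses the question's placeholder name magic but takes the raw frame h, since hh no longer holds the received strings. It assumes that any cell which cannot be parsed as a number is bad; the sample values are made up.

```python
import numpy as np
import pandas as pd

def magic(h, g):
    """Hypothetical helper: label bad cells of h with received and expected values."""
    # Assumption: a cell is bad if it cannot be converted to a number.
    m = h.apply(pd.to_numeric, errors='coerce').isna()
    # Build a label for every cell; only the ones under the mask are used.
    labels = 'rec=' + h.astype(str) + '; exp=' + g.astype(str)
    # Replace the bad cells with their labels and keep all other cells as they are.
    return h.mask(m, labels)

# Illustrative data mirroring the question.
h = pd.DataFrame({'a': ['bad', 0.921919, 0.754256],
                  'b': [0.747314, 'robot', 0.664455]}, dtype=object)
g = pd.DataFrame({'a': 'foo', 'b': 'bar'}, index=h.index)

result = magic(h, g)
print(result)
```

One consequence of the assumption: a cell that is already NaN in h is also treated as bad and gets the label rec=nan.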
