
rpy2: How to properly convert NaN to NA when passing a pandas.DataFrame to R?

I'm trying to pass a DataFrame from Python to R using the rpy2 library. I've tried the following methods, but all of them failed:

  • Do nothing: the values are still NaN on the Python side, which results in a runtime error.
  • fillna('NA'): R interprets the value as a character string instead of NA.
  • fillna(robjects.NA_Logical): NaN is turned into 0 instead, so no imputation is done.
  • fillna(robjects.NA_Real): the values remain NaN, which results in a runtime error.
  • Converting the data to an R data frame with pandas2ri.py2ri() before passing it to the method: same runtime error.

import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
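pcaMethods = importr('pcaMethods')  # Bioconductor package that provides pca()
pandas2ri.activate()  # enable automatic pandas <-> R conversion

train_df = pd.read_csv('C:\\misc\\train.csv')
train_dfNA = train_df.fillna(robjects.NA_Real)  # attempt: replace NaN with R's NA_real_
result = pcaMethods.pca(train_dfNA, method="svd", nPcs=2)  # raises RRuntimeError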

I want to do the conversion and all other processing in Python, and use R only to execute the methods I need (rarely, but still needed). I know I could instead send R code through robjects.r to perform the task, but that is a separate topic.

Edit: the error I get is the following:

rpy2.rinterface.RRuntimeError: Error in (function (object, method, nPcs = 2, scale = c("none", "pareto",  : Invalid data format.Run checkData(data, verbose=TRUE) for details
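The message points at pcaMethods' checkData(). As a rough Python-side counterpart, here is a minimal sketch that reports, before anything is sent to R, a few properties that are plausible culprits for a numeric R routine rejecting its input: non-numeric columns, missing values and infinities. The helper name describe_for_r is hypothetical, and it does not reproduce the checks checkData() actually performs.

```python
import numpy as np
import pandas as pd


def describe_for_r(df: pd.DataFrame) -> dict:
    """Hypothetical helper: summarize properties of df that may matter
    when it is handed to an R function expecting numeric data."""
    numeric = df.select_dtypes(include="number")
    # Convert to a float array, mapping pandas' missing markers to NaN.
    values = numeric.to_numpy(dtype=float, na_value=np.nan)
    return {
        # Columns that would not reach R as numeric vectors.
        "non_numeric_columns": [c for c in df.columns if c not in numeric.columns],
        # Count of missing values (NaN on the pandas side) per numeric column.
        "missing_per_column": numeric.isna().sum().to_dict(),
        # Infinite values are distinct from missing values.
        "has_inf": bool(np.isinf(values).any()),
    }


if __name__ == "__main__":
    demo = pd.DataFrame({
        "id": ["a", "b", "c"],
        "x": [1.0, np.nan, 3.0],
        "y": [4.0, 5.0, 6.0],
    })
    print(describe_for_r(demo))
```

Running it on train_df before calling pcaMethods.pca() shows at a glance whether the data is purely numeric and where the gaps are.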

In R, a different "NA" value is defined for each array type. Type coercion and the type hierarchy are what make it "just work" when one does something like array[i] <- NA.

Here are some of the NA values in R and how they are mapped in rpy2:

import rpy2.robjects as ro
# Print each rpy2 NA constant, its R representation, and its rpy2 type
print('%12s   %12s   %20s' % ('rpy2 name', 'R', 'rpy2 type'))
for s in ('NA_Character', 'NA_Real', 'NA_Integer', 'NA_Logical'):
    r = getattr(ro, s)
    print('%12s   %12s   %20s' % (s, r, r.typeof if hasattr(r, 'typeof')
                                  else 'Native Python object'))

You should get something like:

rpy2 name      R                rpy2 type                                                                           
NA_Character   NA_character_    RTYPES.CHARSXP
NA_Real        NA_real_         Native Python object
NA_Integer     NA_integer_      Native Python object
NA_Logical     NA               Native Python object

Here, though, you are using pandas to set your NA values, so what matters when the data is passed to R is the type that results from calling fillna().
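A minimal pandas-only sketch of that point, using a small made-up DataFrame: the fill value decides the column dtype, and the dtype is what the conversion to R sees. A string fill value forces object dtype, which matches the character values described in the question.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Untouched: NaN lives inside float64 columns, so they stay numeric.
untouched = df.dtypes.astype(str).tolist()

# fillna('NA'): a string cannot be stored in a float64 column,
# so pandas upcasts the affected columns to object dtype.
filled_with_string = df.fillna("NA").dtypes.astype(str).tolist()

# Any float fill value keeps the columns float64.
filled_with_float = df.fillna(-1.0).dtypes.astype(str).tolist()

print(untouched)           # ['float64', 'float64']
print(filled_with_string)  # ['object', 'object']
print(filled_with_float)   # ['float64', 'float64']
```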

Regarding the runtime error: you are not showing it, but I suspect it is the called R function reporting that the missing values make it impossible to perform a PCA.
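To see why a PCA that does not impute cannot proceed, here is a small NumPy sketch of the classical covariance route. It illustrates PCA in general, not what pcaMethods does internally: a single NaN spreads through centering into the covariance matrix, so any eigen- or singular-value decomposition built on that matrix has nothing meaningful to work with.

```python
import numpy as np

# Three observations of two variables; one value is missing (NaN).
X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 6.0]])

# Centering: the NaN makes the mean of column 1 NaN,
# so every centered value in that column becomes NaN too.
means = X.mean(axis=0)
Xc = X - means

# Sample covariance matrix: every entry involving column 1 is NaN.
cov = Xc.T @ Xc / (X.shape[0] - 1)

print(means)
print(np.isnan(cov))
```

Only the entry that involves column 0 alone survives, which is why methods that handle missing data have to estimate or impute the gaps rather than run a plain decomposition.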

Finally, I can see from the file paths that you are using rpy2 on Windows. Unfortunately, depending on the versions, rpy2 on Windows ranges from "not supported" to "won't even install".
