Loading an R data frame into Python and converting it to a Pandas data frame

Question

I am trying to run the following Python code on an R data frame.

from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import os
import pandas as pd
import timeit
from rpy2.robjects import r
from rpy2.robjects import pandas2ri
pandas2ri.activate()

start = timeit.default_timer()

def f(x):
    return fuzz.partial_ratio(str(x["sig1"]),str(x["sig2"]))

def fu_match(file):
    f1=r.load(file)
    f1=pandas2ri.ri2py(f1)
    f1["partial_ratio"]=f1.apply(f, axis=1)
    f1=f1.loc[f1["partial_ratio"]>90]
    f1.to_csv("test.csv")

stop = timeit.default_timer()
print stop - start 

fu_match('test_full.RData')

Here is the error.

AttributeError: 'numpy.ndarray' object has no attribute 'apply'

I guess the problem has to do with the conversion from the R data frame to a Pandas data frame. I know this is a repeated question, but I have tried all the solutions given to previous questions with no success.

Any help would be much appreciated.

EDIT: Here is the head of the data in the .RData file.

  city                         sig1                         sig2
1    19 claudiopillonrobertoscolari  almeidabartolomeufrancisco
2    19 claudiopillonrobertoscolari cruzricardosantasergiosilva
3    19 claudiopillonrobertoscolari             costajorgesilva
4    19 claudiopillonrobertoscolari    costafrancisconaifesilva
5    19 claudiopillonrobertoscolari          camarajoseluizreis
6    19 claudiopillonrobertoscolari    almeidafilhojoaopimentel

Answer 1

This line

f1=pandas2ri.ri2py(f1)

is setting f1 to be a numpy.ndarray when I think you expect it to be a pandas.DataFrame.

You can cast the array into a DataFrame with something like

f1 = pd.DataFrame(data=f1)

but you won't have your column names defined (which you use in f(x)). What is the structure of test_full.RData? Do you want to define your column names manually? If so,

f1 = pd.DataFrame(data=f1, columns=("my", "column", "names"))

should do the trick.
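As a rough illustration of the cast above, here is a minimal sketch. It assumes the converted array holds the rows themselves, and it uses a small hand-built array (a hypothetical stand-in, with values copied from the head shown in the question) instead of a real .RData file. The column names city, sig1 and sig2 come from that head; the length calculation is only a placeholder for the fuzzy-matching function.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the array produced by the conversion step;
# the values are taken from the head of the data shown in the question.
raw = np.array(
    [
        [19, "claudiopillonrobertoscolari", "almeidabartolomeufrancisco"],
        [19, "claudiopillonrobertoscolari", "costajorgesilva"],
    ],
    dtype=object,
)

# Cast the array into a DataFrame and name the columns explicitly.
f1 = pd.DataFrame(data=raw, columns=["city", "sig1", "sig2"])

# Row-wise apply now works, because f1 is a DataFrame with named columns.
# len() is only a placeholder for a scoring function such as f(x).
lengths = f1.apply(lambda x: len(str(x["sig1"])), axis=1)
```

Once the columns are named, the original f(x) can be passed to f1.apply(f, axis=1) in the same way.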

However, I would suggest using a more standard data format, such as .csv. pandas has good support for this, and I expect R does too. Check out the docs.
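The CSV route might look like the sketch below. It assumes the data frame was first exported from R, for example with write.csv(df, "test_full.csv", row.names = FALSE); the object name df and the file name are assumptions, not taken from the question. To keep the example self-contained, an in-memory string stands in for that file.

```python
import io

import pandas as pd

# Stand-in for a file written on the R side with, for example,
#   write.csv(df, "test_full.csv", row.names = FALSE)
# (the object name df and the file name are assumptions).
csv_text = io.StringIO(
    "city,sig1,sig2\n"
    "19,claudiopillonrobertoscolari,almeidabartolomeufrancisco\n"
    "19,claudiopillonrobertoscolari,costajorgesilva\n"
)

# In practice this would be pd.read_csv("test_full.csv").
df = pd.read_csv(csv_text)
```

read_csv returns a DataFrame directly, with the column names taken from the header row, so no conversion step is needed.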

Answer 1: 3, accepted, 2015-06-24 10:54:27
