How to create new pandas dataframe column from file that contains json?

I have a dataset with metadata about voice calls.

It looks like this:

Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   phone        100 non-null    string
 1   group_id     100 non-null    int64 
 2   question_id  100 non-null    int64 
 3   result       100 non-null    bool 

I want to create column #4 that will contain some dialogue statistics, e.g. total_words (int64). The data must be taken from an external JSON file containing speech-to-text recognition results.

Is there any built-in pandas way to do that?

I've tried pandas.read_json, but I get errors ( ValueError: DataFrame constructor not properly called! and TypeError: argument of type 'method' is not iterable ).

I'm looking for something like this:

df['total_words'] = pd.read_json('file://localhost:8888/auido/' + df['phone'] + '.mp3.wstat.json')

I would be glad if someone could provide a working code example that solves a similar issue.

UPD: the content of each JSON file looks like {"total_words": 74}

Try this:

df['total_words']=df['phone'].apply(lambda x: pd.read_json('file://localhost:8888/auido/' + x + '.mp3.wstat.json'))

Now each cell of the total_words column contains another dataframe. You can access it using:

# for the first row
df.iloc[0]["total_words"].head()
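
Since each file holds a single value like {"total_words": 74}, it may be simpler to pull that integer out directly, so the new column holds plain numbers rather than nested dataframes. Below is a minimal sketch of that idea, not a definitive implementation. It assumes the JSON files sit in a local directory (the stats_dir argument is hypothetical) and are named <phone>.mp3.wstat.json as in the question; read_total_words and add_total_words are illustrative helper names, not pandas functions.

```python
import json
from pathlib import Path

import pandas as pd


def read_total_words(phone, stats_dir):
    """Return total_words from '<stats_dir>/<phone>.mp3.wstat.json'.

    Returns None when the file or the key is missing.
    stats_dir is an assumed local directory, not a path from the question.
    """
    path = Path(stats_dir) / f"{phone}.mp3.wstat.json"
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh).get("total_words")


def add_total_words(df, stats_dir):
    """Return a copy of df with a nullable-integer total_words column."""
    out = df.copy()
    counts = [read_total_words(phone, stats_dir) for phone in out["phone"]]
    # "Int64" (capital I) is the nullable integer dtype, so missing files become <NA>
    out["total_words"] = pd.array(counts, dtype="Int64")
    return out
```

Calling df = add_total_words(df, "path/to/stats") would then add the column in one step. The nullable Int64 dtype is a design choice here: it keeps the counts as integers even when some calls have no statistics file.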
