
Inserting documents in MongoDB using PyMongo in Python

I am inserting documents into MongoDB using the PyMongo library in Python. The pandas dataframe has 37 fields and 60k records (link to dataset: https://drive.google.com/open?id=119T4uhvHc7CAwJgZRselWXpstAQhkj90). All fields in the dataframe have been converted to str type. I am getting the following error:

OverflowError: MongoDB can only handle up to 8-byte ints

The error persists even when I insert the documents in chunks of 2500 using a for loop.

Code snippet:

import pandas as pd
import pymongo

client = pymongo.MongoClient()
db = client['patenting_in_psi']
collection = db['sample5']

df = pd.read_excel(r"C:\Users\mazin\1-601.xlsx")

collection.insert_many(df.to_dict('records'))

Some fields with missing data need to be normalized before converting the dataframe to a dictionary.

Values for DWPIAccessionNumber have to be normalized. For example, in record number 2524 it is an integer whose value is 20100000000000001078890512051682672902079220850980264522702989250781417512482524046970942628331980756243236345447307144055181790035144112662138043858072629370129477827567049201927634798584141270252235498775249725404749823022689297835494826055102466304887343437187655164225642338109880434082104977849399115776. This is far beyond the 8-byte (64-bit) range that MongoDB can store, which is what triggers the OverflowError. A value this large does not fit in a bson.int64.Int64 either, so the convenient fix is to type the whole column as str. In other records this value is already a str (see record number 23) or a nan.

df['DWPIAccessionNumber'] = df['DWPIAccessionNumber'].astype(str)
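To check whether other columns carry the same problem, the records can be scanned for integers that fall outside the signed 8-byte range. The following is only a sketch in plain Python: the helper names `find_oversized_ints` and `stringify_oversized_ints` and the sample values are made up for illustration and are not part of PyMongo or of the dataset.

```python
# Signed 8-byte (64-bit) integer bounds.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def find_oversized_ints(records):
    """Return (record index, field name) pairs whose int value needs more than 8 bytes."""
    hits = []
    for index, record in enumerate(records):
        for field, value in record.items():
            # bool is a subclass of int, so skip it explicitly.
            if isinstance(value, int) and not isinstance(value, bool):
                if not INT64_MIN <= value <= INT64_MAX:
                    hits.append((index, field))
    return hits


def stringify_oversized_ints(records):
    """Return copies of the records with out-of-range ints replaced by their str form."""
    oversized = set(find_oversized_ints(records))
    return [
        {
            field: str(value) if (index, field) in oversized else value
            for field, value in record.items()
        }
        for index, record in enumerate(records)
    ]


# Hypothetical sample data, not taken from the dataset.
sample = [
    {"DWPIAccessionNumber": "2010-A12345", "Count": 3},
    {"DWPIAccessionNumber": 2 ** 70, "Count": 7},
]
print(find_oversized_ints(sample))  # [(1, 'DWPIAccessionNumber')]
```

Running `find_oversized_ints(df.to_dict('records'))` before the insert would list every offending record and field, assuming the oversized values arrive as plain Python ints.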

The PublicationDate field needs to be normalized as well. For example, in record number 24696 its value is missing. You can either drop the field, set some date, or fill it with zero.

df['PublicationDate'] = df['PublicationDate'].fillna(0)
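If you would rather drop the field than fill it, one option is to remove the missing keys from each record after the dataframe has been converted to a list of dictionaries. This is a minimal sketch, assuming missing values show up as None or as NaN-like values that are not equal to themselves (float('nan') behaves this way). The helper names and the `PublicationNumber` field are hypothetical.

```python
def is_missing(value):
    """True for None and for NaN-like values, which are not equal to themselves."""
    if value is None:
        return True
    try:
        return bool(value != value)
    except (TypeError, ValueError):
        # Values whose comparison is ambiguous or unsupported count as present.
        return False


def drop_missing_fields(records):
    """Return copies of the records without the fields whose value is missing."""
    return [
        {field: value for field, value in record.items() if not is_missing(value)}
        for record in records
    ]


# Hypothetical sample data, not taken from the dataset.
dated = [
    {"PublicationNumber": "X1", "PublicationDate": "2010-01-05"},
    {"PublicationNumber": "X2", "PublicationDate": float("nan")},
]
cleaned = drop_missing_fields(dated)
print(cleaned[1])  # {'PublicationNumber': 'X2'}
```

Because MongoDB documents in one collection do not need identical fields, records without PublicationDate can sit next to records that have it.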

Now your data is ready to be converted to a dictionary and then inserted.
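For the insert itself, the chunks of 2500 documents mentioned in the question can be written as a small helper. This is a sketch under assumptions: `insert_in_chunks` is a made-up name, and the helper only relies on the collection object having an `insert_many` method, as a PyMongo collection does. Chunking does not cure the OverflowError on its own; it only keeps each batch small once the data has been normalized.

```python
def insert_in_chunks(collection, records, chunk_size=2500):
    """Send records to collection.insert_many in slices; return how many were sent."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    sent = 0
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        collection.insert_many(chunk)
        sent += len(chunk)
    return sent


# Usage with the objects from the question (needs a running MongoDB server):
# records = df.to_dict('records')
# insert_in_chunks(collection, records)
```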
