Inserting documents in MongoDB using PyMongo in Python
I am inserting documents in MongoDB using the PyMongo library in Python. The pandas dataframe has 37 fields and 60k records (link to dataset: https://drive.google.com/open?id=119T4uhvHc7CAwJgZRselWXpstAQhkj90 ). All fields in the dataframe have been converted to str type. I am getting the following error:
OverflowError: MongoDB can only handle up to 8-byte ints
The error still persists when I insert chunks of 2500 documents using a for loop.
Code snippet:
import pandas as pd
import pymongo
client = pymongo.MongoClient()
db = client['patenting_in_psi']
collection = db['sample5']
df = pd.read_excel(r"C:\Users\mazin\1-601.xlsx")
collection.insert_many(df.to_dict('records'))
Some fields with missing data need to be normalized before converting the dataframe to a dictionary.
Values for DWPIAccessionNumber have to be normalized. For example, in record number 2524 the value is an integer far larger than 8 bytes can hold: 20100000000000001078890512051682672902079220850980264522702989250781417512482524046970942628331980756243236345447307144055181790035144112662138043858072629370129477827567049201927634798584141270252235498775249725404749823022689297835494826055102466304887343437187655164225642338109880434082104977849399115776. That is what triggers the OverflowError. Because it does not fit in 64 bits, it cannot be stored as a bson.int64.Int64; the convenient fix is to type the column as str (there are instances where this value is already a str, see record number 23, or a nan).
df['DWPIAccessionNumber'] = df['DWPIAccessionNumber'].astype(str)
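To find out which fields hold such values before inserting, a check along these lines could be run on the output of df.to_dict('records'). This is only a sketch: find_oversized_ints is a hypothetical helper, not part of PyMongo or pandas, and the sample values are made up for illustration.

```python
# Signed 64-bit range, the largest integer range BSON can store.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def find_oversized_ints(records):
    """Return (record_index, field_name) pairs for ints outside the int64 range."""
    hits = []
    for index, record in enumerate(records):
        for field, value in record.items():
            # bool is a subclass of int, so skip it explicitly.
            if isinstance(value, int) and not isinstance(value, bool):
                if not INT64_MIN <= value <= INT64_MAX:
                    hits.append((index, field))
    return hits


# Made-up sample: the second record carries an int that needs more than 8 bytes.
sample = [
    {"DWPIAccessionNumber": "2010-A12345", "Title": "a"},
    {"DWPIAccessionNumber": 2**70, "Title": "b"},
]
print(find_oversized_ints(sample))  # [(1, 'DWPIAccessionNumber')]
```

Any field reported here is a candidate for the astype(str) treatment shown above.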
The PublicationDate field needs to be normalized as well. For example, in record number 24696 its value is missing. You can either drop the field, set some date, or fill it with a zero.
df['PublicationDate'] = df['PublicationDate'].fillna(0)
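If dropping the field is preferred over filling it, one way is to strip missing values from each record after the conversion to dictionaries. The sketch below assumes missing values show up as None or a float NaN; drop_missing is a hypothetical helper, PublicationNumber is a made-up field name, and pandas NaT values are not covered.

```python
import math


def drop_missing(record):
    """Return a copy of record without fields whose value is None or NaN."""
    return {
        field: value
        for field, value in record.items()
        if value is not None
        and not (isinstance(value, float) and math.isnan(value))
    }


# Made-up record with a missing publication date.
row = {"PublicationNumber": "X1", "PublicationDate": float("nan")}
print(drop_missing(row))  # {'PublicationNumber': 'X1'}
```

MongoDB documents in one collection do not need identical fields, so omitting the key for some records is acceptable if downstream queries can cope with it.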
Now your data is ready to be converted to a dictionary and then inserted.
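For the insert itself, the chunked loop mentioned in the question could look roughly like this. Chunking does not cure the OverflowError on its own (the offending value still has to be normalized first); it only keeps each batch small. insert_in_chunks is a hypothetical helper, and the only PyMongo call it relies on is collection.insert_many.

```python
def insert_in_chunks(collection, records, chunk_size=2500):
    """Send records to collection.insert_many in batches; return the count sent."""
    sent = 0
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        collection.insert_many(chunk)
        sent += len(chunk)
    return sent


# Usage, assuming `collection` and the normalized `df` from the question:
# insert_in_chunks(collection, df.to_dict('records'))
```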