Writing list of objects to csv file
I am writing a Python program that loops through reddit submissions, pulls data, and stores it as an object in a list. However, I am having trouble writing that list to a csv file. The file is created, but it just contains some kind of id tag for each object. How should I change the csv code?
Code
import praw
from datetime import datetime
import pandas as pd

class Submission:
    def __init__(self, time, score, title, text, ofReddit, serious):
        self.time = time
        self.score = score
        self.title = title
        self.text = text
        self.ofReddit = ofReddit
        self.serious = serious

data = []
reddit = praw.Reddit(client_id=id, client_secret=secret,
                     user_agent='testscript by /u/SilentButtDeadlies')
subreddit = reddit.subreddit('AskReddit')
for submission in subreddit.new(limit=50):
    time = datetime.utcfromtimestamp(submission.created_utc).hour
    score = submission.score
    title = len(submission.title)
    text = len(submission.selftext)
    if 'of reddit' in submission.title.lower():
        ofReddit = 1
    else:
        ofReddit = 0
    if '[serious]' in submission.title.lower():
        serious = 1
    else:
        serious = 0
    data.append(Submission(time, score, title, text, ofReddit, serious))

df = pd.DataFrame(data)
filename = 'AskRedditData' + str(datetime.now()) + '.csv'
df.to_csv(filename, index=False, encoding='utf-8')
CSV File
0
<__main__.Submission instance at 0x1118f6ef0>
<__main__.Submission instance at 0x1118f68c0>
<__main__.Submission instance at 0x1118f6950>
<__main__.Submission instance at 0x1118c3758>
<__main__.Submission instance at 0x11239c638>
<__main__.Submission instance at 0x11239c5f0>
<__main__.Submission instance at 0x112398908>
<__main__.Submission instance at 0x112398998>
<__main__.Submission instance at 0x112398878>
<__main__.Submission instance at 0x1123989e0>
<__main__.Submission instance at 0x112398c68>
<__main__.Submission instance at 0x11239fe18>
<__main__.Submission instance at 0x11239fe60>
<__main__.Submission instance at 0x11239fea8>
<__main__.Submission instance at 0x11239fef0>
<__main__.Submission instance at 0x11239ff38>
<__main__.Submission instance at 0x11239ff80>
<__main__.Submission instance at 0x11239ffc8>
<__main__.Submission instance at 0x112404050>
<__main__.Submission instance at 0x112404098>
<__main__.Submission instance at 0x1124040e0>
<__main__.Submission instance at 0x112404128>
<__main__.Submission instance at 0x112404170>
<__main__.Submission instance at 0x1124041b8>
<__main__.Submission instance at 0x112404200>
<__main__.Submission instance at 0x112404248>
<__main__.Submission instance at 0x112404290>
<__main__.Submission instance at 0x1124042d8>
<__main__.Submission instance at 0x112404320>
<__main__.Submission instance at 0x112404368>
<__main__.Submission instance at 0x1124043b0>
<__main__.Submission instance at 0x1124043f8>
<__main__.Submission instance at 0x112404440>
<__main__.Submission instance at 0x112404488>
<__main__.Submission instance at 0x1124044d0>
<__main__.Submission instance at 0x112404518>
<__main__.Submission instance at 0x112404560>
<__main__.Submission instance at 0x1124045a8>
<__main__.Submission instance at 0x1124045f0>
<__main__.Submission instance at 0x112404638>
<__main__.Submission instance at 0x112404680>
<__main__.Submission instance at 0x1124046c8>
<__main__.Submission instance at 0x112404710>
<__main__.Submission instance at 0x112404758>
<__main__.Submission instance at 0x1124047a0>
<__main__.Submission instance at 0x1124047e8>
<__main__.Submission instance at 0x112404830>
<__main__.Submission instance at 0x112404878>
<__main__.Submission instance at 0x1124048c0>
<__main__.Submission instance at 0x112404908>
Your Submission class seems to function simply as a record type. You could probably just use a namedtuple. So replace your class definition with:
from collections import namedtuple
Submission = namedtuple('Submission', ['time', 'score', 'title', 'text', 'ofReddit', 'serious'])
Now the rest of your code should just work.
pandas doesn't know how to interpret the Submission class you originally wrote. So it simply makes a single column of Submission objects, and when it writes them out, it uses str(Submission()), which falls back to the default object representation (the <__main__.Submission instance at 0x...> text in your file) since you did not define a __str__ of your own. Really, you want to use a sequence. The namedtuple function is actually a class factory, and it creates a record type derived from tuple, so it has all the handy functions you need along with a very handy constructor.
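To make the "derived from tuple" point concrete, here is a minimal sketch showing that a namedtuple instance really is an ordinary tuple with named fields, which is why row-oriented tools can consume it directly. The field values below are made up for illustration and are not from the original post.

```python
from collections import namedtuple

Submission = namedtuple(
    'Submission',
    ['time', 'score', 'title', 'text', 'ofReddit', 'serious'],
)

# Hypothetical values, for illustration only.
row = Submission(time=14, score=3, title=42, text=0, ofReddit=1, serious=0)

is_tuple = isinstance(row, tuple)   # a namedtuple instance is a real tuple
as_list = list(row)                 # iterating yields the field values in order
field_names = Submission._fields    # the field names, in declaration order
as_dict = dict(row._asdict())       # mapping of field name to value

print(is_tuple)
print(as_list)
print(field_names)
print(as_dict)
```

Because each row is a sequence with a known set of field names, anything that expects rows of values can split it into one column per field instead of treating it as a single opaque object.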
Now, since you are using Python 2, I didn't bother to change your use of pandas, even though it seems like overkill to use it only for writing a csv. That being said, getting the Python 2 csv module to play nicely with unicode is a pain, so you might as well keep it. If you could switch to Python 3, you could simply replace the pandas stuff with:
import csv

with open(filename, 'w', newline='', encoding='utf8') as f:
    writer = csv.writer(f)
    # _fields is part of namedtuple's public API despite the leading underscore
    writer.writerow(Submission._fields)
    writer.writerows(data)
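For a version you can run without reddit credentials, the following Python 3 sketch writes the same header and rows to an in-memory buffer instead of a file. The helper name write_submissions and the sample rows are assumptions added for illustration; they are not part of the original code.

```python
import csv
import io
from collections import namedtuple

Submission = namedtuple(
    'Submission',
    ['time', 'score', 'title', 'text', 'ofReddit', 'serious'],
)

# Hypothetical sample rows standing in for the scraped data.
data = [
    Submission(14, 3, 42, 0, 1, 0),
    Submission(9, 120, 57, 310, 0, 1),
]

def write_submissions(rows, f):
    """Write a header line and one csv line per Submission to file object f."""
    writer = csv.writer(f)
    writer.writerow(Submission._fields)  # header taken from the field names
    writer.writerows(rows)               # each namedtuple is already a row

# newline='' leaves the csv module's own line endings untouched.
buffer = io.StringIO(newline='')
write_submissions(data, buffer)
csv_lines = buffer.getvalue().splitlines()

for line in csv_lines:
    print(line)
```

To write to disk instead, pass a file opened as in the snippet above, for example with open(filename, 'w', newline='', encoding='utf8') as f: write_submissions(data, f).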