
Fastest way to store comment data in Python

Hi, I have a small shoutbox-style comment CGI process running on a server. Currently, when someone leaves a comment, I simply format that comment as HTML, i.e.

<p class="title">$title</p> <p class="comment">$comment</p>

and store it in a flat file. Would it be faster, and acceptably low in lines of code (LOC), to reimplement the storage in XML or JSON or in a simple spec of my own, or should I stick with the simple HTML route?

I don't want to use a relational database for this.

If a flat file is fast enough, then go with that, since it's very simple and accessible. Storing the data as XML or JSON, while still using a flat file, is probably comparable in performance.

You might want to consider (ignore this if you just left it out of your question) sanitizing/filtering the text, so that users can't break your HTML by, e.g., entering "</p>" in the comment text.
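As a minimal sketch of that sanitizing step, the snippet below escapes the user's text with the standard library's html.escape before formatting it into the same two-paragraph HTML layout used in the question. The function names and the idea of passing the file path as an argument are assumptions made for illustration, not part of the original setup.

```python
import html


def format_comment(title, comment):
    """Return the HTML snippet for one comment, with user text escaped."""
    safe_title = html.escape(title)      # turns <, >, &, and quotes into entities
    safe_comment = html.escape(comment)
    return '<p class="title">{}</p> <p class="comment">{}</p>\n'.format(
        safe_title, safe_comment
    )


def append_comment(path, title, comment):
    """Append one escaped comment to the flat file at `path` (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(format_comment(title, comment))
```

With this in place, a comment containing "</p>" is stored as "&lt;/p&gt;" and can no longer close the surrounding paragraph.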

XML is a nice, clean way to store this type of data. In Python, you could use lxml to create/update the file:

from lxml import etree

P_XML = 'xml_file_path.xml'

def save_comment(title_text, comment_text):
  # Build <comment>comment_text<title>title_text</title></comment>
  comment = etree.Element('comment')
  title = etree.SubElement(comment, 'title')
  title.text = title_text
  comment.text = comment_text
  # etree.tostring() returns bytes, so append in binary mode
  with open(P_XML, 'ab') as f:
    f.write(etree.tostring(comment, pretty_print=True))

save_comment("FIRST!", "FIRST COMMENT!!!")
save_comment("Awesome", "I love this site!")

That's a simple start, but you could do a lot more (e.g. set up an ID for each comment, or read in the XML using the lxml parser and add to it instead of just appending to the file).
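One way the "read, add, write back" idea could look is sketched below. To keep it dependency-free it uses the standard library's xml.etree.ElementTree rather than lxml; lxml's etree offers closely matching calls for these operations. The root element name "comments", the "body" element, and the sequential ID scheme are all assumptions chosen for the example.

```python
import os
import xml.etree.ElementTree as ET


def add_comment(path, title_text, comment_text):
    """Parse the XML file at `path` (if any), add one comment, and write it back."""
    if os.path.exists(path):
        tree = ET.parse(path)
        root = tree.getroot()
    else:
        # First comment: start a new document with a single root element.
        root = ET.Element("comments")
        tree = ET.ElementTree(root)

    # Assumed ID scheme: number comments sequentially, starting at 1.
    next_id = len(root.findall("comment")) + 1
    comment = ET.SubElement(root, "comment", id=str(next_id))
    ET.SubElement(comment, "title").text = title_text
    ET.SubElement(comment, "body").text = comment_text

    tree.write(path, encoding="utf-8", xml_declaration=True)
    return next_id
```

Unlike plain appending, this keeps the file a well-formed XML document with one root, at the cost of re-reading and rewriting the whole file on every comment.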

A flat file is the fastest form of persistence. Period. There's no formatting, encoding, indexing, locking, or anything.

JSON (and YAML) impose some overhead. They will be slower. There's some formatting that must be done.
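For a sense of what the JSON route might involve, here is a small sketch that stores one JSON object per line in a flat file, so saving a comment is still a plain append. The function names and the "title"/"comment" keys are assumptions for illustration.

```python
import json


def append_comment_json(path, title, comment):
    """Append one comment as a single JSON line (hypothetical helper)."""
    record = {"title": title, "comment": comment}
    with open(path, "a", encoding="utf-8") as f:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        f.write(json.dumps(record) + "\n")


def load_comments_json(path):
    """Read all comments back as a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

The formatting cost mentioned above is the json.dumps call on write and the json.loads call per line on read; the HTML for display then has to be generated separately.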

XML imposes more overhead than JSON/YAML. It will be slower still. There's a fair amount of formatting that must be done.

The more overhead, the slower it will be.
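If you would rather measure than take the ordering on faith, a rough timing sketch along these lines may help. It only times the in-memory formatting step for one sample comment, not disk I/O, and it uses the standard library's xml.etree.ElementTree as a stand-in for whichever XML library you choose. The sample data and function names are assumptions, and the numbers will vary by machine and Python version.

```python
import html
import json
import timeit
import xml.etree.ElementTree as ET

TITLE, COMMENT = "Awesome", "I love this site! <3"  # sample data for the test


def as_html():
    return '<p class="title">{}</p> <p class="comment">{}</p>'.format(
        html.escape(TITLE), html.escape(COMMENT)
    )


def as_json():
    return json.dumps({"title": TITLE, "comment": COMMENT})


def as_xml():
    comment = ET.Element("comment")
    ET.SubElement(comment, "title").text = TITLE
    ET.SubElement(comment, "body").text = COMMENT
    return ET.tostring(comment, encoding="unicode")


def compare(number=10000):
    """Return seconds taken to format the sample comment `number` times per format."""
    return {f.__name__: timeit.timeit(f, number=number)
            for f in (as_html, as_json, as_xml)}


if __name__ == "__main__":
    for name, seconds in compare().items():
        print("{}: {:.4f}s".format(name, seconds))
```

For a small shoutbox, the differences per comment are likely to be tiny compared with the cost of starting the CGI process, so simplicity may reasonably decide the matter.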

None of this has anything to do with sanitizing the comment input so that it displays as valid HTML. You should escape any HTML-like character sequences in the comment before saving the text to a file: use html.escape on Python 3 (cgi.escape, the older function, was deprecated in Python 3.2 and removed in Python 3.8).
