
What's the standard way to store formatted content in the database?

I have an application that stores and retrieves a lot of user-formatted content produced with a WYSIWYG HTML editor, much like how SO saves formatted questions and answers.

What's the standard approach for this?

EDIT:

Just to clarify: I am not asking about the data type to store in the DB. Rather, I am concerned about storing chunks of HTML tags with style information in the DB.

This is just text data. Usually a VARCHAR is best.

UPDATE: Yes, if you want to support Unicode (which you probably do in this case), then make that an NVARCHAR.

As for the OP's update, you are imagining difficulties that don't really exist. HTML is textual data, so it goes into a text field. You do not want to separate the formatting from the text at all.
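To make "it goes into a text field" concrete, here is a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for whatever database you actually use; the `posts` table and the `save_post` / `load_post` helpers are hypothetical names made up for illustration. SQLite's `TEXT` type holds Unicode, so it plays roughly the role that NVARCHAR plays on SQL Server.

```python
import sqlite3

# In-memory database, used only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")


def save_post(conn, body_html):
    """Store the editor's HTML unchanged and return the new row id."""
    # The ? placeholder passes the markup as data, so quotes and angle
    # brackets in the HTML need no manual escaping for the SQL statement.
    cur = conn.execute("INSERT INTO posts (body) VALUES (?)", (body_html,))
    conn.commit()
    return cur.lastrowid


def load_post(conn, post_id):
    """Return the stored HTML for post_id, or None if there is no such row."""
    row = conn.execute(
        "SELECT body FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return row[0] if row else None


# Formatting and text stay together in one value, including non-ASCII text.
html = '<p style="color:#c00">Héllo, <b>wörld</b> 你好</p>'
post_id = save_post(conn, html)
```

The markup comes back exactly as it went in, which is all the display side needs.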

That is the answer, but it isn't the end of your concerns on this matter. This probably bothers you because databases are built around structured data (every value sits in a named, typed column) and this is unstructured content. That means the data in this field is not stored in a DB-friendly manner. You should structure your data as much as possible, because that lets you search quickly by field value. Here we are throwing anything the user types into one field, and if we ever need to find something in it we have to search the entire field. That is a very slow process, and to make things worse we aren't just searching through the text but also through the formatting markup around it.

All of this is true and undesirable, so we should avoid doing it as much as possible. If you can avoid letting users enter free-form text, then by all means do so. You can then apply HTML formatting to the data from your client application in a fast and consistent manner.

However, the premise of this question is that you want a field of unstructured content and you are asking how to store it. That answer is pretty simple (even though I didn't get it 100% correct on the first try): use NVARCHAR.

Even though storing this unstructured content is not DB-friendly, it is sometimes website-friendly, and it is common practice in the situation you describe. The thing to remember is that we want to avoid searching on this unstructured data, and we may need to go to fairly extreme measures to do so.

Many applications solve this slow-search problem by creating a separate table, parsing the text out of the HTML, and inserting each individual word (along with a foreign key to the original table's entry) into that table to be searched later. Even if you do this, you'll still want to keep the original formatted text for display purposes.
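A rough sketch of that word-table idea, assuming Python's standard `html.parser` and `sqlite3` modules; the `posts` and `post_words` tables and every function name here are hypothetical. It stores each distinct word once per post, which is one possible reading of "each individual word", and it is not a substitute for a real full-text index.

```python
import re
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text between tags; tag and attribute names are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_words(html):
    """Return the distinct lowercase words found in the visible text of html."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return sorted(set(re.findall(r"\w+", text.lower())))


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE post_words (
    word    TEXT    NOT NULL,
    post_id INTEGER NOT NULL REFERENCES posts(id),
    PRIMARY KEY (word, post_id)
);
""")


def save_post(conn, body_html):
    """Keep the original HTML for display and index its words for search."""
    cur = conn.execute("INSERT INTO posts (body) VALUES (?)", (body_html,))
    post_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO post_words (word, post_id) VALUES (?, ?)",
        [(word, post_id) for word in extract_words(body_html)],
    )
    conn.commit()
    return post_id


def find_posts(conn, word):
    """Look up post ids by exact word via the index table, not by scanning HTML."""
    rows = conn.execute(
        "SELECT post_id FROM post_words WHERE word = ? ORDER BY post_id",
        (word.lower(),),
    )
    return [row[0] for row in rows]
```

A search now hits the primary key of `post_words` instead of scanning every stored body, and because only text nodes are indexed, a query for something like `class` or `style` does not match the markup itself. The `posts.body` column still holds the untouched HTML for rendering.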

I generally treat this type of optimization as Phase II, because the site will function without it; it'll just be slower, and that won't even be noticed until the site has plenty of content to search through.

One other thing to note is that often this will not be HTML-formatted text. Several other formats are commonly used, such as BBCode or Markdown. SQL doesn't care, though; to your SQL server this is all just text.

The title of the question could be stored in a VARCHAR and the question body in a TEXT.

Here, have a look at the SQL Server data types: http://msdn.microsoft.com/en-us/library/ms187752.aspx
