How to extract data from a Microsoft Content Management Server (MCMS) database
I need to extract a sizeable amount of data (> 1000 pages) from a Microsoft Content Management Server (MCMS) database for use in a Sitecore website.
I can see two main options:
Migrate the data into a new simplified database and display that information in the new website.
Convert the MCMS solution to SharePoint and use the SharePoint connector module available for Sitecore to display this information.
I would prefer the first route: there are no plans to use SharePoint to manage data/content in the future, and I would rather store this information in a simple SQL Server database to allow better searching.
I've looked at the database in question and think that the main tables I'd be interested in are Node, NodePlaceholder and NodePlaceholderContent, but I am struggling to find what I would expect. Can anyone explain the schema of this database? Or am I going to have problems trying to migrate the data in this way?
I've just recently been through a similar process of exporting content pages out of MCMS 2002 (migrating to WordPress).
I'm not saying this is the 100% correct way to get the data, but it worked for me.
Here's the process I followed to get page content out of the database.
As you've already seen, the tables storing most of the data are Node and NodePlaceholderContent.
1.) To get an idea of what the Node table holds, you can view the contents organized by type:
SELECT
[Type]
,CASE [Type]
WHEN 1 THEN 'Server'
WHEN 4 THEN 'Channel'
WHEN 16 THEN 'Post/Page'
WHEN 64 THEN 'Resource Gallery'
WHEN 256 THEN 'Resource Gallery Item (images/documents)'
WHEN 16384 THEN 'Template Gallery'
WHEN 65536 THEN 'Template' END as [Description]
,COUNT([Type]) as [Count]
FROM dbo.Node
GROUP BY [Type]
ORDER BY [Count] DESC
2.) Pages (and Posts; Posts are covered further down) are Type = 16, but to get just pages (and not posts) we need to filter by IsShortcut = 0
SELECT * FROM dbo.Node WHERE [Type] = 16 AND IsShortcut = 0
3.) I only wanted published pages, so filter by ApprovalStatus = 1
-- Get all published pages
SELECT *
FROM dbo.Node WHERE [Type] = 16
AND IsShortcut = 0
AND ApprovalStatus = 1
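One way to sanity-check these two filters is to count how the Type = 16 rows split across both flags before relying on them. This is a minimal sketch that uses only the columns shown above; what ApprovalStatus values other than 1 mean is not covered here, so treat any other values in the output as something to investigate rather than a known state.

```sql
-- Count type-16 rows per IsShortcut / ApprovalStatus combination
SELECT
IsShortcut
,ApprovalStatus
,COUNT(*) as [Count]
FROM dbo.Node
WHERE [Type] = 16
GROUP BY IsShortcut, ApprovalStatus
ORDER BY IsShortcut, ApprovalStatus
```

The row with IsShortcut = 0 and ApprovalStatus = 1 should match the number of rows returned by the published-pages query.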
4.) Next, determine who created and last modified each page (with usernames)
-- Get published pages & author/editor
SELECT
[page].Id
,[page].NodeGuid
,[page].Name
,[created].Username as 'CreatedBy'
,[page].CreatedWhen
,[modified].Username as 'ModifiedBy'
,[page].ModifiedWhen
FROM dbo.Node [page]
-- add JOIN on created by user
INNER JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
-- add JOIN on modified by user
INNER JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
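Because both joins on ClientAccount are INNER JOINs, any page whose creator or last editor no longer has a ClientAccount row would silently drop out of the result. Whether such rows exist in a given MCMS database is an assumption worth testing; a sketch of the check, using LEFT JOINs on the same columns:

```sql
-- Published pages whose creator or last editor has no ClientAccount row
SELECT
[page].Id
,[page].Name
,[page].CreatedByUserId
,[page].ModifiedByUserId
FROM dbo.Node [page]
LEFT JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
LEFT JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
AND ([created].UserId IS NULL OR [modified].UserId IS NULL)
```

If this returns rows, switching the two ClientAccount joins to LEFT JOIN in the queries that follow would keep those pages, with NULL usernames.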
5.) Next, figure out where in the hierarchy we are by using the Node.ParentGUID column
SELECT
[page].Id
,[page].NodeGuid
,[page].Name
,[pageParent].Name as 'ParentName' -- add page parent Name
,[created].Username as 'CreatedBy'
,[page].CreatedWhen
,[modified].Username as 'ModifiedBy'
,[page].ModifiedWhen
FROM dbo.Node [page]
INNER JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
-- add JOIN on Node using ParentGUID
INNER JOIN dbo.Node [pageParent] ON [pageParent].NodeGUID = [page].ParentGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
This query let me know that pages sit under parent nodes named either Folders or Archive Folder.
6.) Go up another level (get the parent of the parent)
SELECT
[page].Id
,[page].NodeGuid
,[page].Name
,[pageParent].Name as 'ParentName'
,[pageParent2].Name as 'GrandparentName' -- add parent of parent name
,[created].Username as 'CreatedBy'
,[page].CreatedWhen
,[modified].Username as 'ModifiedBy'
,[page].ModifiedWhen
FROM dbo.Node [page]
INNER JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
INNER JOIN dbo.Node [pageParent] ON [pageParent].NodeGUID = [page].ParentGUID
-- add another JOIN on Node using ParentGUID (parent of parent)
INNER JOIN dbo.Node [pageParent2] ON [pageParent2].NodeGUID = [pageParent].ParentGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
The parent of the parent is Server (the root level), so my conclusion is that if the page's parent is:
Folders - then that's an active page
Archive Folder - then that's a previous revision of another page
I only want active pages, so I'm going to JOIN on the Folders parent only.
7.) Now for the markup. Our MCMS template had only one placeholder area. The NodePlaceholder table identifies the name of the placeholder(s), which is helpful if you have multiple placeholder areas in your template. For simplicity, I'm only going to join on NodePlaceholderContent.
SELECT
[page].Id
,[page].NodeGuid
,[page].Name
/* remove parent names */
,[created].Username as 'CreatedBy'
,[page].CreatedWhen
,[modified].Username as 'ModifiedBy'
,[page].ModifiedWhen
,html.PropValue as 'HTML' -- add the markup
FROM dbo.Node [page]
INNER JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
-- change alias to "folders"
INNER JOIN dbo.Node [folders] ON [folders].NodeGUID = [page].ParentGUID AND [folders].Name = 'Folders'
-- join on PlaceholderContent to get the HTML
-- this table will also have references to any static files contained in the page (such as images) so we filter those out by PropName = 'HTML'
INNER JOIN dbo.NodePlaceholderContent html ON html.NodeId = [page].Id AND html.PropName = 'HTML'
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
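If the templates in another installation have more than one placeholder, the join on NodePlaceholderContent would presumably return more than one HTML row per page. The two queries below are a sketch for checking that, using only the NodeId and PropName columns already referenced above; HtmlRowCount is an illustrative alias.

```sql
-- Which PropName values exist, and how common each one is
SELECT
html.PropName
,COUNT(*) as [Count]
FROM dbo.NodePlaceholderContent html
GROUP BY html.PropName
ORDER BY [Count] DESC

-- Pages that have more than one HTML row (likely multiple placeholders)
SELECT
html.NodeId
,COUNT(*) as [HtmlRowCount]
FROM dbo.NodePlaceholderContent html
WHERE html.PropName = 'HTML'
GROUP BY html.NodeId
HAVING COUNT(*) > 1
```

If the second query returns rows, joining NodePlaceholder as well would be the way to tell the placeholders apart by name.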
8.) At this point I got a little stuck trying to determine where the page is in the system (i.e. its relative path, or which channel it lives in). Going back to steps 1 and 2, Type = 16 can be either a post or a page (which aren't the same thing, but they are related). So now we JOIN our pages to the post records to determine the path.
After some Google searches I stumbled upon an excerpt from Microsoft Content Management Server 2002: A Complete Guide, which really helped me get the rest of the way (and identified the Node.Type enums).
SELECT
[page].Id
,[page].NodeGuid
,[page].Name
,[post].DisplayName as 'Title' -- add page Title from the post record
,[created].Username as 'CreatedBy'
,[page].CreatedWhen
,[modified].Username as 'ModifiedBy'
,[page].ModifiedWhen
,html.PropValue as 'HTML'
FROM dbo.Node [page]
INNER JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
INNER JOIN dbo.Node [folders] ON [folders].NodeGUID = [page].ParentGUID AND [folders].Name = 'Folders'
INNER JOIN dbo.NodePlaceholderContent html ON html.NodeId = [page].Id AND html.PropName = 'HTML'
-- join using followGUID to get the posting
INNER JOIN dbo.Node [post] ON [post].FollowGUID = [page].NodeGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
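The join on FollowGUID returns one row per matching posting, so a page referenced by more than one posting would appear more than once in the export. Whether that happens depends on the site; a sketch of a check, restricted to IsShortcut = 1 rows as in the final query (PostingCount is an illustrative alias):

```sql
-- Published pages that are referenced by more than one posting
SELECT
[page].Id
,[page].Name
,COUNT(*) as [PostingCount]
FROM dbo.Node [page]
INNER JOIN dbo.Node [post] ON [post].FollowGUID = [page].NodeGUID AND [post].IsShortcut = 1
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
GROUP BY [page].Id, [page].Name
HAVING COUNT(*) > 1
```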
9.) The final step is to keep going up the post's parent hierarchy, which results in several LEFT JOINs stepping up the ParentGUID chain. This query gives a visual representation of the hierarchy using these LEFT JOINs.
SELECT
CASE WHEN postParent9.Name IS NULL THEN '' ELSE postParent9.Name + ' > ' END +
CASE WHEN postParent8.Name IS NULL THEN '' ELSE postParent8.Name + ' > ' END +
CASE WHEN postParent7.Name IS NULL THEN '' ELSE postParent7.Name + ' > ' END +
CASE WHEN postParent6.Name IS NULL THEN '' ELSE postParent6.Name + ' > ' END +
CASE WHEN postParent5.Name IS NULL THEN '' ELSE postParent5.Name + ' > ' END +
CASE WHEN postParent4.Name IS NULL THEN '' ELSE postParent4.Name + ' > ' END +
CASE WHEN postParent3.Name IS NULL THEN '' ELSE postParent3.Name + ' > ' END +
CASE WHEN postParent2.Name IS NULL THEN '' ELSE postParent2.Name + ' > ' END +
CASE WHEN postParent1.Name IS NULL THEN '' ELSE postParent1.Name + ' > ' END +
page.Name as [Path]
,page.Name + '.htm' as [PageName]
,post.DisplayName as [PageTitle]
,CASE page.[Type]
WHEN 1 THEN 'Server'
WHEN 4 THEN 'Channel'
WHEN 16 THEN 'Post/Page'
WHEN 64 THEN 'Resource Gallery'
WHEN 256 THEN 'Resource Gallery Item (images/documents)'
WHEN 16384 THEN 'Template Gallery'
WHEN 65536 THEN 'Template' END as [Type]
,page.CreatedWhen as 'Created'
,page.ModifiedWhen as 'Modified'
,html.PropValue as 'HTML'
FROM dbo.Node page
INNER JOIN dbo.Node folders ON folders.NodeGUID = page.ParentGUID AND folders.Name = 'Folders'
INNER JOIN dbo.NodePlaceholderContent html ON html.NodeId = page.Id AND html.PropName = 'HTML'
INNER JOIN dbo.Node post ON post.FollowGUID = page.NodeGUID AND post.IsShortcut = 1
LEFT JOIN dbo.Node postParent1 ON postParent1.NodeGuid = post.ParentGUID
LEFT JOIN dbo.Node postParent2 ON postParent2.NodeGuid = postParent1.ParentGUID
LEFT JOIN dbo.Node postParent3 ON postParent3.NodeGuid = postParent2.ParentGUID
LEFT JOIN dbo.Node postParent4 ON postParent4.NodeGuid = postParent3.ParentGUID
LEFT JOIN dbo.Node postParent5 ON postParent5.NodeGuid = postParent4.ParentGUID
LEFT JOIN dbo.Node postParent6 ON postParent6.NodeGuid = postParent5.ParentGUID
LEFT JOIN dbo.Node postParent7 ON postParent7.NodeGuid = postParent6.ParentGUID
LEFT JOIN dbo.Node postParent8 ON postParent8.NodeGuid = postParent7.ParentGUID
LEFT JOIN dbo.Node postParent9 ON postParent9.NodeGuid = postParent8.ParentGUID
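The nine LEFT JOINs cap the path at nine levels. As an alternative, the same walk up the ParentGUID chain can be sketched as a recursive common table expression, which has no fixed depth. This assumes the database has been restored onto SQL Server 2005 or later (recursive CTEs are not available before that) and that Name is never NULL on ancestor nodes; PostingPath, Ranked, NextParentGUID and Depth are names introduced here for illustration only.

```sql
WITH PostingPath AS (
    -- Anchor: one row per posting of an active page, starting at the page name
    SELECT
        post.Id as PostId
        ,page.Id as PageId
        ,post.ParentGUID as NextParentGUID
        ,CAST(page.Name AS NVARCHAR(MAX)) as [Path]
        ,0 as Depth
    FROM dbo.Node page
    INNER JOIN dbo.Node folders ON folders.NodeGUID = page.ParentGUID AND folders.Name = 'Folders'
    INNER JOIN dbo.Node post ON post.FollowGUID = page.NodeGUID AND post.IsShortcut = 1

    UNION ALL

    -- Recursive step: prepend the next ancestor's name and move up one level
    SELECT
        p.PostId
        ,p.PageId
        ,parent.ParentGUID
        ,CAST(parent.Name AS NVARCHAR(MAX)) + N' > ' + p.[Path]
        ,p.Depth + 1
    FROM PostingPath p
    INNER JOIN dbo.Node parent ON parent.NodeGUID = p.NextParentGUID
),
Ranked AS (
    -- The deepest row per posting is the one that reached the root
    SELECT
        PostId
        ,PageId
        ,[Path]
        ,ROW_NUMBER() OVER (PARTITION BY PostId ORDER BY Depth DESC) as rn
    FROM PostingPath
)
SELECT PageId, PostId, [Path]
FROM Ranked
WHERE rn = 1
OPTION (MAXRECURSION 100); -- fails instead of looping forever on a cyclic parent chain
```

The [Path] column should correspond to the concatenated path in the query above; the remaining columns (title, dates, HTML) can be added by joining this result back to Node and NodePlaceholderContent on PageId and PostId.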
As an aside, my task didn't involve exporting the resource gallery content (images, documents, etc.), but there should be enough information here to get a good start on that if you need those pieces as well.
I hope this can be of some help to someone else migrating from MCMS 2002...