
How to extract data from Microsoft Content Management Server (MCMS) database

I need to extract a sizeable amount of data (> 1000 pages) from a Microsoft Content Management Server (MCMS) database for use in a Sitecore website.

I can see two main options:

  1. Migrate the data into a new simplified database and display that information in the new website.

  2. Convert the MCMS solution to SharePoint and use the SharePoint connector module available for Sitecore to display this information.

I would prefer to go down the first route, as there are no plans to use SharePoint to manage data/content in the future, and I would prefer to store this information in a simple SQL Server database to allow better searching.

I've looked at the database in question and think that the main tables I'd be interested in are Node, NodePlaceholder and NodePlaceholderContent, but I am struggling to find what I would expect. Can anyone out there give a bit of an explanation about the schema of this database for me? Or am I going to have problems trying to migrate the data in this way?

I've just recently been going through a similar process of exporting content pages out of MCMS 2002 (migrating to WordPress).

I'm not saying this is the 100% correct way to get the data, but it worked for me.

Here's the process I followed to get page content out of the database.

As you've already seen, the tables storing most of the data are Node and NodePlaceholderContent.

1.) To get an idea of what the Node table holds, you can view its contents grouped by type

SELECT
    [Type]
    ,CASE [Type] 
        WHEN      1 THEN 'Server'
        WHEN      4 THEN 'Channel'
        WHEN     16 THEN 'Post/Page'
        WHEN     64 THEN 'Resource Gallery'
        WHEN    256 THEN 'Resource Gallery Item (images/documents)'
        WHEN  16384 THEN 'Template Gallery'
        WHEN  65536 THEN 'Template' END as [Description]
    ,COUNT([Type]) as [Count]
FROM        dbo.Node
GROUP BY    [Type]
ORDER BY    [Count] DESC
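If the Node rows are exported and processed outside SQL, the same lookup can be kept in code. This is only a minimal Python sketch, assuming each row arrives as a dictionary with a `Type` key. The type codes are the ones from the CASE expression above; the sample rows are made up for illustration.

```python
from collections import Counter

# Node.Type codes as listed in the CASE expression of the query above.
NODE_TYPES = {
    1: "Server",
    4: "Channel",
    16: "Post/Page",
    64: "Resource Gallery",
    256: "Resource Gallery Item (images/documents)",
    16384: "Template Gallery",
    65536: "Template",
}


def describe_type(type_code):
    """Return the description for a Node.Type code, or 'Unknown (n)'."""
    return NODE_TYPES.get(type_code, f"Unknown ({type_code})")


def count_by_type(rows):
    """Count rows per type description, most common first."""
    counts = Counter(describe_type(row["Type"]) for row in rows)
    return counts.most_common()


# Hypothetical rows standing in for the result of SELECT [Type] FROM dbo.Node.
sample_rows = [{"Type": 16}, {"Type": 16}, {"Type": 4}, {"Type": 1}, {"Type": 2}]

for description, count in count_by_type(sample_rows):
    print(f"{description}: {count}")
```

Codes that are not in the table are reported as unknown rather than dropped, which makes it easier to spot node types this list does not cover.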

2.) Pages (and Posts, which are covered further down) have type = 16, but to get just pages (and not posts) we need to filter by IsShortcut = 0

SELECT * FROM dbo.Node WHERE [Type] = 16 AND IsShortcut = 0

3.) I only wanted published pages, so filter by ApprovalStatus = 1

-- Get all published pages
SELECT *
FROM dbo.Node
WHERE [Type] = 16
AND IsShortcut = 0
AND ApprovalStatus = 1

4.) Next, determine who created and last modified each page (with usernames)

-- Get published pages & author/editor
SELECT 
    [page].Id
    ,[page].NodeGuid
    ,[page].Name
    ,[created].Username as 'CreatedBy'
    ,[page].CreatedWhen
    ,[modified].Username as 'ModifiedBy'
    ,[page].ModifiedWhen
FROM        dbo.Node [page]
-- add JOIN on created by user
INNER JOIN  dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
-- add JOIN on modified by user
INNER JOIN  dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1

5.) Next, figure out where in the hierarchy we are by using the Node.ParentGUID column

SELECT 
    [page].Id
    ,[page].NodeGuid
    ,[page].Name
    ,[pageParent].Name -- add page parent Name
    ,[created].Username as 'CreatedBy'
    ,[page].CreatedWhen
    ,[modified].Username as 'ModifiedBy'
    ,[page].ModifiedWhen
FROM        dbo.Node [page]
INNER JOIN  dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN  dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
-- add JOIN on Node using ParentGUID
INNER JOIN  dbo.Node [pageParent] ON [pageParent].NodeGUID = [page].ParentGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1 

This query let me know that pages are in parent nodes named either Folders or Archive Folder.

6.) Go up another level (get the parent of the parent)

SELECT 
    [page].Id
    ,[page].NodeGuid
    ,[page].Name
    ,[pageParent].Name 
    ,[pageParent2].Name -- add parent of parent name
    ,[created].Username as 'CreatedBy'
    ,[page].CreatedWhen
    ,[modified].Username as 'ModifiedBy'
    ,[page].ModifiedWhen
FROM        dbo.Node [page]
INNER JOIN  dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN  dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
INNER JOIN  dbo.Node [pageParent] ON [pageParent].NodeGUID = [page].ParentGUID
-- add another JOIN on Node using ParentGUID (parent of parent)
INNER JOIN  dbo.Node [pageParent2] ON [pageParent2].NodeGUID = [pageParent].ParentGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1 

The parent of the parent is Server (the root level), so my conclusion is that if the page's parent is:

  • Folders - then it's an active page
  • Archive Folder - then it's a previous revision of another page

I only want active pages, so I'm going to JOIN on the Folders parent only.
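The same rule can be applied in code when the rows from the step 5 query are post-processed. This is a rough Python sketch, not part of the original export: the `ParentName` key is a hypothetical label for the `[pageParent].Name` column, and the sample rows are invented.

```python
# Parent node names observed in the query results above.
ACTIVE_PARENT = "Folders"
ARCHIVE_PARENT = "Archive Folder"


def page_status(parent_name):
    """Classify a page by the name of its parent node."""
    if parent_name == ACTIVE_PARENT:
        return "active"
    if parent_name == ARCHIVE_PARENT:
        return "archived revision"
    return "unknown"


def active_pages(rows):
    """Keep only the rows whose parent node is the 'Folders' node."""
    return [row for row in rows if page_status(row["ParentName"]) == "active"]


# Hypothetical rows: the same page name once as the live page, once as an old revision.
sample_rows = [
    {"Name": "welcome", "ParentName": "Folders"},
    {"Name": "welcome", "ParentName": "Archive Folder"},
]

print(active_pages(sample_rows))
```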

7.) Now how about the markup? Our MCMS template only had one placeholder area. The NodePlaceholder table identifies the name of the placeholder(s), which is helpful if you have multiple placeholder areas in your template. For simplicity, I'm only going to join on NodePlaceholderContent.

SELECT 
    [page].Id
    ,[page].NodeGuid
    ,[page].Name
    /* remove parent names */
    ,[created].Username as 'CreatedBy'
    ,[page].CreatedWhen
    ,[modified].Username as 'ModifiedBy'
    ,[page].ModifiedWhen
    ,html.PropValue as 'HTML' -- add the markup
FROM        dbo.Node [page]
INNER JOIN  dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN  dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
-- change alias to "folders"
INNER JOIN  dbo.Node [folders] ON [folders].NodeGUID = [page].ParentGUID AND [folders].Name = 'Folders'
-- join on PlaceholderContent to get the HTML
-- this table will also have references to any static files contained in the page (such as images) so we filter those out by PropName = 'HTML'
INNER JOIN  dbo.NodePlaceholderContent html ON html.NodeId = [page].Id AND html.PropName = 'HTML' 
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1 

8.) At this point I got a little stuck trying to determine where the page is in the system (i.e. its relative path, or which channel it lives in). Going back to steps 1 and 2, type = 16 can be either a post or a page (they aren't the same thing, but they are related). So now we JOIN our pages to the post records to determine pathing.

After some Google searches I stumbled upon this excerpt from Microsoft Content Management Server 2002: A Complete Guide, which really helped me get the rest of the way (and identified the Node.Type enums).

SELECT 
    [page].Id
    ,[page].NodeGuid
    ,[page].Name
    ,[post].DisplayName as 'Title' -- add page Title from the post record
    ,[created].Username as 'CreatedBy'
    ,[page].CreatedWhen
    ,[modified].Username as 'ModifiedBy'
    ,[page].ModifiedWhen
    ,html.PropValue as 'HTML'
FROM        dbo.Node [page]
INNER JOIN  dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN  dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
INNER JOIN  dbo.Node [folders] ON [folders].NodeGUID = [page].ParentGUID AND [folders].Name = 'Folders'
INNER JOIN  dbo.NodePlaceholderContent html ON html.NodeId = [page].Id AND html.PropName = 'HTML' 
-- join using followGUID to get the posting
INNER JOIN  dbo.Node [post] ON [post].FollowGUID = [page].NodeGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1 
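The page-to-post link can also be resolved in code once both sets of rows are exported. This is a small Python sketch under a few assumptions: rows are dictionaries keyed by the column names used above, GUIDs are strings (compared case-insensitively, as SQL Server does), and postings are the rows with IsShortcut = 1, following step 2. The sample data is invented.

```python
def attach_titles(pages, posts):
    """Return copies of the page rows with a 'Title' taken from the matching post.

    A post matches a page when post['FollowGUID'] equals page['NodeGUID'].
    If several posts follow the same page, the first one seen wins.
    Pages without a post get Title=None (the INNER JOIN would drop them).
    """
    title_by_page_guid = {}
    for post in posts:
        followed = post.get("FollowGUID")
        if followed and post.get("IsShortcut") == 1:
            title_by_page_guid.setdefault(followed.lower(), post["DisplayName"])
    return [
        dict(page, Title=title_by_page_guid.get(page["NodeGUID"].lower()))
        for page in pages
    ]


# Hypothetical rows; real NodeGUID values are full GUIDs.
pages = [
    {"NodeGUID": "AAAA-1", "Name": "welcome"},
    {"NodeGUID": "AAAA-2", "Name": "orphan"},
]
posts = [
    {"FollowGUID": "aaaa-1", "DisplayName": "Welcome to the site", "IsShortcut": 1},
]

for row in attach_titles(pages, posts):
    print(row["Name"], "->", row["Title"])
```

Keeping unmatched pages with an empty title, rather than dropping them, makes it easy to list pages that have no posting before deciding whether to migrate them.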

9.) The final step is to keep going up the post's parent hierarchy, which results in several LEFT JOINs stepping up the ParentGUID chain. This query gives a visual representation of the hierarchy using these LEFT JOINs.

SELECT 
    CASE WHEN postParent9.Name IS NULL THEN '' ELSE postParent9.Name + ' > ' END +
    CASE WHEN postParent8.Name IS NULL THEN '' ELSE postParent8.Name + ' > ' END +
    CASE WHEN postParent7.Name IS NULL THEN '' ELSE postParent7.Name + ' > ' END +
    CASE WHEN postParent6.Name IS NULL THEN '' ELSE postParent6.Name + ' > ' END +
    CASE WHEN postParent5.Name IS NULL THEN '' ELSE postParent5.Name + ' > ' END +
    CASE WHEN postParent4.Name IS NULL THEN '' ELSE postParent4.Name + ' > ' END +
    CASE WHEN postParent3.Name IS NULL THEN '' ELSE postParent3.Name + ' > ' END +
    CASE WHEN postParent2.Name IS NULL THEN '' ELSE postParent2.Name + ' > ' END +
    CASE WHEN postParent1.Name IS NULL THEN '' ELSE postParent1.Name + ' > ' END +
    page.Name as [Path]
    ,page.Name + '.htm' as [PageName]
    ,post.DisplayName as [PageTitle]
    ,CASE page.[Type] 
        WHEN      1 THEN 'Server'
        WHEN      4 THEN 'Channel'
        WHEN     16 THEN 'Post/Page'
        WHEN     64 THEN 'Resource Gallery'
        WHEN    256 THEN 'Resource Gallery Item (images/documents)'
        WHEN  16384 THEN 'Template Gallery'
        WHEN  65536 THEN 'Template' END as [Type]
    ,page.CreatedWhen as 'Created'
    ,page.ModifiedWhen as 'Modified'
    ,html.PropValue as 'HTML'
FROM        dbo.Node page
INNER JOIN  dbo.Node folders ON folders.NodeGUID = page.ParentGUID AND folders.Name = 'Folders'
INNER JOIN  dbo.NodePlaceholderContent html ON html.NodeId = page.Id AND html.PropName = 'HTML'
INNER JOIN  dbo.Node post ON post.FollowGUID = page.NodeGUID AND post.IsShortcut = 1
LEFT JOIN   dbo.Node postParent1 ON postParent1.NodeGuid = post.ParentGUID
LEFT JOIN   dbo.Node postParent2 ON postParent2.NodeGuid = postParent1.ParentGUID
LEFT JOIN   dbo.Node postParent3 ON postParent3.NodeGuid = postParent2.ParentGUID
LEFT JOIN   dbo.Node postParent4 ON postParent4.NodeGuid = postParent3.ParentGUID
LEFT JOIN   dbo.Node postParent5 ON postParent5.NodeGuid = postParent4.ParentGUID
LEFT JOIN   dbo.Node postParent6 ON postParent6.NodeGuid = postParent5.ParentGUID
LEFT JOIN   dbo.Node postParent7 ON postParent7.NodeGuid = postParent6.ParentGUID
LEFT JOIN   dbo.Node postParent8 ON postParent8.NodeGuid = postParent7.ParentGUID
LEFT JOIN   dbo.Node postParent9 ON postParent9.NodeGuid = postParent8.ParentGUID
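The nine LEFT JOINs cap the hierarchy at nine levels. If the Node rows are loaded into memory, the ParentGUID chain can be walked to any depth instead. This is a minimal Python sketch rather than the method used above: `nodes_by_guid`, `ancestor_names` and `page_path` are hypothetical names, and the sample hierarchy is invented.

```python
def ancestor_names(start_guid, nodes_by_guid, max_depth=50):
    """Walk ParentGUID links upward from start_guid and return names, root first.

    Stops at a missing parent, at a cycle, or after max_depth levels.
    """
    names = []
    seen = set()
    guid = start_guid
    while guid in nodes_by_guid and guid not in seen and len(names) < max_depth:
        seen.add(guid)
        node = nodes_by_guid[guid]
        names.append(node["Name"])
        guid = node["ParentGUID"]
    names.reverse()
    return names


def page_path(page_name, post, nodes_by_guid, separator=" > "):
    """Build 'Root > ... > Channel > page' from the post's parent chain."""
    parts = ancestor_names(post["ParentGUID"], nodes_by_guid)
    parts.append(page_name)
    return separator.join(parts)


# Hypothetical hierarchy keyed by NodeGUID.
nodes_by_guid = {
    "g-server": {"Name": "Server", "ParentGUID": None},
    "g-channels": {"Name": "Channels", "ParentGUID": "g-server"},
    "g-news": {"Name": "News", "ParentGUID": "g-channels"},
}
post = {"ParentGUID": "g-news"}

print(page_path("welcome", post, nodes_by_guid))
```

The `seen` set and `max_depth` guard against malformed data in which a node ends up as its own ancestor.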

As an aside, my task didn't involve exporting the resource gallery content (images/docs/etc.), but there should be enough information here to get a good start on that if you need those pieces as well.

I hope this can be of some help to someone else migrating from MCMS 2002.

