
Fitbit Data Export - Creating a data warehouse

I plan to create a Fitbit data warehouse for educational purposes, and there doesn't seem to be any material online specifically for Fitbit data.

A few issues I have run into:

  1. You can only export a maximum of one month of data at a time from the Fitbit website. My plan would be to drop each month's worth of data into a folder as its own file and have these files read separately.

  2. You can export the data as either CSV or .XLS. The issue with XLS is that each day in the month gets a separate sheet for food logs, which would then need to be merged in a staging table. The issue with CSV is that there is one sheet per file, with all of the data in it.
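To make the one-file-per-month plan from point 1 concrete, here is a minimal Python sketch of picking up the monthly files in order. It assumes a hypothetical naming convention, fitbit_export_YYYYMM.csv, so that sorting by file name also sorts by month; in the actual package an SSIS Foreach Loop with a file enumerator would play this role.

```python
import tempfile
from pathlib import Path


def monthly_export_files(folder: Path, pattern: str = "fitbit_export_*.csv") -> list[Path]:
    """List the one-month export files in a folder, oldest month first.

    Assumes the hypothetical naming convention fitbit_export_YYYYMM.csv,
    so sorting by name is the same as sorting by month. Other files are ignored.
    """
    return sorted(p for p in folder.glob(pattern) if p.is_file())


if __name__ == "__main__":
    # Demo: two monthly exports saved out of order, plus an unrelated file.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for name in ("fitbit_export_201807.csv", "fitbit_export_201806.csv", "notes.txt"):
            (folder / name).write_text("")
        for path in monthly_export_files(folder):
            print(path.name)
```

Zero-padded YYYYMM in the file name is what keeps lexical order and chronological order identical; a name like June_2018.csv would not sort correctly.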

I would then use SSIS to load the data into a SQL Server database for reporting purposes.

Which approach would be better suited: exporting the data as .XLS or as CSV?

Edit: How could a CSV file with this format be loaded with SSIS?

The CSV layout looks like this:

Body,,,,,,,,,
Date,Weight,BMI,Fat,,,,,,
01/06/2018,71.5,23.29,15,,,,,,
02/06/2018,71.5,23.29,15,,,,,,
03/06/2018,71.5,23.29,15,,,,,,
04/06/2018,71.5,23.29,15,,,,,,
05/06/2018,71.5,23.29,15,,,,,,
06/06/2018,71.5,23.29,15,,,,,,
07/06/2018,71.5,23.29,15,,,,,,
08/06/2018,71.5,23.29,15,,,,,,
09/06/2018,71.5,23.29,15,,,,,,
10/06/2018,71.5,23.29,15,,,,,,
11/06/2018,71.5,23.29,15,,,,,,
12/06/2018,71.5,23.29,15,,,,,,
13/06/2018,71.5,23.29,15,,,,,,
14/06/2018,71.5,23.29,15,,,,,,
15/06/2018,71.5,23.29,15,,,,,,
16/06/2018,71.5,23.29,15,,,,,,
17/06/2018,71.5,23.29,15,,,,,,
18/06/2018,71.5,23.29,15,,,,,,
19/06/2018,71.5,23.29,15,,,,,,
20/06/2018,71.5,23.29,15,,,,,,
21/06/2018,71.5,23.29,15,,,,,,
22/06/2018,71.5,23.29,15,,,,,,
23/06/2018,71.5,23.29,15,,,,,,
24/06/2018,71.5,23.29,15,,,,,,
25/06/2018,71.5,23.29,15,,,,,,
26/06/2018,71.5,23.29,15,,,,,,
27/06/2018,71.5,23.29,15,,,,,,
28/06/2018,71.5,23.29,15,,,,,,
29/06/2018,72.8,23.72,15,,,,,,
30/06/2018,72.95,23.77,15,,,,,,
,,,,,,,,,
Foods,,,,,,,,,
Date,Calories In,,,,,,,,
01/06/2018,0,,,,,,,,
02/06/2018,0,,,,,,,,
03/06/2018,0,,,,,,,,
04/06/2018,0,,,,,,,,
05/06/2018,0,,,,,,,,
06/06/2018,0,,,,,,,,
07/06/2018,0,,,,,,,,
08/06/2018,0,,,,,,,,
09/06/2018,0,,,,,,,,
10/06/2018,0,,,,,,,,
11/06/2018,0,,,,,,,,
12/06/2018,0,,,,,,,,
13/06/2018,100,,,,,,,,
14/06/2018,0,,,,,,,,
15/06/2018,0,,,,,,,,
16/06/2018,0,,,,,,,,
17/06/2018,0,,,,,,,,
18/06/2018,0,,,,,,,,
19/06/2018,0,,,,,,,,
20/06/2018,0,,,,,,,,
21/06/2018,0,,,,,,,,
22/06/2018,0,,,,,,,,
23/06/2018,0,,,,,,,,
24/06/2018,0,,,,,,,,
25/06/2018,0,,,,,,,,
26/06/2018,0,,,,,,,,
27/06/2018,"1,644",,,,,,,,
28/06/2018,"2,390",,,,,,,,
29/06/2018,981,,,,,,,,
30/06/2018,0,,,,,,,,

For example, "Foods" would be the table name, and "Date" and "Calories In" would be column names. "01/06/2018" is the Date, "0" is the "Calories In" value, and so on.
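A minimal Python sketch of that mapping (section heading becomes the table, the next line becomes the column names), assuming the only section headings are the export categories Body, Foods, Activities and Sleep. The function name split_sections is illustrative and not part of any Fitbit or SSIS API.

```python
import csv
import io

# Assumed section headings, taken from the export options (Body, Foods, Activities, Sleep).
SECTION_NAMES = {"Body", "Foods", "Activities", "Sleep"}


def split_sections(text: str) -> dict[str, dict[str, list]]:
    """Split a combined export into {section: {"columns": [...], "rows": [...]}}.

    The line right after a section heading is treated as that section's column
    names; blank separator lines (all fields empty) are skipped.
    """
    sections: dict[str, dict[str, list]] = {}
    current = None
    expect_header = False
    for row in csv.reader(io.StringIO(text)):
        while row and row[-1] == "":
            row.pop()  # drop the padding commas every line carries
        if not row:
            continue  # blank separator line between sections
        if len(row) == 1 and row[0] in SECTION_NAMES:
            current = sections.setdefault(row[0], {"columns": [], "rows": []})
            expect_header = True
        elif current is not None and expect_header:
            current["columns"] = row
            expect_header = False
        elif current is not None:
            # Pad back to the header width in case trailing values were genuinely empty.
            current["rows"].append(row + [""] * (len(current["columns"]) - len(row)))
    return sections
```

Using the csv module rather than splitting on commas matters here: the quoted "1,644" stays a single Calories In value.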

Tricky. I just pulled my own Fitbit data, since this piqued my curiosity. That CSV is messy: you basically have several file formats mixed into one file, which won't be straightforward in SSIS. With the XLS format, as you mentioned, the food logs get a separate worksheet for each day, and SSIS won't like those worksheets changing.


A couple of options for the CSV come to mind.

Individual exports from Fitbit

I see you can pick which data to include in your export: Body, Foods, Activities, Sleep.

  1. Do each export individually, saving each file with a prefix that says what type of data it holds.
  2. Then build an SSIS package with multiple Foreach Loops, each with a data flow task for one file format.
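The steps above amount to routing each file by its prefix. As a minimal Python sketch, assuming a hypothetical prefix_YYYYMM.csv naming scheme (for example body_201806.csv), the grouping that the separate Foreach Loops would do looks like this; in SSIS each group would correspond to one loop with a file filter such as body_*.csv.

```python
from collections import defaultdict


def group_by_prefix(file_names: list[str]) -> dict[str, list[str]]:
    """Group export file names by the data-type prefix before the first underscore.

    Assumes the hypothetical naming scheme prefix_YYYYMM.csv; files without an
    underscore are ignored. Each group would be handled by its own loop and data flow.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in file_names:
        prefix, sep, _ = name.partition("_")
        if sep:  # skip files that don't follow the naming scheme
            groups[prefix.lower()].append(name)
    return {prefix: sorted(names) for prefix, names in groups.items()}
```

For example, group_by_prefix(["foods_201806.csv", "body_201807.csv", "body_201806.csv", "readme.txt"]) puts the two body files under "body" (oldest first), the foods file under "foods", and drops readme.txt.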

That would work, but exporting the data from Fitbit would become tedious.

Handle the one file with all the data

With this option you have to get creative, since the formats are mixed and the sections have different column definitions.

One option is to create a staging table with as many columns as the widest section has, which looks like it might be "Activities". Give each column a generic name such as Column1, Column2, and make them all VARCHAR.

Since the "formats" are mixed and the data types won't line up, we just need to get all the data out first and sort out the conversions later.

From there you can build one data flow with a flat file source, and also add a line number to each row, since we will need it later to work out where each section of data is.
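A minimal Python sketch of what the staging load produces: every line becomes one all-text row with a LineNumber and generic Column1..ColumnN names. This is only an illustration of the idea, not SSIS itself; to_staging_rows is a hypothetical name, and N is taken from the widest line in the file rather than the first, which is the same reason the columns have to be defined manually in the connection.

```python
import csv
import io


def to_staging_rows(text: str) -> list[dict[str, object]]:
    """Turn every line of the export into an all-text staging row.

    Each row gets a 1-based LineNumber plus Column1..ColumnN, where N is the
    width of the widest line (not the first one), so nothing is truncated.
    Short lines are padded with empty strings.
    """
    lines = list(csv.reader(io.StringIO(text)))
    width = max((len(row) for row in lines), default=0)
    staged = []
    for line_number, row in enumerate(lines, start=1):
        padded = row + [""] * (width - len(row))
        record: dict[str, object] = {"LineNumber": line_number}
        record.update({f"Column{i}": value for i, value in enumerate(padded, start=1)})
        staged.append(record)
    return staged
```

Keeping every value as a string at this stage mirrors the all-VARCHAR staging table: nothing can fail on load, and conversions happen per section afterwards.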

When building the flat file connection for your source, you will have to add all the columns manually: the first row of data in the file doesn't include a comma for every field, so SSIS won't be able to detect all the columns. Manually add the number of columns needed, and also make sure of the following:

  • Text Qualifier = "
  • Header row Delimiter = {LF}
  • Row Delimiter = {LF}
  • Column Delimiter = ,
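Why the text qualifier matters: the Foods section contains quoted values with a thousands separator, such as "1,644". The sketch below uses Python's csv module as a stand-in for the flat file parser (it is not SSIS), with a comma delimiter and a double-quote qualifier, and contrasts it with treating quotes as ordinary characters.

```python
import csv
import io

LINE = '27/06/2018,"1,644",,,,,,,,\n'

# Roughly the flat file settings: column delimiter ",", text qualifier '"'.
# csv.reader detects line endings itself, so there is no separate row delimiter option.
with_qualifier = next(csv.reader(io.StringIO(LINE), delimiter=",", quotechar='"'))

# For contrast: no text qualifier, so the quotes are ordinary characters.
without_qualifier = next(
    csv.reader(io.StringIO(LINE), delimiter=",", quoting=csv.QUOTE_NONE)
)

print(with_qualifier[:3])     # ['27/06/2018', '1,644', '']
print(without_qualifier[:3])  # ['27/06/2018', '"1', '644"']
```

Without the qualifier, the value splits in two and every following column shifts right by one, which is exactly the kind of misalignment that breaks the later T-SQL parsing.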

That should at least get your data loaded into a staging table in the database. From there you would need a bunch of T-SQL to zero in on each "section" of data and then parse, convert and load it.

For a small test, I created a table called TestTable:

CREATE TABLE [dbo].[TestTable](
    [LineNumber] [INT] NULL,
    [Column1] [VARCHAR](MAX) NULL,
    [Column2] [VARCHAR](MAX) NULL,
    [Column3] [VARCHAR](MAX) NULL,
    [Column4] [VARCHAR](MAX) NULL,
    [Column5] [VARCHAR](MAX) NULL,
    [Column6] [VARCHAR](MAX) NULL,
    [Column7] [VARCHAR](MAX) NULL,
    [Column8] [VARCHAR](MAX) NULL,
    [Column9] [VARCHAR](MAX) NULL
)

Then I built a data flow, hooked up the flat file source and executed it, which loaded the whole file into TestTable, one line per row.

From there I worked out some T-SQL to get to each "section" of data. Here's an example that shows how you could filter down to the "Foods" section:

DECLARE @MaxLine INT = (
                           SELECT MAX([LineNumber])
                           FROM   [TestTable]
                       );

--Use a subquery that returns the starting and ending line numbers for each section,
--then convert whichever columns that section's data ended up in.
SELECT     CONVERT(DATE, [a].[Column1], 103) AS [Date] --Style 103 = dd/mm/yyyy, as in the export
         , CONVERT(BIGINT, REPLACE([a].[Column2], ',', '')) AS [CaloriesIn] --Strip thousands separators such as "1,644"
FROM       [TestTable] [a]
INNER JOIN (
               --Starting and ending line number for each section
               SELECT [Column1]
                    , [LineNumber] + 2 AS [StartLineNumber] --Data starts 2 lines after the section heading (heading, then column names)
                    , LEAD([LineNumber], 1, @MaxLine + 1) OVER ( ORDER BY [LineNumber] )
                      - 1 AS [EndLineNumber] --Line before the next heading, or the last line of the file for the final section
               FROM   [TestTable]
               WHERE  [Column1] IN ( 'Body', 'Foods', 'Activities' ) --Each of the sections of data
           ) AS [Section]
    ON [a].[LineNumber]
       BETWEEN [Section].[StartLineNumber] AND [Section].[EndLineNumber]
WHERE      [Section].[Column1] = 'Foods' --Then just filter on the section you want
  AND      NULLIF([a].[Column1], '') IS NOT NULL; --Skip the blank separator line at the end of a section
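The boundary rule in that query can be hard to see through the window function, so here is the same logic as a minimal Python sketch (an illustration, not part of the SSIS solution): data starts two lines after a heading and ends on the line before the next heading, or on the last line for the final section. The names section_bounds and foods_rows are hypothetical, and the rows are assumed to be lists of strings in file order.

```python
from datetime import date, datetime

SECTION_NAMES = ("Body", "Foods", "Activities", "Sleep")  # assumed section headings


def section_bounds(rows: list[list[str]]) -> dict[str, tuple[int, int]]:
    """Map each section heading to (start, end) line numbers, 1-based and inclusive."""
    headings = [(n, row[0]) for n, row in enumerate(rows, start=1)
                if row and row[0] in SECTION_NAMES]
    bounds = {}
    for i, (line_number, name) in enumerate(headings):
        # Like LEAD(..., @MaxLine + 1): the final section runs to the last line.
        next_heading = headings[i + 1][0] if i + 1 < len(headings) else len(rows) + 1
        bounds[name] = (line_number + 2, next_heading - 1)
    return bounds


def foods_rows(rows: list[list[str]]) -> list[tuple[date, int]]:
    """Convert the Foods section into (date, calories_in) pairs."""
    start, end = section_bounds(rows)["Foods"]
    result = []
    for row in rows[start - 1:end]:
        if not row or row[0] == "":
            continue  # blank separator line
        day = datetime.strptime(row[0], "%d/%m/%Y").date()  # dd/mm/yyyy
        calories = int(row[1].replace(",", ""))  # "1,644" -> 1644
        result.append((day, calories))
    return result
```

The two conversions are the same ones the T-SQL needs: day-first dates and quoted numbers containing thousands separators.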

That query returns only the Foods section, as a Date column and a CaloriesIn column.

There could be other ways to parse that data, but this should give you a good starting point and an idea of how tricky this particular CSV file is.

As for the XLS option, that would be straightforward for all sections except the food logs. You would basically set up an Excel file connection; each sheet would then be a "table" for the source in the data flow, with an individual data flow for each worksheet.


But then what about the food logs? Once those sheets changed and you rolled into the next month, SSIS would freak out and error, probably complaining about metadata.

One obvious workaround would be to manually edit the Excel file and merge all of those sheets into one "Food Log" sheet before running it through SSIS. That's not ideal, because you probably want something fully automated.

I'd have to tinker with that. Maybe a Script Task with some C# code could combine all those sheets into one, parsing the date out of each sheet name and appending it to the data before a data flow loads it. That might be possible.
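The merge idea above, sketched in Python rather than C# and under loud assumptions: the workbook has already been read into a dict of sheet name to rows (header first), and the daily sheets are named like "Food Log 20180601". That sheet-name format is hypothetical, so check what your export actually uses and adjust the pattern.

```python
import re
from datetime import date, datetime

# Hypothetical daily sheet-name format; adjust to match the real export.
SHEET_DATE = re.compile(r"^Food Log (\d{8})$")


def merge_food_logs(sheets: dict[str, list[list[str]]]) -> list[list[object]]:
    """Stack every per-day food log sheet into one table with a leading date column.

    `sheets` maps sheet name -> rows, header row first. Sheets whose names don't
    match the food-log pattern are ignored, and each sheet's header row is dropped.
    """
    merged: list[list[object]] = []
    for name in sorted(sheets):
        match = SHEET_DATE.match(name)
        if not match:
            continue  # e.g. Body or Activities sheets
        log_date = datetime.strptime(match.group(1), "%Y%m%d").date()
        for row in sheets[name][1:]:
            merged.append([log_date, *row])
    return merged
```

Because the output always has the same shape (date plus the food log columns) regardless of how many days are in the month, a single data flow could load it without the metadata changing from one month to the next.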

It looks like both of the files Fitbit exports have challenges, no matter which format you look at.
