
R: nested list from JSON/XML (clinicaltrials.gov) to data.frame (tidy)

Aim

For university research, I am trying to process the clinical study data that is publicly available on clinicaltrials.gov.

For reproducibility, I would like to use the downloaded JSON or XML files directly, rather than retrieving the data via the web API (that approach is described in how-to-get-data-out-of-nested-xml-structure).

Update 1: The structure of the JSON file is posted here.

Update 2: The structure of the XML file is posted here.

Update 3:

I think tidyjson::read_json and tidyjson::spread_all solve the problem. See the answer section.

What I need

For my workflow, I need to convert the data into data.frames (tidy data.frames would be even better). I prefer JSON, but I would also be very happy with a solution for the XML format.
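For the XML route, here is a minimal sketch using the xml2 package, assuming it is installed. The element names (study, location, city, state) are hypothetical placeholders, not the actual clinicaltrials.gov XML schema. The idea is that xml_find_first() applied to a node set returns one result per record, using a missing node where a child element is absent. xml_text() then turns that missing node into NA, so each record still produces exactly one row.

```r
# Sketch, assuming the xml2 package; the element names are hypothetical
library(xml2)

doc <- read_xml('<study>
  <location><city>Taunton</city><state>Somerset</state></location>
  <location><city>Birmingham</city></location>
</study>')

# one node per repeated record
locs <- xml_find_all(doc, "//location")

# xml_find_first() keeps one (possibly missing) result per record,
# so an absent child becomes NA instead of shifting the rows
locations <- data.frame(
  city  = xml_text(xml_find_first(locs, "./city")),
  state = xml_text(xml_find_first(locs, "./state"))
)
locations
```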

Test data

A nested list generated with jsonlite::fromJSON("NCT0455805.json") from one of the downloaded JSON files:

test <- list(FullStudy = list(Rank = 254369L, Study = list(ProtocolSection = list(
    IdentificationModule = list(NCTId = "NCT01455805", OrgStudyIdInfo = list(
        OrgStudyId = "SS2011UK"), Organization = list(OrgFullName = "Spinal Simplicity LLC", 
        OrgClass = "INDUSTRY"), BriefTitle = "Minuteman Spinal Fusion Implant Versus Surgical Decompression for Lumbar Spinal Stenosis", 
        OfficialTitle = "Efficacy and Quality of Life Following Treatment of Lumbar Spinal Stenosis, Spondylolisthesis or Degenerative Disc Disease With the Minuteman Interspinous Interlaminar Fusion Implant Versus Surgical Decompression"), 
    StatusModule = list(StatusVerifiedDate = "October 2020", 
        OverallStatus = "Active, not recruiting", ExpandedAccessInfo = list(
            HasExpandedAccess = "No"), StartDateStruct = list(
            StartDate = "June 2012"), PrimaryCompletionDateStruct = list(
            PrimaryCompletionDate = "March 2024", PrimaryCompletionDateType = "Anticipated"), 
        CompletionDateStruct = list(CompletionDate = "March 2024", 
            CompletionDateType = "Anticipated"), StudyFirstSubmitDate = "October 13, 2011", 
        StudyFirstSubmitQCDate = "October 18, 2011", StudyFirstPostDateStruct = list(
            StudyFirstPostDate = "October 20, 2011", StudyFirstPostDateType = "Estimate"), 
        LastUpdateSubmitDate = "October 22, 2020", LastUpdatePostDateStruct = list(
            LastUpdatePostDate = "October 26, 2020", LastUpdatePostDateType = "Actual")), 
    SponsorCollaboratorsModule = list(ResponsibleParty = list(
        ResponsiblePartyType = "Sponsor"), LeadSponsor = list(
        LeadSponsorName = "Spinal Simplicity LLC", LeadSponsorClass = "INDUSTRY"), 
        CollaboratorList = list(Collaborator = list(list(CollaboratorName = "The Leeds Teaching Hospitals NHS Trust", 
            CollaboratorClass = "OTHER")))), OversightModule = list(
        OversightHasDMC = "Yes"), DescriptionModule = list(BriefSummary = "Lumbar spinal stenosis (LSS), is a common disorder of narrowing of the spinal canal in the lower part of the back. This causes discomfort in the legs when standing or walking because of pressure on the spinal nerves.There are several treatment options for LSS including physiotherapy, lumbar surgical decompression procedures such as laminectomy, Foraminotomy, Discectomy and more recently devices for interspinous distraction such as the XSTOP® and from May 2011 Minuteman\231.\n\nSurgical decompression for LSS involves the removal of excess bone, ligament, and soft-tissue allowing more room for the nerves. The operation is usually preformed under general anaesthetic and with an average stay in hospital for 2-3 nights. Whereas the Minuteman\231 implant is preformed as a day case under local or general anaesthetic and involves implanting the device into the space between two back bones to relieve pressure on the nerves and, therefore, pain in the legs.\n\nThis is a multi centred (four sites) randomised controlled trial with a total sample of 50 participants after obtaining their informed consent. Participants will attend the pain clinic at the Hospitals for a baseline visit where they will be randomised with a ratio of 1:1 to receive either the Minuteman\231 Interspinous interlaminar fusion Implant or standard surgical decompression for the treatment of lumbar spinal stenosis (LSS). Following randomisation arrangements will be made for the participant to receive the randomised treatment. If allocated to Minuteman\231 Implant, the treatment will be conducted by the Pain Specialist identified at the site. If allocated to surgical decompression, the treatment will be conducted by the neuro/spinal-surgeon identified at the site. Participates will be followed up regularly for 60 months post implant to assess clinical efficacy, safety, participants function and quality of life of each treatment.", 
        DetailedDescription = "This is a prospective randomised study monitoring patients for up to 5 years post treatment. Only patients who have an appropriately diagnosed Lumbar Spinal Stenosis with intermittent claudication with/without low back pain, with no adequate symptomatic relief after at least 6 months of conservative treatment will be asked to give consent to be involved. Potential participants will be approached for enrollment 17days before the planned baseline visit. Patients will be given oral and written information about the trial as well as the patient information leaflet for the study. If informed consent is given their participation in this study will be for a maximum of 5 years."), 
    ConditionsModule = list(ConditionList = list(Condition = c("Lumbar Spinal Stenosis", 
    "Spondylolisthesis", "Degenerative Disc Disease"))), DesignModule = list(
        StudyType = "Interventional", PhaseList = list(Phase = "Not Applicable"), 
        DesignInfo = list(DesignAllocation = "Randomized", DesignInterventionModel = "Parallel Assignment", 
            DesignPrimaryPurpose = "Treatment", DesignMaskingInfo = list(
                DesignMasking = "None (Open Label)")), EnrollmentInfo = list(
            EnrollmentCount = "50", EnrollmentType = "Anticipated")), 
    ArmsInterventionsModule = list(ArmGroupList = list(ArmGroup = list(
        list(ArmGroupLabel = "Minuteman Fusion Implant", ArmGroupType = "Active Comparator", 
            ArmGroupDescription = "Minuteman\231 interspinous interlaminar fusion Implant (interspinous interlaminar fusion device) which gained CE Mark approval in May 2011", 
            ArmGroupInterventionList = list(ArmGroupInterventionName = "Device: Minuteman Fusion Implant")), 
        list(ArmGroupLabel = "Surgical decompression", ArmGroupType = "Other", 
            ArmGroupDescription = "Surgical decompression refers to the following operations Laminectomy, Foraminotomy, Discectomy or any other surgical procedure that the clinician feels is relevant for the decompression of lumbar spinal stenosis.", 
            ArmGroupInterventionList = list(ArmGroupInterventionName = "Procedure: surgical decompression")))), 
        InterventionList = list(Intervention = list(list(InterventionType = "Device", 
            InterventionName = "Minuteman Fusion Implant", InterventionDescription = "The Minuteman\231 interspinous interlaminar fusion device consists of a central threaded portion that has a two-part wing plate hinged near its proximal end, with spikes on the extended distal end of the wing plate, and a multi-spiked end cap plate that is located at the distal end of the device and is retained and tightened in place with a locking hex nut. Compression between the spiked wing plate and the spiked end cap plate serves to fix the spinous processes in place and to facilitate fusion, together with bone graft fusion material placed within the device. The threaded external body has been designed to provide ease of distraction and insertion via a minimally invasive surgical procedure.", 
            InterventionArmGroupLabelList = list(InterventionArmGroupLabel = "Minuteman Fusion Implant"), 
            InterventionOtherNameList = list(InterventionOtherName = "The Minuteman\231 interspinous interlaminar fusion device")), 
            list(InterventionType = "Procedure", InterventionName = "surgical decompression", 
                InterventionDescription = "Surgical decompression refers to the following operations Laminectomy, Foraminotomy, Discectomy or any other surgical procedure that the clinician feels is relevant for the decompression of lumbar spinal stenosis", 
                InterventionArmGroupLabelList = list(InterventionArmGroupLabel = "Surgical decompression"))))), 
    OutcomesModule = list(PrimaryOutcomeList = list(PrimaryOutcome = list(
        list(PrimaryOutcomeMeasure = "Change from baseline of clinical efficacy up to 60 months post procedure", 
            PrimaryOutcomeDescription = "These include:\n\nVisual Analogue Scale (VAS) pain scores Leg Pain\nVisual Analogue Scale (VAS) pain scores Back Pain\nOswestry Disability Index (ODI)\nZurich Claudication Questionnaire (ZCQ)\nAssessment of Physical Function via distance walked in 5 minutes and number of repetitions of sitting to standing in 1 minute.\n\nThe main outcome will be a comparison between treatment groups based on the change from baseline at each follow-up visit for each of the measures listed above.", 
            PrimaryOutcomeTimeFrame = "8 weeks and up to 60 months post procedure."))), 
        SecondaryOutcomeList = list(SecondaryOutcome = list(list(
            SecondaryOutcomeMeasure = "measures of quality of life", 
            SecondaryOutcomeDescription = "These include:\n\nChange in functional status questionnaire from baseline\nParticipants global impression of change from baseline (PGIC)\nClinician's global Impression of change from baseline (CGIC)\nEmployment status", 
            SecondaryOutcomeTimeFrame = "8 weeks and up to 60 months post procedure."), 
            list(SecondaryOutcomeMeasure = "Adverse events related to device and procedure", 
                SecondaryOutcomeTimeFrame = "safety to be assessed at 8 weeks and up to 60 months post procedure.")))), 
    EligibilityModule = list(EligibilityCriteria = "Inclusion Criteria:\n\nIs male or a non pregnant female aged 18years or older\nBMI = 35kg/m2\nHas chronic leg pain with or without back pain of greater than 6 months duration,which is partially or completely relieved by either sitting or adopting a flexed posture and who are suitable in the clinicians opinion for posterior lumbar surgery\nPre-operative ODI score = 20%\nPre-operative ZCQ Physical Function Domain =2\nPre-operative VAS Leg pain score = 4\nHas completed at least 6 months of conservative treatment without obtaining adequate symptomatic relief or has worsening neurological symptoms.\nHas degenerative changes at 1 or 2 levels confirmed by MRI or CT Myelogram within the last 12 months) with one or more of the following:\nLumbar spinal stenosis with intermittent neurogenic claudication\nDegeneration of the disc (as evidenced by imaging on MRI)\nAnnular thickening\nDegenerative Spondylolisthesis = Meyerding Grade 1\nThickening of ligamentum flavum\n\nExclusion Criteria:\n\nFixed motor deficit\nHas undergone previous lumbar spinal surgery\nIs unwilling or unable to give consent or adhere to the follow up schedule\nHas active infection or metastatic disease\nHas spondylolisthesis > grade 1\nHas neurogenic bladder or bowel disease\nHas a history of Osteopenia and or Osteoporosis. Evaluation of possible Osteopenia and or Osteoporosis will be conducted via a bone density scan prior to randomisation if ANY of the Bone Mass Evaluation criteria is met\nPatients who are not deemed fit for anaesthesia/major surgery due to underlying medical condition", 
        HealthyVolunteers = "No", Gender = "All", MinimumAge = "18 Years", 
        StdAgeList = list(StdAge = c("Adult", "Older Adult"))), 
    ContactsLocationsModule = list(OverallOfficialList = list(
        OverallOfficial = list(list(OverallOfficialName = "Ganesan Baranidharan, Dr", 
            OverallOfficialAffiliation = "Leeds Teaching Hospitals NHS Trust", 
            OverallOfficialRole = "Principal Investigator"))), 
        LocationList = list(Location = list(list(LocationFacility = "Taunton & Somerset NHS Foundation Trust of Musgrove Park Hospital", 
            LocationCity = "Taunton", LocationState = "Somerset", 
            LocationZip = "TA1 5DA", LocationCountry = "United Kingdom"), 
            list(LocationFacility = "The Ipswich Hospital NHS Trust", 
                LocationCity = "Ipswich", LocationState = "Suffolk", 
                LocationZip = "IP4 5PD", LocationCountry = "United Kingdom"), 
            list(LocationFacility = "Pain and Interventional Neuromodulation Research Group, Pain Management Dept, Seacroft Hospital, Leeds Teaching Hospitals NHS Trust", 
                LocationCity = "Leeds", LocationState = "West Yorkshire", 
                LocationZip = "LS14 6UH", LocationCountry = "United Kingdom"), 
            list(LocationFacility = "The Dudley Group NHS Foundation Trust, Russell Hall Hospital", 
                LocationCity = "Birmingham", LocationZip = "DY1 2HQ", 
                LocationCountry = "United Kingdom"))))), DerivedSection = list(
    MiscInfoModule = list(VersionHolder = "February 26, 2021"), 
    ConditionBrowseModule = list(ConditionMeshList = list(ConditionMesh = list(
        list(ConditionMeshId = "D000013130", ConditionMeshTerm = "Spinal Stenosis"), 
        list(ConditionMeshId = "D000055959", ConditionMeshTerm = "Intervertebral Disc Degeneration"), 
        list(ConditionMeshId = "D000013168", ConditionMeshTerm = "Spondylolisthesis"), 
        list(ConditionMeshId = "D000003251", ConditionMeshTerm = "Constriction, Pathologic"))), 
        ConditionAncestorList = list(ConditionAncestor = list(
            list(ConditionAncestorId = "D000020763", ConditionAncestorTerm = "Pathological Conditions, Anatomical"), 
            list(ConditionAncestorId = "D000013122", ConditionAncestorTerm = "Spinal Diseases"), 
            list(ConditionAncestorId = "D000001847", ConditionAncestorTerm = "Bone Diseases"), 
            list(ConditionAncestorId = "D000009140", ConditionAncestorTerm = "Musculoskeletal Diseases"), 
            list(ConditionAncestorId = "D000013169", ConditionAncestorTerm = "Spondylolysis"), 
            list(ConditionAncestorId = "D000055009", ConditionAncestorTerm = "Spondylosis"))), 
        ConditionBrowseLeafList = list(ConditionBrowseLeaf = list(
            list(ConditionBrowseLeafId = "M26992", ConditionBrowseLeafName = "Intervertebral Disc Degeneration", 
                ConditionBrowseLeafAsFound = "Degenerative Disc Disease", 
                ConditionBrowseLeafRelevance = "high"), list(
                ConditionBrowseLeafId = "M14546", ConditionBrowseLeafName = "Spondylolisthesis", 
                ConditionBrowseLeafAsFound = "Spondylolisthesis", 
                ConditionBrowseLeafRelevance = "high"), list(
                ConditionBrowseLeafId = "M14510", ConditionBrowseLeafName = "Spinal Stenosis", 
                ConditionBrowseLeafAsFound = "Spinal Stenosis", 
                ConditionBrowseLeafRelevance = "high"), list(
                ConditionBrowseLeafId = "M5058", ConditionBrowseLeafName = "Constriction, Pathologic", 
                ConditionBrowseLeafAsFound = "Stenosis", ConditionBrowseLeafRelevance = "high"), 
            list(ConditionBrowseLeafId = "M21103", ConditionBrowseLeafName = "Pathological Conditions, Anatomical", 
                ConditionBrowseLeafRelevance = "low"), list(ConditionBrowseLeafId = "M14502", 
                ConditionBrowseLeafName = "Spinal Diseases", 
                ConditionBrowseLeafRelevance = "low"), list(ConditionBrowseLeafId = "M3708", 
                ConditionBrowseLeafName = "Bone Diseases", ConditionBrowseLeafRelevance = "low"), 
            list(ConditionBrowseLeafId = "M10680", ConditionBrowseLeafName = "Musculoskeletal Diseases", 
                ConditionBrowseLeafRelevance = "low"), list(ConditionBrowseLeafId = "M14547", 
                ConditionBrowseLeafName = "Spondylolysis", ConditionBrowseLeafRelevance = "low"), 
            list(ConditionBrowseLeafId = "M26580", ConditionBrowseLeafName = "Spondylosis", 
                ConditionBrowseLeafRelevance = "low"), list(ConditionBrowseLeafId = "T6038", 
                ConditionBrowseLeafName = "Quality of Life", 
                ConditionBrowseLeafRelevance = "low"))), ConditionBrowseBranchList = list(
            ConditionBrowseBranch = list(list(ConditionBrowseBranchAbbrev = "BC05", 
                ConditionBrowseBranchName = "Muscle, Bone, and Cartilage Diseases"), 
                list(ConditionBrowseBranchAbbrev = "All", ConditionBrowseBranchName = "All Conditions"), 
                list(ConditionBrowseBranchAbbrev = "BC23", ConditionBrowseBranchName = "Symptoms and General Pathology"), 
                list(ConditionBrowseBranchAbbrev = "BXM", ConditionBrowseBranchName = "Behaviors and Mental Disorders"))))))))

What I have achieved so far

I can easily read a batch of JSON files into a list as described here (paths$path is a vector with the paths to the files):

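library(parallel)
library(jsonlite)

# start a cluster with all but one core
cl <- makeCluster(detectCores() - 1)
# parse every file in paths$path; one list element per study
json_list <- parLapply(cl, paths$path, function(x) jsonlite::fromJSON(x))
stopCluster(cl)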
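After each file has been parsed into a list, you can build one row per study by extracting selected fields from every parsed file and binding the rows together. The sketch below assumes this approach. get_path() is a hypothetical helper. The two toy files are written to a temporary directory so the example runs on its own, and NCT00000000 is a made-up placeholder ID. It uses lapply(), but the parLapply() call above can be substituted for it.

```r
library(jsonlite)

# Hypothetical helper: follow a sequence of names into a nested list;
# returns NA if any step is missing, otherwise the leaf as one string
get_path <- function(x, path) {
  for (p in path) {
    if (!is.list(x) || is.null(x[[p]])) return(NA_character_)
    x <- x[[p]]
  }
  paste(unlist(x), collapse = "; ")
}

# Toy input: two minimal study files (NCT00000000 is a made-up placeholder)
study_dir <- tempfile("studies")
dir.create(study_dir)
writeLines('{"FullStudy":{"Study":{"ProtocolSection":{"IdentificationModule":{"NCTId":"NCT01455805"},"StatusModule":{"OverallStatus":"Active, not recruiting"}}}}}',
           file.path(study_dir, "a.json"))
writeLines('{"FullStudy":{"Study":{"ProtocolSection":{"IdentificationModule":{"NCTId":"NCT00000000"}}}}}',
           file.path(study_dir, "b.json"))

paths <- list.files(study_dir, pattern = "\\.json$", full.names = TRUE)
json_list <- lapply(paths, jsonlite::fromJSON)  # or parLapply(cl, paths, jsonlite::fromJSON)

# Columns to extract, each given as a path of names from the top level
fields <- list(
  NCTId         = c("FullStudy", "Study", "ProtocolSection", "IdentificationModule", "NCTId"),
  OverallStatus = c("FullStudy", "Study", "ProtocolSection", "StatusModule", "OverallStatus")
)

# One one-row data.frame per study, stacked with rbind
studies <- do.call(rbind, lapply(json_list, function(js) {
  as.data.frame(lapply(fields, function(p) get_path(js, p)))
}))
studies
```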

What I have tried

I tried the option simplifyDataFrame = TRUE in jsonlite::fromJSON, but I get the following warning messages:

1: In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  row names were found from a short variable and have been discarded
2: In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  row names were found from a short variable and have been discarded
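By default, jsonlite::fromJSON() turns a JSON array of records into a data.frame and fills missing fields with NA (simplifyDataFrame defaults to the value of simplifyVector, which is TRUE). That data.frame still sits inside the surrounding nested list, though, so the result is not one flat table. The small hypothetical excerpt below compares the default behaviour with simplifyVector = FALSE, which leaves the array as a list of named lists.

```r
library(jsonlite)

# Hypothetical excerpt; the second record lacks LocationState
json <- '{"LocationList": {"Location": [
  {"LocationCity": "Taunton", "LocationState": "Somerset"},
  {"LocationCity": "Birmingham"}
]}}'

# Default: the array of objects becomes a data.frame nested inside the list
simple <- fromJSON(json)
str(simple)

# No simplification: the array stays a list of named lists
raw <- fromJSON(json, simplifyVector = FALSE)
str(raw)
```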

I tried a solution suggested in how-to-get-data-out-of-nested-xml-structure, which is meant for the nested list produced directly by the clinicaltrials.gov web API:

as_tibble(test$FullStudy$Study)
Error: Tibble columns must have compatible sizes.
* Size 2: Column `DerivedSection`.
* Size 11: Column `ProtocolSection`.
i Only values of size one are recycled.
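as_tibble() fails because it treats each top-level element of the list as a column. ProtocolSection has 11 elements and DerivedSection has 2, and neither can be recycled to the other's length. One workaround, similar in spirit to tidyjson::spread_all(), is to walk the nested objects and keep only single-value leaves as one wide row, handling arrays separately. flatten_objects() below is a hypothetical base-R helper, and the input is a trimmed excerpt of test. There is one difference from spread_all(): this helper also keeps length-1 vectors, because after jsonlite's simplification a one-element JSON array cannot be told apart from a scalar.

```r
# Hypothetical helper: flatten nested named lists (JSON objects) into one row.
# Single-value leaves become columns named by their path; unnamed lists and
# multi-value vectors (JSON arrays) are skipped.
flatten_objects <- function(x, prefix = NULL) {
  out <- list()
  for (nm in names(x)) {
    key <- paste(c(prefix, nm), collapse = ".")
    v <- x[[nm]]
    if (is.list(v) && !is.null(names(v))) {
      out <- c(out, flatten_objects(v, key))  # object: recurse
    } else if (is.atomic(v) && length(v) == 1) {
      out[[key]] <- v                         # scalar: keep as a column
    }
  }
  out
}

# Trimmed excerpt of the test data
excerpt <- list(FullStudy = list(Rank = 254369L, Study = list(ProtocolSection = list(
  IdentificationModule = list(NCTId = "NCT01455805"),
  ConditionsModule = list(ConditionList = list(Condition = c(
    "Lumbar Spinal Stenosis", "Spondylolisthesis", "Degenerative Disc Disease")))),
  DerivedSection = list(MiscInfoModule = list(VersionHolder = "February 26, 2021")))))

wide <- as.data.frame(flatten_objects(excerpt))
names(wide)
```

The three-element Condition vector is skipped, so it would be extracted separately and joined back on NCTId.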

I tried tidyjson, but I could not get a tidy data.frame out of the nested list.

Update: the tidyjson package works perfectly:

It is important to read the JSON file directly with tidyjson::read_json, so that the data has the right format for further processing (tbl_json, S3: tbl_json/tbl_df/tbl/data.frame).

# load the library
library(tidyjson)

# load the JSON file
test <- tidyjson::read_json("NCT0455805.json")

# check the data structure
str(test)
#> tbl_json [1 x 2] (S3: tbl_json/tbl_df/tbl/data.frame)

# make a tibble
test %>% tidyjson::spread_all()

#> # A tibble: 1 x 42
#>   ..JSON  document.id FullStudy.Rank FullStudy.Study~ FullStudy.Study~ FullStudy.Study~ FullStudy.Study~ FullStudy.Study~ FullStudy.Study~ FullStudy.Study~
#>   <chr>         <int>          <dbl> <chr>            <chr>            <chr>            <chr>            <chr>            <chr>            <chr>
#> 1 "{\"F~            1         254369 NCT01455805      Minuteman Spina~ Efficacy and Qu~ October 2020     Active, not rec~ October 13, 2011 October 18, 2011
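spread_all() spreads only object fields, so the 42 columns contain single-value fields. Arrays such as Location have to be entered and gathered separately. The sketch below does this with tidyjson's enter_object(), gather_array() and spread_all(). It assumes tidyjson is installed and R >= 4.1 (for the native pipe), and the JSON string is a trimmed hypothetical excerpt. The result has one row per location, and document.id links each row back to its study, so the table can be joined to the wide table above.

```r
library(tidyjson)

# Trimmed excerpt: only the path down to the Location array is kept
json <- '{"FullStudy": {"Study": {"ProtocolSection": {"ContactsLocationsModule": {
  "LocationList": {"Location": [
    {"LocationCity": "Taunton", "LocationState": "Somerset"},
    {"LocationCity": "Birmingham"}
  ]}}}}}}'

locations <- json |>
  as.tbl_json() |>
  enter_object("FullStudy", "Study", "ProtocolSection",
               "ContactsLocationsModule", "LocationList", "Location") |>
  gather_array() |>   # one row per array element, position in array.index
  spread_all()        # object fields -> columns, missing fields -> NA
locations
```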

