Flatten complex nested JSON structure in Pyspark

In Databricks, using PySpark, I get a JSON response from a request with the following structure. This is a trimmed example, since the full response is quite large.

For reference, this is the request that returns the whole response:

import requests

data = requests.get("https://prod-noblehire-api-000001.appspot.com/job?").json()

{'elements': [{'id': 1615,
   'slug': 'engineering-manager-ios-blockchain',
   'title': 'Engineering Manager, iOS - Blockchain',
   'seniority': 'expert',
   'role': 'engineeringOthers',
   'primaryLanguage': 'java',
   'primaryPlatform': None,
   'secondaryLanguage': None,
   'secondaryPlatform': None,
   'mainDatabase': None,
   'description': '<p>Stand out and lead the way. \u200b\u200bDefine an industry. Forge the path to truly blockchain-based, deflationary, and limitless finance.</p>\n<p><br></p>\n<p><a href="https://nexo.io/" rel="nofollow">Nexo</a>&nbsp;is the world’s leading regulated digital assets institution. Our mission is to maximize the value and utility of cryptocurrencies by signature products created in Bulgaria. For four years now, we have processed $80+ billion for over 5,000,000+ users around the globe. And this is only the beginning.</p>\n<p><br></p>\n<p>We are lean, and our approach is a dream come true — no legacy, no monolithic structures. This is how we unlock a real creative powerhouse in development.</p>',
   'productDescription': '',
   'postedAt': 1662976098357,
   'companyId': 1,
   'salaryCurrency': 'BGN',
   'salaryMin': 11500,
   'salaryMax': 15000,
   'salaryPeriod': 'MONTH',
   'jobType': 'FULL_TIME',
   'jobTypeComment': None,
   'businessTravelComment': None,
   'homeOfficeDays': 0,
   'homeOfficePer': 'WEEK',
   'teamSizeMin': 500,
   'teamSizeMax': 1000,
   'teamLead': '',
   'teamLeadName': '',
   'teamLeadRole': '',
   'teamLeadImage': None,
   'locations': [{'id': 1,
     'address': '{"address_components":[{"long_name":"Sofia","short_name":"Sofia","types":["locality","political"]},{"long_name":"Sofia City Province","short_name":"Sofia City Province","types":["administrative_area_level_1","political"]},{"long_name":"Bulgaria","short_name":"BG","types":["country","political"]}],"formatted_address":"Sofia, Bulgaria","geometry":{"bounds":{"northeast":{"lat":42.7877752,"lng":23.4569049},"southwest":{"lat":42.6030891,"lng":23.1909885}},"location":{"lat":42.6977082,"lng":23.3218675},"location_type":"APPROXIMATE","viewport":{"northeast":{"lat":42.7877752,"lng":23.4569049},"southwest":{"lat":42.6030891,"lng":23.1909885}}},"place_id":"ChIJ9Xsxy4KGqkARYF6_aRKgAAQ","types":["locality","political"]}',
     'teamSize': None,
     'founded': 2018,
     'comment': None}],
   'productImages': [{'id': 5078,
     'collection': 'product',
     'name': 'prod/1/product/zm49c..'},
    {'id': 5079, 'collection': 'product', 'name': 'prod/1/product/mhqkiw.'},
    {'id': 5081, 'collection': 'product', 'name': 'prod/1/product/c.jtpo.'},
    {'id': 5080, 'collection': 'product', 'name': 'prod/1/product/u0847l.'},
    {'id': 5082, 'collection': 'product', 'name': 'prod/1/product/yrpy99.'}],
   'requirements': [{'icon': 'fas fa-check-square',
     'title': '1+ years of experience leading a team of 4+ people'},
    {'icon': 'fas fa-check-square',
     'title': '3+ years of experience with Swift'},
    {'icon': 'fas fa-check-square',
     'title': 'Experience with multithreaded apps, offline storage and performance optimizations'},
    {'icon': 'fas fa-check-square', 'title': 'Experience using RESTful APIs'},
    {'icon': 'fas fa-check-square',
     'title': "Familiar with Apple's Human Interface Guidelines and concepts for good UX"},
    {'icon': 'fas fa-check-square',
     'title': 'Experience setting up CI/CD processes'},
    {'icon': 'fas fa-check-square',
     'title': 'Good understanding of design patterns and software architecture principles'},
    {'icon': 'fas fa-check-square',
     'title': 'Understanding of software development lifecycle (SDLC)'},
    {'icon': 'fas fa-check-square',
     'title': 'Superior interpersonal communication skills'},
    {'icon': 'fas fa-check-square',
     'title': 'A technical mindset with great attention to detail'},
    {'icon': 'fas fa-check-square',
     'title': 'Self-starter, comfortable in a fast-paced startup environment'},
    {'icon': 'fas fa-check-square',
     'title': 'Familiar with crypto-economic protocol design including governance and incentive structures'},
    {'icon': 'fas fa-check-square',
     'title': 'Familiar with Web3 Stack: Web3, Ethers, Hardhat'}],
   'responsibilities': [{'icon': 'fas fa-hand-point-right',
     'title': 'Take part in designing, scoping, developing, and maintaining disruptive Web3 applications'},
    {'icon': 'fas fa-hand-point-right',
     'title': 'You Write It – You Own It: Develop, test, release, maintain, and improve – own the full life-cycle of your code'},
    {'icon': 'fas fa-hand-point-right',
     'title': 'Provide technical guidance and coaching'},
    {'icon': 'fas fa-hand-point-right',
     'title': 'Possess a strong sense of ownership and accountability with a commitment to deliver quality timely outcomes\n'},
    {'icon': 'fas fa-hand-point-right',
     'title': 'Assist in management, including hiring, onboarding, training, and keeping management updated on team performance'},
    {'icon': 'fas fa-hand-point-right',
     'title': 'Proactively propose improvement plans and best practices'},
    {'icon': 'fas fa-hand-point-right',
     'title': 'Contribute and maintain the thorough documentation'}],
   'benefits': [{'icon': 'fas fa-star',
     'title': 'Attractive remuneration package'},
    {'icon': 'fas fa-star', 'title': 'Annual bonuses '},
    {'icon': 'fas fa-star', 'title': 'Learning Hub'},
    {'icon': 'fas fa-star', 'title': 'Hybrid way of work and work from home '},
    {'icon': 'fas fa-star',
     'title': 'Inspiring atmosphere and innovative projects'},
    {'icon': 'fas fa-star',
     'title': 'Future career development in a global company leading the innovative blockchain space'},
    {'icon': 'fas fa-star',
     'title': 'Tailor-made personal benefits package — you want it, you get it'},
    {'icon': 'fas fa-star',
     'title': 'Wellness program including additional health insurance, Multisport card, sports activities, standing desks & protective glasses and many more'},
    {'icon': 'fas fa-star',
     'title': 'Free assorted healthy snacks and fresh fruits in the office'},
    {'icon': 'fas fa-star', 'title': 'Weekly gourmet breakfast meet-ups'},
    {'icon': 'fas fa-star',
     'title': 'Free parking with a designated space, free electric bikes & public transport'},
    {'icon': 'fas fa-star',
     'title': 'Epic regular team buildings and parties'}],
   'activities': [],
   'hiringProcessSteps': [],
   'tools': [],
   'company': {'id': 1,
    'slug': 'nexo',
    'brand': 'Nexo',
    'overview': '<p><a href="https://nexo.io/" rel="nofollow">Nexo</a>&nbsp;is the world’s leading regulated digital assets institution. Our mission is to maximize the value and utility of cryptocurrencies by signature products created in Bulgaria. For three years now, we have processed $80+ billion for over 5,000,000+ users around the globe. And this is only the beginning.</p>',
    'product': '',
    'images': [{'id': 2,
      'collection': 'main',
      'name': 'prod/1/main/yzxiiu.jpg'},
     {'id': 5071, 'collection': 'main', 'name': 'prod/1/main/c3smv4.'},
     {'id': 5072, 'collection': 'main', 'name': 'prod/1/main/baofyo.'},
     {'id': 1, 'collection': 'logo', 'name': 'prod/1/logo/b25jcy.png'},
     {'id': 5078, 'collection': 'product', 'name': 'prod/1/product/zm49c..'},
     {'id': 5079, 'collection': 'product', 'name': 'prod/1/product/mhqkiw.'},
     {'id': 5081, 'collection': 'product', 'name': 'prod/1/product/c.jtpo.'},
     {'id': 5080, 'collection': 'product', 'name': 'prod/1/product/u0847l.'},
     {'id': 5082, 'collection': 'product', 'name': 'prod/1/product/yrpy99.'},
     {'id': 21, 'collection': 'clients', 'name': 'prod/1/clients/cw-ck4.jpg'},
     {'id': 22, 'collection': 'clients', 'name': 'prod/1/clients/wpezkk.jpg'},
     {'id': 23, 'collection': 'clients', 'name': 'prod/1/clients/01ngdh.jpg'},
     {'id': 24, 'collection': 'clients', 'name': 'prod/1/clients/yh1pgp.jpg'},
     {'id': 25, 'collection': 'clients', 'name': 'prod/1/clients/gli-f8.jpg'},
     {'id': 5073, 'collection': 'photos', 'name': 'prod/1/photos/zunc4z.'},
     {'id': 5074, 'collection': 'photos', 'name': 'prod/1/photos/q-bklt.'},
     {'id': 5075, 'collection': 'photos', 'name': 'prod/1/photos/tgiq9i.'},
     {'id': 5077, 'collection': 'photos', 'name': 'prod/1/photos/dguwy5.'},
     {'id': 5076, 'collection': 'photos', 'name': 'prod/1/photos/tx0fdl.'},
     {'id': 3871, 'collection': 'photos', 'name': 'prod/1/photos/c0nnnz.'},
     {'id': 3869, 'collection': 'photos', 'name': 'prod/1/photos/6x4jxe.'},
     {'id': 3874, 'collection': 'photos', 'name': 'prod/1/photos/e0bfnd.'},
     {'id': 3872, 'collection': 'photos', 'name': 'prod/1/photos/k5q7hf.'}],
    'locations': [{'id': 1,
      'address': '{"address_components":[{"long_name":"Sofia","short_name":"Sofia","types":["locality","political"]},{"long_name":"Sofia City Province","short_name":"Sofia City Province","types":["administrative_area_level_1","political"]},{"long_name":"Bulgaria","short_name":"BG","types":["country","political"]}],"formatted_address":"Sofia, Bulgaria","geometry":{"bounds":{"northeast":{"lat":42.7877752,"lng":23.4569049},"southwest":{"lat":42.6030891,"lng":23.1909885}},"location":{"lat":42.6977082,"lng":23.3218675},"location_type":"APPROXIMATE","viewport":{"northeast":{"lat":42.7877752,"lng":23.4569049},"southwest":{"lat":42.6030891,"lng":23.1909885}}},"place_id":"ChIJ9Xsxy4KGqkARYF6_aRKgAAQ","types":["locality","political"]}',
      'teamSize': None,
      'founded': 2018,
      'comment': None}],
    'awards': [{'icon': 'heart',
      'title': 'Finance Company of The Year by Forbes Business Awards 2021'},
     {'icon': 'heart',
      'title': 'Bronze Stevie® Award In 2021 International Business Awards®'},
     {'icon': 'fas fa-trophy',
      'title': 'Top 50 List of Crypto Valley Companies'}],
    'perks': [{'icon': 'fas fa-gem',
      'title': 'Monthly evening events',
      'text': 'At Nexo, we do monthly get-togethers after work to have a drink, enjoy some company and relax. No rules, no expectations.'},
     {'icon': 'fas fa-gem',
      'title': 'Teambuildings',
      'text': 'Twice a year our team travels out of the big city to cozy locations where we can relax, have fun, sing karaoke, and do sports.'},
     {'icon': 'fas fa-gem',
      'title': 'Lunchtime',
      'text': 'This is the easiest opportunity to get connected with colleagues outside of your immediate team. Learn about them personally and professionally.'},
     {'icon': 'fas fa-gem',
      'title': 'Ad hoc',
      'text': 'Going in the park, the gym, an event, or elsewhere is always welcome. Share with your colleagues and you will likely find enthusiasts.'}],
    'values': [{'icon': 'fas fa-gem',
      'title': 'Give Your Best',
      'text': 'Always. When every team member gives his/her best in their domain, we as a team move mountains.'},
     {'icon': 'fas fa-gem',
      'title': 'Results Matter the Most',
      'text': 'We analyze what is best for the company and how can we invest our time in the most productive way to generate the most impactful results.'},
     {'icon': 'fas fa-gem',
      'title': 'Bring Positivity',
      'text': 'We smile, have fun, and enjoy ourselves.'},
     {'icon': 'fas fa-gem',
      'title': 'Be Proactive',
      'text': 'We are independent, forward-thinking and we act.'},
     {'icon': 'fas fa-gem',
      'title': 'Always Learn. Always Question.',
      'text': 'Our industry changes 10X faster than most other industries. Hence, we are always learning and looking for ways to stay ahead of the competition.'}],
    'public': True},
   'public': True,
   'customerFacing': False,
   'businessTraveling': False,
   'offeringStock': False,
   'fullyRemote': False},
  {'id': 1606,
   'slug': 'devops-engineer',
   'title': 'DevOps Engineer',
   'seniority': 'senior',
   'role': 'devops',
   'primaryLanguage': 'bashShellPowershell',
   'primaryPlatform': 'linux',
   'secondaryLanguage': 'python',
   'secondaryPlatform': None,
   'mainDatabase': 'postgresql',
   'description': "<p>Are you passionate about making things happen?</p><p><br></p><p>Do you want to help great code to run with the level of perfection clients need for software which decides on millions and billions of dollars?</p><p><br></p><p>About you and your day to day responsibilities:</p><ul><li>You have 2+ years of experience in a systems engineering/DevOps role</li><li>You have industry experience with Azure Cloud, Azure DevOps, Git, Infrastructure as Code</li><li>You have experience with Docker, Kubernetes and Helm in a production environment</li><li>Experience designing and developing CI/CD pipelines (Jenkins/AzureDevOps)</li><li>Linux at the admin level</li><li>Databases at admin level (PostgreSQL)</li><li>Familiarity with Scripting Languages (Bash/Python/Go/Groovy)</li><li>Monitoring (Dynatrace/Azure AppInsights)</li><li>Logging (ELK / Graylog / Azure Log Analytics)</li><li>Networking (TCP / IP, Firewall)</li></ul><p><br></p><ul><li>You will actively interface with software developers, security operations engineers, product managers, operations managers on projects</li><li>You will perform capacity planning, automation, testing, performance tuning, and tools development</li><li>You will develop and maintain the continuous integration and continuous delivery pipeline</li><li>You will develop and deploy a control plane for all platform services to guarantee observability, monitoring, analytics, and alerting</li><li>You will provide on-call support for the platform.</li><li>You will collaborate with the cyber security team to integrate security measures into all aspects of the platform</li><li>You will work with technical project managers, product managers, and operations managers to set priorities and track operational metrics</li><li>You will participate in planning, system demos, and inspect and adapt events</li><li>You will drive and coordinate platform adoption, actively engaging product development, quality, regulatory, and customer success teams</li></ul><p><br></p><p>Why we are great:</p><p><br></p><ul><li>We have a friendly team, which is nice to work in - people have always been our strong point. Solveva has a flat structure, little bureaucracy, no managers (any scrum-master, first of all, the developer)</li><li>We provide you with a choice of hardware: Windows (Dell, Lenovo) or Mac (Pro, Air)</li><li>Remote, office or mixed work. If you do not feel comfortable working from home, we'll take for you a coworking space</li><li>You are given an individual budget for the development of knowledge (books, conferences, courses), additional equipment and putting the body in order after hard work</li><li>Additional benefits and incentives</li></ul><p><br></p><p>About Solveva:</p><p><br></p><p>We provide expertise in insurance and software engineering for the best of our clients.</p><p>Solveva is an owner-operated company headquartered in Zurich, Switzerland, with offices in Russia and Bulgaria. We are an agile software engineering firm for core insurance processes like pricing, underwriting, and risk management. Clients can either develop tailor-made software together with us or can subscribe to our self-developed solutions which we sell under our Actus brand.</p><p>We offer a high level of personal responsibility with room for individual development and personal growth in an open and trustful corporate culture. We value passion for development and strive to create an atmosphere of trust and freedom of action in the team.</p>",
   'productDescription': '<p><a href="https://solveva.com/#projects" rel="nofollow"><em>Actus</em></a></p>',
   'postedAt': 1662111811347,
   'companyId': 198,
   'salaryCurrency': 'BGN',
   'salaryMin': 4000,
   'salaryMax': 0,
   'salaryPeriod': 'MONTH',
   'jobType': 'FULL_TIME',
   'jobTypeComment': None,
   'businessTravelComment': None,
   'homeOfficeDays': 0,
   'homeOfficePer': 'WEEK',
   'teamSizeMin': 1,
   'teamSizeMax': 5,
   'teamLead': '',
   'teamLeadName': '',
   'teamLeadRole': '',
   'teamLeadImage': None,
   'locations': [{'id': 1425,
     'address': '{"address_components":[{"long_name":"Sofia","short_name":"Sofia","types":["locality","political"]},{"long_name":"Sofia City Province","short_name":"Sofia City Province","types":["administrative_area_level_1","political"]},{"long_name":"Bulgaria","short_name":"BG","types":["country","political"]}],"formatted_address":"Sofia, Bulgaria","geometry":{"bounds":{"south":42.6030891,"west":23.1909885,"north":42.7877752,"east":23.4569049},"location":{"lat":42.6977082,"lng":23.3218675},"location_type":"APPROXIMATE","viewport":{"south":42.6030891,"west":23.1909885,"north":42.7877752,"east":23.4569049}},"place_id":"ChIJ9Xsxy4KGqkARYF6_aRKgAAQ","types":["locality","political"]}',
     'teamSize': None,
     'founded': None,
     'comment': None}],
   'productImages': [],
   'requirements': [{'icon': 'fas fa-check-square',
     'title': 'You have 2+ years of experience in a systems engineering/DevOps role\n\n\n'}],
   'responsibilities': [{'icon': 'fas fa-hand-point-right',
     'title': 'You will actively interface with software developers, security operations engineers, product managers, operations managers on projects\n'},
    {'icon': 'fas fa-hand-point-right',
     'title': 'You will provide on-call support for the platform.\nYou will collaborate with the cyber security team to integrate security measures into all aspects of the platform'},
    {'icon': 'fas fa-hand-point-right',
     'title': 'You will develop and deploy a control plane for all platform services to guarantee observability, monitoring, analytics, and alerting'},
    {'icon': 'fas fa-hand-point-right',
     'title': 'You will develop and maintain the continuous integration and continuous delivery pipeline'},
    {'icon': 'fas fa-hand-point-right',
     'title': ' You will perform capacity planning, automation, testing, performance tuning, and tools development'}],
   'benefits': [{'icon': 'fas fa-star',
     'title': 'We have a friendly team, which is nice to work in - people have always been our strong point. Solveva has a flat structure, little bureaucracy, no managers (any scrum-master, first of all, the developer);\n'},
    {'icon': 'fas fa-star',
     'title': 'We provide you with a choice of hardware: Windows (Dell, Lenovo) or Mac (Pro, Air);'},
    {'icon': 'fas fa-star',
     'title': "Remote, office or mixed work. If you do not feel comfortable working from home, we'll take for you a coworking space;"},
    {'icon': 'fas fa-star',
     'title': 'We provide you with a choice of hardware: Windows (Dell, Lenovo) or Mac (Pro, Air);'},
    {'icon': 'fas fa-star',
     'title': 'An individual budget for the development of knowledge (books, conferences, courses), additional equipment and putting the body in order after hard work;'},
    {'icon': 'fas fa-star', 'title': 'Additional benefits and incentives;'}],
   'activities': [],
   'hiringProcessSteps': ['CV review\n',
    'HR Interview',
    'Technical Interview',
    'Task ( this is optional - we do it if is needed )',
    'Team Interview'],
   'tools': [],
   'company': {'id': 198,
    'slug': 'solveva',
    'brand': 'Solveva ',
    'overview': '<p>We provide expertise in insurance and software engineering for the best of our clients.&nbsp;</p>\n<p>Solveva is an owner-operated company headquartered in Zurich, Switzerland, with offices in Georgia and Bulgaria.</p>\n<p>We are an agile software engineering firm for core insurance processes like pricing, underwriting, and risk management. Clients can either develop tailor-made software together with us or can subscribe to our self-developed solutions which we sell under our Actus brand.</p>\n<p>We are committed to create the most value for our clients. This is only possible with highly motivated employees who enjoy working in our firm. Thus, we put our employees first.</p>\n<p>We offer a high level of personal responsibility with room for individual development and personal growth in an open and trustful corporate culture.</p>',
    'product': '<p><a href="https://solveva.com/#projects" rel="nofollow"><em>Actus</em></a></p>',
    'images': [{'id': 5709,
      'collection': 'main',
      'name': 'prod/198/main/plhttq.'},
     {'id': 5710, 'collection': 'logo', 'name': 'prod/198/logo/ywhgct.'},
     {'id': 5711, 'collection': 'logo', 'name': 'prod/198/logo/mhpi0w.'},
     {'id': 5718, 'collection': 'photos', 'name': 'prod/198/photos/h.mk-f.'},
     {'id': 5717, 'collection': 'photos', 'name': 'prod/198/photos/y4xgtc.'},
     {'id': 5715, 'collection': 'photos', 'name': 'prod/198/photos/zukwmr.'},
     {'id': 5719, 'collection': 'photos', 'name': 'prod/198/photos/tshc3z.'}],
    'locations': [{'id': 1425,
      'address': '{"address_components":[{"long_name":"Sofia","short_name":"Sofia","types":["locality","political"]},{"long_name":"Sofia City Province","short_name":"Sofia City Province","types":["administrative_area_level_1","political"]},{"long_name":"Bulgaria","short_name":"BG","types":["country","political"]}],"formatted_address":"Sofia, Bulgaria","geometry":{"bounds":{"south":42.6030891,"west":23.1909885,"north":42.7877752,"east":23.4569049},"location":{"lat":42.6977082,"lng":23.3218675},"location_type":"APPROXIMATE","viewport":{"south":42.6030891,"west":23.1909885,"north":42.7877752,"east":23.4569049}},"place_id":"ChIJ9Xsxy4KGqkARYF6_aRKgAAQ","types":["locality","political"]}',
      'teamSize': None,
      'founded': None,
      'comment': None},
     {'id': 1426,
      'address': '{"address_components":[{"long_name":"Tbilisi","short_name":"Tbilisi","types":["locality","political"]},{"long_name":"Didi digomi","short_name":"Didi digomi","types":["administrative_area_level_2","political"]},{"long_name":"Tbilisi","short_name":"Tbilisi","types":["administrative_area_level_1","political"]},{"long_name":"Georgia","short_name":"GE","types":["country","political"]}],"formatted_address":"Tbilisi, Georgia","geometry":{"bounds":{"south":41.6210248,"west":44.6600246,"north":41.8438937,"east":45.0176811},"location":{"lat":41.7151377,"lng":44.827096},"location_type":"APPROXIMATE","viewport":{"south":41.6210248,"west":44.6600246,"north":41.8438937,"east":45.0176811}},"place_id":"ChIJa2JP5tcMREARo25X4u2E0GE","types":["locality","political"]}',
      'teamSize': None,
      'founded': None,
      'comment': None},
     {'id': 1427,
      'address': '{"address_components":[{"long_name":"Zürich","short_name":"Zürich","types":["locality","political"]},{"long_name":"Zürich District","short_name":"Zürich District","types":["administrative_area_level_2","political"]},{"long_name":"Zurich","short_name":"ZH","types":["administrative_area_level_1","political"]},{"long_name":"Switzerland","short_name":"CH","types":["country","political"]}],"formatted_address":"Zürich, Switzerland","geometry":{"bounds":{"south":47.32023,"west":8.448059899999999,"north":47.43468,"east":8.6253701},"location":{"lat":47.3768866,"lng":8.541694},"location_type":"APPROXIMATE","viewport":{"south":47.32023,"west":8.448059899999999,"north":47.43468,"east":8.6253701}},"place_id":"ChIJGaK-SZcLkEcRA9wf5_GNbuY","types":["locality","political"]}',
      'teamSize': None,
      'founded': None,
      'comment': None},
     {'id': 1428,
      'address': '{"address_components":[{"long_name":"Lisbon","short_name":"Lisbon","types":["locality","political"]},{"long_name":"Lisbon","short_name":"Lisbon","types":["administrative_area_level_1","political"]},{"long_name":"Portugal","short_name":"PT","types":["country","political"]}],"formatted_address":"Lisbon, Portugal","geometry":{"bounds":{"south":38.6913994,"west":-9.2298356,"north":38.7958538,"east":-9.0905709},"location":{"lat":38.7222524,"lng":-9.1393366},"location_type":"APPROXIMATE","viewport":{"south":38.6913994,"west":-9.2298356,"north":38.7958538,"east":-9.0905709}},"place_id":"ChIJO_PkYRozGQ0R0DaQ5L3rAAQ","types":["locality","political"]}',
      'teamSize': None,
      'founded': None,
      'comment': None}],
    'awards': [],
    'perks': [{'icon': 'heart',
      'title': 'Internal trainings',
      'text': 'As our main value are the people, at Solveva we regularly organise different  trainings to help our colleagues achieve their career and personal goals.'},
     {'icon': 'heart',
      'title': 'Good social benefit system',
      'text': 'You are given an individual budget for the development of knowledge (books, conferences, courses), additional equipment and putting the body in order after hard work.'},
     {'icon': 'heart',
      'title': 'Regular Planing meetings and team buildings',
      'text': 'Every 3 months we organise a planing meetings and team building event and gather all our colleagues for a 5-day  where we can spend quality time together.'},
     {'icon': 'heart',
      'title': 'Flexible working time',
      'text': "No fixed schedules, ability for remote working days.\nRemote, office or mixed work. If you do not feel comfortable working from home, we'll take for you a coworking space."}],
    'public': True},
   'public': True,
   'customerFacing': False,
   'businessTraveling': False,
   'offeringStock': False,
   'fullyRemote': False}],
 'totalPages': 10,
 'page': 0,
 'totalElements': 481}

My goal is to read the response into a DataFrame and flatten all nested columns.

Currently, I am reading it into a DataFrame using the following code:

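import json

# Serialise the parsed response (a Python dict) back into one JSON string
jsonData = json.dumps(data)

# spark.read.json accepts an RDD of JSON strings, one document per element
jsonRDD = sc.parallelize([jsonData])
df = spark.read.json(jsonRDD)
display(df)  # Databricks notebook helper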

This works well for most of the columns and returns almost 700 columns. However, the "locations" columns are not fully unpacked.

There are several "locations" columns with the same complex nested structure. The problem is that the address inside them has the data type string, so I can't unpack it further. Maybe my approach above is wrong from the start. Is there a better way to deal with this problem?
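A hedged alternative, assuming `data` is the dict returned by `requests.get(...).json()`: because each `address` value in the response is itself a JSON document stored as a string, you can decode those strings in Python before calling `json.dumps`, so that `spark.read.json` infers nested structs instead of a plain string column. The helper `decode_nested_json` and the choice of keys to decode (`("address",)`) are assumptions for illustration, based on the sample above.

```python
import json
from typing import Any, Iterable


def decode_nested_json(obj: Any, keys: Iterable[str] = ("address",)) -> Any:
    """Return a copy of obj in which string values stored under `keys` are parsed as JSON.

    Hypothetical helper: dicts and lists are walked recursively; strings that are
    not valid JSON are kept as-is. The input object is not modified.
    """
    key_set = set(keys)
    if isinstance(obj, dict):
        result = {}
        for key, value in obj.items():
            if key in key_set and isinstance(value, str):
                try:
                    value = json.loads(value)
                except json.JSONDecodeError:
                    pass  # not JSON after all: keep the original string
            result[key] = decode_nested_json(value, key_set)
        return result
    if isinstance(obj, list):
        return [decode_nested_json(item, key_set) for item in obj]
    return obj


# Tiny stand-in for the response: one job whose location address is a JSON string
sample = {
    "elements": [
        {"id": 1615, "locations": [{"id": 1, "address": '{"formatted_address": "Sofia, Bulgaria", "place_id": "ChIJ9Xsxy4KGqkARYF6_aRKgAAQ"}'}]}
    ]
}
decoded = decode_nested_json(sample)
print(decoded["elements"][0]["locations"][0]["address"]["formatted_address"])  # Sofia, Bulgaria

# In the Databricks notebook the decoded dict would then be read as before:
# df = spark.read.json(sc.parallelize([json.dumps(decode_nested_json(data))]))
```

Note that the `bounds` and `viewport` objects in the sample come in two shapes (`northeast`/`southwest` in some records, `south`/`west`/`north`/`east` in others), so Spark's schema inference will likely merge them into one struct whose fields are null where a record lacks them.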

  • Because the address column is a string (a JSON object serialised as a string), Spark's schema inference does not unpack it the way it unpacks the rest of the response.

  • To unpack this column, you can use json_tuple() from pyspark.sql.functions, as in the following code.

  • I built a sample DataFrame containing only the address values. Its schema is:

root
 |-- id: long (nullable = true)
 |-- address: string (nullable = true)

# The id column is only for demonstration.

  • Now you can use json_tuple() to create a separate column for each key inside the address object.
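from pyspark.sql.functions import col, json_tuple

# json_tuple extracts top-level keys from a JSON string; nested objects and arrays stay strings
d2 = df.select(col("id"), json_tuple(col("address"), "address_components", "formatted_address", "geometry", "place_id", "types")) \
    .toDF("id", "address_components", "formatted_address", "geometry", "place_id", "types")
d2.printSchema()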


#schema is
root
 |-- id: long (nullable = true)
 |-- address_components: string (nullable = true)
 |-- formatted_address: string (nullable = true)
 |-- geometry: string (nullable = true)
 |-- place_id: string (nullable = true)
 |-- types: string (nullable = true)

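  • To parse these columns further, use other functions from pyspark.sql.functions, for example from_json() with a schema to turn a JSON string into a struct or array, explode() to turn array elements into rows, and get_json_object() to pick out a single nested value.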