
How to convert PySpark.rdd.RDD to JSON?

I have a very large dataset that I am processing with Spark. The input file is JSON, with one record per line. The first line is:

{"originaltitle":"Sales Representative / Home Agent","workexperiences":[{"company":"TelAffects\\\\ Carlise Group Temp Position","country":"US","customizeddaterange":"4 months","daterange":{"displaydaterange":"October 2018 to February 2019","startdate":{"displaydate":"October 2018","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"February 2019","granularity":"MONTH","isodate":{"date":null}}},"description":"Outbound Sales Representative/ Temp positionresponsible for the sale of health care program from Medishare. Contact customer present health care program. Pre-qualify customer.a0Explained to prospect the benefits of taking Medishare instead of traditional insurance.a0Took application fee and instructed prospect in filling out forms necessary to become a Medishare member. One week follow up take their first months payment.","location":"Lake Mary, FL","normalizedtitle":"customer service/sales representative","title":"Customer Service Sales Representative"},{"company":"Think Direct Marketing Group","country":"US","customizeddaterange":"5 years, 3 months","daterange":{"displaydaterange":"October 2012 to 2018","startdate":{"displaydate":"October 2012","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"2018","granularity":"YEAR","isodate":{"date":null}}},"description":"Performance based position. Very fast paced sale of nationally published magazines. Had to maintain a 40% call to sale ratio. 
Took over 100 calls per shift.Awarded daily with points for high sales.","location":"Clearwater, FL","normalizedtitle":"sales agent","title":"Sales Representative / Home Agent"},{"company":"Sapphire Financial Services","country":"US","customizeddaterange":"1 year, 7 months","daterange":{"displaydaterange":"February 2009 to September 2010","startdate":{"displaydate":"February 2009","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"September 2010","granularity":"MONTH","isodate":{"date":null}}},"description":"Called on consumer\'s to prequalify eligibility for a credit card loan reduction by verifying the consumer\'s account was open and in good standing and they were at least $5000 in total card debt, but not near or over the credit limit.Informed clients of live negotiation process to receive low fixed rates on their credit card loans.","location":"Maitland, FL","normalizedtitle":"sales representative","title":"Sales Representative"},{"company":"Comp US Pay roll Services","country":"US","customizeddaterange":"1 year, 7 months","daterange":{"displaydaterange":"January 2007 to August 2008","startdate":{"displaydate":"January 2007","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"August 2008","granularity":"MONTH","isodate":{"date":null}}},"description":"Marketing Representative responsible for the Orlando Region.Called on business owners and CEOs of major corporations and outlined cost of payroll management based on the number of employees. 
Explained the benefits of payroll services as appose to internal accounting.Performed outside sales and cold calling with Insurance agents to aid in the sales of  workman\'s compensation and payroll packages.","location":"Orlando, FL","normalizedtitle":"marketing representative","title":"Marketing Representative"},{"company":"Dial America Marketing","country":"US","customizeddaterange":"1 year, 11 months","daterange":{"displaydaterange":"December 2004 to November 2006","startdate":{"displaydate":"December 2004","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"November 2006","granularity":"MONTH","isodate":{"date":null}}},"description":"Made outbound calls on businesses to promote the use of wireless products to improve efficiency in the work place. Developed and wrote sales scripts for sales representatives for training purposes. Oversaw and participated in client monitoring sessions to evaluate sales technique and performance.Results: Awarded consistently for high sales, Employee of the month 3months in a row.","location":"Winter Park, FL","normalizedtitle":"sales representative","title":"Sales Representative"},{"company":"Hewitt Associates","country":"US","customizeddaterange":"3 years, 6 months","daterange":{"displaydaterange":"April 2001 to October 2004","startdate":{"displaydate":"April 2001","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"October 2004","granularity":"MONTH","isodate":{"date":null}}},"description":"Worked with employers to update employee\'s benefits and retirement packages.Provided processing support for several client teams Quality Analyst by researching participant\'s accounts for input data and editing purposes.Prepared DBA reports for clients and wrote and created the  SOP for  Process Analyst role.Results: Assisted with special projects to provide Quality Analyst with timely documentation by updating clients accounts.","location":"Orlando, FL","normalizedtitle":"business process 
analyst","title":"Process Analyst"},{"company":"Thee Nail Center","country":"US","customizeddaterange":"9 years","daterange":{"displaydaterange":"March 1992 to March 2001","startdate":{"displaydate":"March 1992","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"March 2001","granularity":"MONTH","isodate":{"date":null}}},"description":"Owner and operator of a full service nail and skin salon. Managed a staff of  7 full time employees, built a steady repeat client base utilizing advertising, marketing and sales. Managed a small retail section of the store selling hair, skin care products and costume jewelry.Results: As business owner, built client base to retail section to profitable margins. As a result was able  to sell a manageable turn key business for a profit.","location":"Winter Park, FL","normalizedtitle":"business owner","title":"Business Owner"},{"company":"Friedman Jewelers","country":"US","customizeddaterange":"5 years, 2 months","daterange":{"displaydaterange":"October 1987 to December 1992","startdate":{"displaydate":"October 1987","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"December 1992","granularity":"MONTH","isodate":{"date":null}}},"description":"Approved customer loan applications, performed outside and inside collections by confronting delinquent customers and collecting money or merchandise.Results: Internal collections department kept delinquency to lowest rate in the division.","location":"Sanford, FL","normalizedtitle":"credit manager","title":"Credit Manager"}],"skillslist":[{"monthsofexperience":242,"text":"SALES"},{"monthsofexperience":128,"text":"MARKETING"},{"monthsofexperience":23,"text":"TRAINING"},{"monthsofexperience":19,"text":"PAYROLL"},{"monthsofexperience":19,"text":"SALES AND"},{"monthsofexperience":0,"text":"Call Center"},{"monthsofexperience":0,"text":"Customer Service"}],"url":"/r/Sharan-Hustead/0dbf8ca4a82b064e","additionalinfo":"Core CompetenciesCustomer Service 
ManagementTelemarketingProspecting/ Client CultivationTeam Building & TrainingComplaint Handling & ResolutionAccount DevelopmentBusiness OwnerSales and MarketingNegotiationsDebt CollectionPayroll ManagementInbound / Outbound Sales"}
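
Before involving Spark, it can help to confirm that each line of the file really is one complete JSON object, since that is the layout spark.read.json expects by default (a file holding a single pretty-printed document or array would need the multiLine option instead). The following is a minimal sketch using only the standard library. The `line` string is an abbreviated, hypothetical stand-in for the record above, not the real data:

```python
import json

# Abbreviated, hypothetical stand-in for one record of the file.
# Field names follow the sample above; the values are shortened.
line = (
    '{"originaltitle": "Sales Representative / Home Agent", '
    '"workexperiences": [{"company": "Think Direct Marketing Group", "country": "US"}], '
    '"skillslist": [{"monthsofexperience": 242, "text": "SALES"}]}'
)

# json.loads raises json.JSONDecodeError if the line is not a complete JSON value,
# which is a quick way to spot records that were split across physical lines.
record = json.loads(line)

top_level_fields = sorted(record)
companies = [job["company"] for job in record["workexperiences"]]

print(top_level_fields)
print(companies)
```

If a line fails to parse here, Spark's default line-delimited reader is likely to treat that record as corrupt as well.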


You can read the file directly with spark.read.json, which infers the schema from the records:

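from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already defined as `spark` in the pyspark shell
df = spark.read.json('test.json')           # one JSON object per line by default

df.printSchema()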
root
 |-- additionalinfo: string (nullable = true)
 |-- originaltitle: string (nullable = true)
 |-- skillslist: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- monthsofexperience: long (nullable = true)
 |    |    |-- text: string (nullable = true)
 |-- url: string (nullable = true)
 |-- workexperiences: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- company: string (nullable = true)
 |    |    |-- country: string (nullable = true)
 |    |    |-- customizeddaterange: string (nullable = true)
 |    |    |-- daterange: struct (nullable = true)
 |    |    |    |-- displaydaterange: string (nullable = true)
 |    |    |    |-- enddate: struct (nullable = true)
 |    |    |    |    |-- displaydate: string (nullable = true)
 |    |    |    |    |-- granularity: string (nullable = true)
 |    |    |    |    |-- isodate: struct (nullable = true)
 |    |    |    |    |    |-- date: string (nullable = true)
 |    |    |    |-- startdate: struct (nullable = true)
 |    |    |    |    |-- displaydate: string (nullable = true)
 |    |    |    |    |-- granularity: string (nullable = true)
 |    |    |    |    |-- isodate: struct (nullable = true)
 |    |    |    |    |    |-- date: string (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- location: string (nullable = true)
 |    |    |-- normalizedtitle: string (nullable = true)
 |    |    |-- title: string (nullable = true)
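
If the data is already held in an RDD rather than a file, the same reader can be used, because spark.read.json also accepts an RDD of JSON strings, and DataFrame.toJSON goes the other way. The sketch below assumes a regular SparkSession with the RDD API available. The helper names `rdd_to_dataframe` and `dataframe_to_json_rdd` are hypothetical and introduced only for illustration:

```python
import json


def rdd_to_dataframe(spark, rdd):
    """Build a DataFrame from an RDD whose elements are dicts or JSON strings.

    Hypothetical helper: `spark` is assumed to be a SparkSession and `rdd`
    an RDD of Python dicts and/or JSON-encoded strings.
    """
    # Serialize dicts to JSON text; pass strings through unchanged.
    json_rdd = rdd.map(lambda rec: rec if isinstance(rec, str) else json.dumps(rec))
    # spark.read.json accepts an RDD of strings, one JSON object per element,
    # and infers the schema just as it does for a file.
    return spark.read.json(json_rdd)


def dataframe_to_json_rdd(df):
    """Return an RDD of JSON strings, one per row of the DataFrame."""
    return df.toJSON()
```

With a DataFrame `df` loaded as above, `dataframe_to_json_rdd(df).take(1)` would return the first record as a JSON string, and `rdd_to_dataframe(spark, df.toJSON())` would round-trip it back into a DataFrame with an inferred schema.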
