I'm trying to convert a CSV that looks like the 1st example into one that looks like the 2nd example below.
I've been experimenting with Pandas and think I have the fundamentals working, but I can't figure out how to do one last transformation: turning the placeholder value that the pivot puts in each cell into an actual English word.
In the code below, the piece I need help with is the comment inside the for loop. I need something I can put there that will replace any non-null value found in the cells of column pivottally[c] with the string 'registered'.
Note: if you'd like to suggest a more efficient way to go through the data than a for loop over a list of column names, feel free. The for loop was just a way to test functionality, since this is my first time using Pandas.
Input:
First Last Email Program
john doe jd@me.com BasketWeaving
jane doe dj@me.com BasketWeaving
jane doe dj@me.com Acrobatics
jane doe dj@me.com BasketWeaving
mick jag mj@me.com StageDiving
Desired output:
First  Last  Email      StatusBasketWeaving__c  StatusAcrobatics__c  StatusStageDiving__c
john   doe   jd@me.com  registered
jane   doe   dj@me.com  registered              registered
mick   jag   mj@me.com                                               registered
(There's actually one more column that my code inserts, but it would make this example too wide, so it's not shown here.)
Here's what I've written so far:
import pandas
import numpy

# Read in the First Name, Last Name, Email Address, & "Program Registered For" columns of a log file of registrations conducted that day.
tally = pandas.read_csv('tally.csv', names=['First', 'Last', 'Email', 'Program'])
# Rename the First Name & Last Name columns so that they're Salesforce Contact object field names
tally.rename(columns={'First':'FirstName', 'Last':'LastName'}, inplace=True)
# Create a concatenation of First, Last, & Email that can be used for later Excel-based VLOOKUP-ing Salesforce Contact Ids from a daily export of Id+Calculated_Lastname_Firstname_Email from Salesforce
tally['Calculated_Lastname_Firstname_Email__c'] = tally['LastName'] + tally['FirstName'] + tally['Email']
# Rename the values in Program so that they're ready to become field names for the Salesforce Contact object
tally['Program'] = 'Status' + tally['Program'] + '__c'
# Pivot the data by grouping on First+Last+Email+(Concatenated), listing the old registered-for-Program values as column headings, and putting
# a non-null value under that column heading if the person has any rows indicating that they registered for it.
pivottally = pandas.pivot_table(tally, index=['FirstName', 'LastName', 'Email', 'Calculated_Lastname_Firstname_Email__c'], columns='Program', aggfunc=numpy.size)
# Grab a list of column names that have to do with the programs themselves (these are where we'll want to replace our non-null placeholder with 'registered')
statuscolumns = [s for s in (list(pivottally.columns.values)) if s.startswith('Status')]
for c in statuscolumns:
    # pivottally.rename(columns={c:'Hi'+c}, inplace=True)  # Just a test line to make sure my for loop worked.
    # I need to figure out something I can put here that will replace any non-null value found in the cells of column pivottally[c] with the string 'registered'
    pass  # placeholder so the loop body is valid until the replacement step is filled in
print(pivottally.head())
# pivottally.to_csv('pivottally.csv')
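One way the missing step could look, as a minimal sketch rather than a definitive answer: count rows per person and program with groupby and unstack (swapped in here for pivot_table), then map every non-null cell to 'registered' in a single numpy.where call instead of looping over column names. The in-memory DataFrame stands in for tally.csv and is an assumption, as are the names keys, counts, status, and result; the Calculated_Lastname_Firstname_Email__c column is left out for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tally.csv, using the sample rows from the question
# (columns already renamed to FirstName / LastName).
tally = pd.DataFrame({
    "FirstName": ["john", "jane", "jane", "jane", "mick"],
    "LastName": ["doe", "doe", "doe", "doe", "jag"],
    "Email": ["jd@me.com", "dj@me.com", "dj@me.com", "dj@me.com", "mj@me.com"],
    "Program": ["BasketWeaving", "BasketWeaving", "Acrobatics",
                "BasketWeaving", "StageDiving"],
})
tally["Program"] = "Status" + tally["Program"] + "__c"

keys = ["FirstName", "LastName", "Email"]

# Count registrations per person and program, then spread the programs into
# columns. Person/program combinations with no rows come out as NaN.
counts = tally.groupby(keys + ["Program"]).size().unstack("Program")

# Replace every non-null cell with 'registered' and every null cell with ''.
status = pd.DataFrame(
    np.where(counts.notna(), "registered", ""),
    index=counts.index,
    columns=counts.columns,
)

# Turn the grouping keys back into ordinary columns, ready for to_csv.
result = status.reset_index()
print(result)
```

Because numpy.where works on the whole table at once, no list of status column names is needed. If only some columns should be converted, the same call could be applied to a column subset such as counts[statuscolumns].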
Thanks for all your help.