简体   繁体   中英

Unpivot data in new Dataframe

I have a sample dataframe as follows:

Back              Unamed: 2    Unamed: 3    Unamed: 4   Unamed: 5    Unamed: 6
Q5  
Text q5? 
                  Ireland       Poland       Spain        France      Chile        
Option 1            9%            56%          78%          23%        22%
Option 2            19%           44%          65%          33%        44%
Option 3            78%           88%          66%          54%        97%
Option 4            43%           32%          67%          23%        21%
Q6
Text q6?
                  Ireland       Poland       Spain        France      Chile  
option 1           39%            16%          38%          13%        22% 
Option 2           38%           82%          64%          54%        97%
Option 3           53%           12%          97%          13%        91%

Data in text:

{'back': {6: 'Q5',
  7: 'How long has the business been operating?',
  9: nan,
  11: 'Less than 1 year',
  12: '1 - 3 years',
  13: '4 - 10 years',
  14: 'More than 10 years',
  19: 'Q6',
  20: 'How many employees are regularly involved in running your business? These employees can be part-time, full-time, volunteers or informal/unregistered workers.',
  22: nan,
  24: '1 employee',
  25: '2 to 4 employees',
  26: '5 to 9 employees',
  31: 'Q7',
  32: 'Which sector does your company belong to?',
  34: nan,
  36: 'Catering - Hospitality, Events, etc',
  37: 'Retail - Online, Physical',
  38: 'Creative - Arts, Restoration, Media, etc',
  39: 'Production - Food, Drinks, Carpentry, Clothing etc',
  40: 'Personal - Healthcare, Beauty, Fitness, etc',
  41: 'Household Services - Delivery, Cleaning, Gardening, etc',
  42: 'Construction - Building, Electrics, Plumbing, etc',
  43: 'Motor Trade - Servicing, Tyres, Sales, etc',
  44: 'Education/ Social - Learning, Associations, etc',
  45: 'Professional - Technology, Legal, Financial, etc',
  46: 'Other Sector (please specify)',
  51: 'Q8',
  52: 'What challenges do you currently experience in your business?',
  55: nan,
  57: 'Challenges to attract new customers',
  58: 'Lack of funding',
  59: 'Standing out against competitors',
  60: 'Managing stress and burnout',
  61: 'Retaining customers',
  62: 'Lack of support',
  63: 'Difficulty managing business finances',
  64: 'Lack of knowledge on growing the business',
  65: 'Lack of premises',
  66: 'No access to digital skills training',
  67: 'Lack of cyber security',
  68: 'No access to business skills training',
  69: 'Not sure how to access training for my team/company',
  70: 'Other (please specify)',
  71: "I don't experience any challenges at the moment*",
  72: '*exclusive',
  77: 'Q9',
  78: 'Which devices are used by you and any of your employees?',
  81: nan,
  83: 'Smartphone',
  84: 'Laptop',
  85: 'Desktop PC',
  86: 'iPad or tablet',
  87: 'Other (please specify)',
  92: 'Q10',
  93: 'What software/applications are used by your business?',
  96: nan,
  98: 'Marketing platforms such as email and social media channels',
  99: 'Business productivity applications such as Microsoft Office or Google Workspace',
  100: 'Financial tools to manage your cash flow, budgets and payments',
  101: 'Online security software to protect against online threats',
  102: 'Marketing tools to schedule and automate your posts',
  103: 'Website building tools such as Wix, Webflow or WordPress',
  104: 'Inventory management tools to keep a track of your stock',
  105: 'HR tech for recruiting employees and managing HR systems',
  106: 'Other (please specify)',
  107: 'None of the above.*',
  108: '*exclusive',
  113: 'Q11',
  114: 'What type of internet package does your business have?',
  116: nan,
  118: 'Fiber Optic',
  119: 'Wireless broadband',
  120: 'Satellite broadband',
  121: 'Other (please specify)',
  126: 'Q12',
  127: 'Do you need to be able to view content offline? This could be due to frequent WiFi or data issues.',
  129: nan,
  131: 'Yes',
  132: 'No',
  133: "I don't know"},
 'Unnamed: 2': {6: nan,
  7: nan,
  9: 'Ireland',
  11: '6%',
  12: '30%',
  13: '31%',
  14: '33%',
  19: nan,
  20: nan,
  22: 'Ireland',
  24: '37%',
  25: '36%',
  26: '27%',
  31: nan,
  32: nan,
  34: 'Ireland',
  36: '7%',
  37: '15%',
  38: '14%',
  39: '5%',
  40: '13%',
  41: '3%',
  42: '11%',
  43: '1%',
  44: '9%',
  45: '13%',
  46: '9%',
  51: nan,
  52: nan,
  55: 'Ireland',
  57: '43%',
  58: '27%',
  59: '29%',
  60: '33%',
  61: '13%',
  62: '15%',
  63: '23%',
  64: '11%',
  65: '8%',
  66: '9%',
  67: '5%',
  68: '11%',
  69: '6%',
  70: '4%',
  71: '11%',
  72: nan,
  77: nan,
  78: nan,
  81: 'Ireland',
  83: '83%',
  84: '69%',
  85: '39%',
  86: '38%',
  87: '1%',
  92: nan,
  93: nan,
  96: 'Ireland',
  98: '43%',
  99: '48%',
  100: '35%',
  101: '37%',
  102: '29%',
  103: '35%',
  104: '17%',
  105: '9%',
  106: '1%',
  107: '10%',
  108: nan,
  113: nan,
  114: nan,
  116: 'Ireland',
  118: '27%',
  119: '63%',
  120: '5%',
  121: '5%',
  126: nan,
  127: nan,
  129: 'Ireland',
  131: '51%',
  132: '43%',
  133: '6%'},
 'Unnamed: 3': {6: nan,
  7: nan,
  9: 'Poland',
  11: '12%',
  12: '29%',
  13: '35%',
  14: '25%',
  19: nan,
  20: nan,
  22: 'Poland',
  24: '58%',
  25: '31%',
  26: '12%',
  31: nan,
  32: nan,
  34: 'Poland',
  36: '5%',
  37: '21%',
  38: '8%',
  39: '5%',
  40: '10%',
  41: '7%',
  42: '10%',
  43: '2%',
  44: '6%',
  45: '21%',
  46: '8%',
  51: nan,
  52: nan,
  55: 'Poland',
  57: '42%',
  58: '30%',
  59: '26%',
  60: '20%',
  61: '26%',
  62: '20%',
  63: '18%',
  64: '10%',
  65: '6%',
  66: '5%',
  67: '7%',
  68: '6%',
  69: '5%',
  70: '2%',
  71: '17%',
  72: nan,
  77: nan,
  78: nan,
  81: 'Poland',
  83: '90%',
  84: '78%',
  85: '48%',
  86: '23%',
  87: '-',
  92: nan,
  93: nan,
  96: 'Poland',
  98: '49%',
  99: '66%',
  100: '31%',
  101: '33%',
  102: '20%',
  103: '16%',
  104: '17%',
  105: '6%',
  106: '1%',
  107: '11%',
  108: nan,
  113: nan,
  114: nan,
  116: 'Poland',
  118: '49%',
  119: '41%',
  120: '7%',
  121: '3%',
  126: nan,
  127: nan,
  129: 'Poland',
  131: '34%',
  132: '53%',
  133: '13%'},
 'Unnamed: 4': {6: nan,
  7: nan,
  9: 'Spain',
  11: '9%',
  12: '25%',
  13: '30%',
  14: '37%',
  19: nan,
  20: nan,
  22: 'Spain',
  24: '42%',
  25: '40%',
  26: '19%',
  31: nan,
  32: nan,
  34: 'Spain',
  36: '16%',
  37: '15%',
  38: '13%',
  39: '4%',
  40: '10%',
  41: '3%',
  42: '7%',
  43: '2%',
  44: '7%',
  45: '18%',
  46: '8%',
  51: nan,
  52: nan,
  55: 'Spain',
  57: '37%',
  58: '26%',
  59: '31%',
  60: '26%',
  61: '20%',
  62: '21%',
  63: '11%',
  64: '12%',
  65: '12%',
  66: '8%',
  67: '10%',
  68: '6%',
  69: '6%',
  70: '4%',
  71: '11%',
  72: nan,
  77: nan,
  78: nan,
  81: 'Spain',
  83: '83%',
  84: '60%',
  85: '61%',
  86: '36%',
  87: '1%',
  92: nan,
  93: nan,
  96: 'Spain',
  98: '54%',
  99: '47%',
  100: '37%',
  101: '39%',
  102: '26%',
  103: '22%',
  104: '30%',
  105: '10%',
  106: '2%',
  107: '10%',
  108: nan,
  113: nan,
  114: nan,
  116: 'Spain',
  118: '76%',
  119: '18%',
  120: '4%',
  121: '2%',
  126: nan,
  127: nan,
  129: 'Spain',
  131: '42%',
  132: '47%',
  133: '12%'},
 'Unnamed: 5': {6: nan,
  7: nan,
  9: 'France',
  11: '16%',
  12: '36%',
  13: '29%',
  14: '21%',
  19: nan,
  20: nan,
  22: 'France',
  24: '64%',
  25: '27%',
  26: '10%',
  31: nan,
  32: nan,
  34: 'France',
  36: '6%',
  37: '18%',
  38: '12%',
  39: '5%',
  40: '22%',
  41: '4%',
  42: '6%',
  43: '2%',
  44: '3%',
  45: '13%',
  46: '11%',
  51: nan,
  52: nan,
  55: 'France',
  57: '44%',
  58: '25%',
  59: '26%',
  60: '23%',
  61: '21%',
  62: '14%',
  63: '11%',
  64: '18%',
  65: '11%',
  66: '5%',
  67: '5%',
  68: '8%',
  69: '4%',
  70: '5%',
  71: '13%',
  72: nan,
  77: nan,
  78: nan,
  81: 'France',
  83: '85%',
  84: '72%',
  85: '43%',
  86: '32%',
  87: '-',
  92: nan,
  93: nan,
  96: 'France',
  98: '37%',
  99: '36%',
  100: '35%',
  101: '31%',
  102: '28%',
  103: '36%',
  104: '19%',
  105: '8%',
  106: '2%',
  107: '14%',
  108: nan,
  113: nan,
  114: nan,
  116: 'France',
  118: '61%',
  119: '33%',
  120: '6%',
  121: '2%',
  126: nan,
  127: nan,
  129: 'France',
  131: '42%',
  132: '50%',
  133: '8%'},
 'Unnamed: 6': {6: nan,
  7: nan,
  9: 'Chile',
  11: '19%',
  12: '50%',
  13: '27%',
  14: '5%',
  19: nan,
  20: nan,
  22: 'Chile',
  24: '34%',
  25: '52%',
  26: '15%',
  31: nan,
  32: nan,
  34: 'Chile',
  36: '9%',
  37: '26%',
  38: '10%',
  39: '11%',
  40: '9%',
  41: '2%',
  42: '11%',
  43: '4%',
  44: '5%',
  45: '5%',
  46: '10%',
  51: nan,
  52: nan,
  55: 'Chile',
  57: '38%',
  58: '36%',
  59: '23%',
  60: '25%',
  61: '15%',
  62: '23%',
  63: '18%',
  64: '14%',
  65: '15%',
  66: '13%',
  67: '10%',
  68: '6%',
  69: '11%',
  70: '3%',
  71: '6%',
  72: nan,
  77: nan,
  78: nan,
  81: 'Chile',
  83: '85%',
  84: '65%',
  85: '35%',
  86: '21%',
  87: '-',
  92: nan,
  93: nan,
  96: 'Chile',
  98: '56%',
  99: '39%',
  100: '22%',
  101: '15%',
  102: '30%',
  103: '18%',
  104: '23%',
  105: '12%',
  106: '2%',
  107: '10%',
  108: nan,
  113: nan,
  114: nan,
  116: 'Chile',
  118: '47%',
  119: '39%',
  120: '9%',
  121: '6%',
  126: nan,
  127: nan,
  129: 'Chile',
  131: '53%',
  132: '36%',
  133: '12%'}}

that I want to convert to row wise in the format

Question No     Question     Answers    Country   Response 
Q5               Text q5?    Option 1    Ireland.  9%  
Q5               Text q5?    Option 1    Poland    56%    
Q5               Text q5?    Option 1    Spain     78%
Q5               Text q5?    Option 1    France     23%
Q5               Text q5?    Option 1    Chile      22%
Q5               Text q5?    Option 2    Ireland    19%
Q5               Text q5?    Option 2    Poland     44%
Q5               Text q5?    Option 2    Spain      65%
Q5               Text q5?    Option 2    France     33%
Q5               Text q5?    Option 2    Chile      44%
Q5               Text q5?    Option 3    Ireland    78%
Q5               Text q5?    Option 3    Poland     88%
Q5               Text q5?    Option 3    Spain      66%
Q5               Text q5?    Option 3    France     54%
Q5               Text q5?    Option 3    Chile      97%
Q5               Text q5?    Option 4    Ireland    43%  
Q5               Text q5?    Option 4    Poland     32%
Q5               Text q5?    Option 4    Spain      67%
Q5               Text q5?    Option 4    France     23%
Q5               Text q5?    Option 4    Chile      21%
Q6               Text q6?    Option 1    Ireland    39%  
Q6               Text q6?    Option 1    Poland     16%    
Q6               Text q6?    Option 1    Spain      38%
Q6               Text q6?    Option 1    France     13%
Q6               Text q6?    Option 1    Chile      22%
Q6               Text q6?    Option 2    Ireland    38%
Q6               Text q6?    Option 2    Poland     82% 
Q6               Text q6?    Option 2    Spain      64%
Q6               Text q6?    Option 2    France     54%
Q6               Text q6?    Option 2    Chile      97%
Q6               Text q6?    Option 3    Ireland    53% 
Q6               Text q6?    Option 3    Poland     12%
Q6               Text q6?    Option 3    Spain      97%
Q6               Text q6?    Option 3    France     13% 
Q6               Text q6?    Option 3    Chile      91%

I have tried un pivoting the Dataframe but since the country names are not in columns neither are they in multi indexed form I did not get the required format.

df_unpivot = pd.melt(surveys, id_vars='back', value_vars=['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6'],
             var_name='Country', value_name='Response')   

I have created a new Dataframe to copy all the information in that

surveys_cleaned['Question_Number'] = surveys['back'].str.contains("^Q", na=False)

but not sure how to Unpivot the rest of the table. Any help or lead would be appreciated.

UPDATE: I have renamed the columns and applied pivot again so my data looks like this now

The code:

surveys = surveys.rename(columns={"Unnamed: 2": "Ireland", "Unnamed: 3": "Poland","Unnamed: 4": "Spain","Unnamed: 5": "France", "Unnamed: 6": "Chile"})

surveys = surveys[surveys['back'].notna()]

The main idea is that you should even turn a pivot table into a pseudo-plane

data is your dict.

data = {'back': {6: 'Q5',
  7: 'How long has the business been operating?',...

Read the data and rename columns (as in your example)

df = pd.DataFrame.from_dict(data).rename(columns={"Unnamed: 2": "Ireland", "Unnamed: 3": "Poland","Unnamed: 4": "Spain","Unnamed: 5": "France", "Unnamed: 6": "Chile"})[df['back'].notna()]

Create Q field for "Question No" from back field

df.loc[df['back'].str.contains("^Q"), 'Q'] = df.loc[df['back'].str.contains("^Q"), 'back']
df['Q'] = df['Q'].fillna(method='ffill')

Drop rows with "Question No" only

df = df[~df['back'].str.contains("^Q")]

Create field quest for "Question"" (as Question No field)

df.loc[df['Ireland'].isnull(), 'quest'] = df.loc[df['Ireland'].isnull(), 'back']
df['quest'] = df['quest'].fillna(method='ffill')
df = df[~df['Ireland'].isnull()]

Unpivot the dataframe and rename columns

unpivot_df = df.melt(id_vars=['Q', 'quest', 'back'])
unpivot_df.columns = ['Question No', 'Question', 'Answer', 'Country', 'Response']

Full code at https://gitlab.com/-/snippets/2376741

With best regards

# Fix column names.
df.columns = ['back', 'Ireland', 'Poland', 'Spain', 'France', 'Chile']
# Remove the asterisk lines.
df = df[~df.back.fillna('').str.startswith('*')]

# Fill question number across, drop remaining null lines.
df = df.ffill(axis=1).dropna()

# Make a new column for Question number and fill it down.
q_rows = df.back.str.match('Q\d*')
df['Question'] = df.back.where(q_rows, np.nan).ffill()

# Remove the question rows.
df = df[~q_rows]

dfs = []
# For each question group...
for question, group in df.groupby('Question'):
    # Reset the index of the group.
    group = group.reset_index(drop=True)
    # Make the columns 
    group.columns = pd.MultiIndex.from_arrays([group.columns, group.loc[0, :]])
    # Grab the question.
    text = group.iat[0,0]
    # Drop the question Row.
    group = group.drop(0)
    # Melt the group, keeping the question_no and text.
    group = group.melt([('Question', question), ('back', text)])
    # Update your column names.
    group.columns = ['Question No', 'Answers', 'Country', 'Question', 'Response']
    # Add the group to the list.
    dfs.append(group)

# Put all the groups back together.
cleaned = pd.concat(dfs, ignore_index=True)
# Re-order the columns how you want.
cleaned = cleaned[['Question No', 'Question', 'Answers', 'Country',  'Response']]
print(cleaned.sample(50))

Output of 50 random rows:

    Question No                                           Question                                            Answers  Country Response
176          Q8  What challenges do you currently experience in...                                    Lack of funding  Ireland      27%
21          Q10  What software/applications are used by your bu...  Business productivity applications such as Mic...    Spain      47%
10          Q10  What software/applications are used by your bu...  Marketing platforms such as email and social m...   Poland      49%
124          Q7          Which sector does your company belong to?        Personal - Healthcare, Beauty, Fitness, etc  Ireland      13%
187          Q8  What challenges do you currently experience in...  Not sure how to access training for my team/co...  Ireland       6%
211          Q8  What challenges do you currently experience in...              Difficulty managing business finances    Spain      11%
29          Q10  What software/applications are used by your bu...                                None of the above.*    Spain      10%
233          Q8  What challenges do you currently experience in...                             Other (please specify)   France       5%
48          Q10  What software/applications are used by your bu...                             Other (please specify)    Chile       2%
138          Q7          Which sector does your company belong to?         Motor Trade - Servicing, Tyres, Sales, etc   Poland       2%
166          Q7          Which sector does your company belong to?           Creative - Arts, Restoration, Media, etc    Chile      10%
114          Q6  How many employees are regularly involved in r...                                         1 employee   France      64%
244          Q8  What challenges do you currently experience in...               No access to digital skills training    Chile      13%
59          Q11  What type of internet package does your busine...                                 Wireless broadband    Spain      18%
49          Q10  What software/applications are used by your bu...                                None of the above.*    Chile      10%
221          Q8  What challenges do you currently experience in...                                    Lack of funding   France      25%
38          Q10  What software/applications are used by your bu...                             Other (please specify)   France       2%
90           Q5          How long has the business been operating?                                        1 - 3 years   Poland      29%
177          Q8  What challenges do you currently experience in...                   Standing out against competitors  Ireland      29%
151          Q7          Which sector does your company belong to?   Professional - Technology, Legal, Financial, etc    Spain      18%
22          Q10  What software/applications are used by your bu...  Financial tools to manage your cash flow, budg...    Spain      37%
32          Q10  What software/applications are used by your bu...  Financial tools to manage your cash flow, budg...   France      35%
11          Q10  What software/applications are used by your bu...  Business productivity applications such as Mic...   Poland      66%
63          Q11  What type of internet package does your busine...                                 Wireless broadband   France      33%
85           Q5          How long has the business been operating?                                   Less than 1 year  Ireland       6%
163          Q7          Which sector does your company belong to?                      Other Sector (please specify)   France      11%
174          Q7          Which sector does your company belong to?                      Other Sector (please specify)    Chile      10%
123          Q7          Which sector does your company belong to?  Production - Food, Drinks, Carpentry, Clothing...  Ireland       5%
112          Q6  How many employees are regularly involved in r...                                   2 to 4 employees    Spain      40%
141          Q7          Which sector does your company belong to?                      Other Sector (please specify)   Poland       8%
213          Q8  What challenges do you currently experience in...                                   Lack of premises    Spain      12%
94           Q5          How long has the business been operating?                                        1 - 3 years    Spain      25%
17          Q10  What software/applications are used by your bu...  HR tech for recruiting employees and managing ...   Poland       6%
1           Q10  What software/applications are used by your bu...  Business productivity applications such as Mic...  Ireland      48%
56          Q11  What type of internet package does your busine...                                Satellite broadband   Poland       7%
54          Q11  What type of internet package does your busine...                                        Fiber Optic   Poland      49%
4           Q10  What software/applications are used by your bu...  Marketing tools to schedule and automate your ...  Ireland      29%
189          Q8  What challenges do you currently experience in...   I don't experience any challenges at the moment*  Ireland      11%
265          Q9  Which devices are used by you and any of your ...                                         Smartphone   France      85%
108          Q6  How many employees are regularly involved in r...                                         1 employee   Poland      58%
91           Q5          How long has the business been operating?                                       4 - 10 years   Poland      35%
162          Q7          Which sector does your company belong to?   Professional - Technology, Legal, Financial, etc   France      13%
78          Q12  Do you need to be able to view content offline...                                       I don't know    Spain      12%
5           Q10  What software/applications are used by your bu...  Website building tools such as Wix, Webflow or...  Ireland      35%
255          Q9  Which devices are used by you and any of your ...                                         Smartphone   Poland      90%
185          Q8  What challenges do you currently experience in...                             Lack of cyber security  Ireland       5%
266          Q9  Which devices are used by you and any of your ...                                             Laptop   France      72%
127          Q7          Which sector does your company belong to?         Motor Trade - Servicing, Tyres, Sales, etc  Ireland       1%
98           Q5          How long has the business been operating?                                        1 - 3 years   France      36%
263          Q9  Which devices are used by you and any of your ...                                     iPad or tablet    Spain      36%

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM