Copying values from one dataframe column to another

Question

I have two data frames, SF and OF.

SF:

   PartNumber  ParentPartNumber   Webname       Brand        Value_Size     Full Description               ImagePath                      Short Description     Weight       RetailPriceEUR           
     2.5            2                Sidi         Si              S     Honeycomb elastic          https://link1,https://link2           Honey               2.3             331           
     2.6            2                Sidi         Si              M     Honeycomb elastic          https://link1,https://link2            Honey              2.3             331           
     2.7            2                Sidi         Si              L     Honeycomb elastic          https://link1,https://link2           Honey               2.3             331          
     3.2            3                Shoei        Sho            S      E.Q.R.S.                   https://link3                          ERQS               1.5             331           
     3.3            3                Shoei        Sho            M      E.Q.R.S.                   https://link3                          ERQS               1.5             331           
     2.9            2                Sidi         Si              XL   Honeycomb elastic           https://link1,https://link2            Honey              2.3             331

OF:

   Type       SKU      Published         Name     Parent        Size     Full Description                Image                             ShortDescription       Weight (kg)           Regular Price            Isfeatured           height
 simple        4          1              Bec
  simple       8          1              Lin

What I want to do is add an extra row before each group of duplicated rows in SF and append the result to the OF data frame. For example, if ParentPartNumber contains duplicates such as 2, 2, 3, 3, then every row for parent 2 (and likewise for parent 3) needs to be copied, and an extra row has to be added before each group, filled in as described further down. The end result should look like this:

Result (SF rows appended to OF):

  Type         SKU      Published         Name      Parent           Size             Full Description           ImagePath                          ShortDescription       Weight           Regular Price            Isfeatured           height
       simple     4          1              Bec
       simple     8          1              Lin
     variable     2          1              Sidi                        S,M,L,XL          Honeycomb elastic         https://link1,https://link2                                                                            yes
    variation     2.5        0              Honey        2               S                                                                                 Honey                2.3              331                        yes
    variation     2.6        0              Honey        2               M                                                                                 Honey                2.3              331                        yes
    variation     2.7        0              Honey        2               L                                                                                 Honey                2.3              331                        yes
    variation     2.9        0              Honey        2               XL                                                                                Honey                2.3              331                        yes
    variable      3 (extra)  1              Sho                         E41,E42                E.Q.R.S.                 https://link3                                                                                       yes
    variation      3          0              ERQS         3               E41                                                                              EQRS                 1.5               33                        yes
    variation      3          0              ERQS         3               E42                                                                              ERQS                 1.5               33                        yes

Basically, this is what I want to do:

  if SF ParentPartNumber has duplicates (more than one) AND SF VisibleToCustomer == Y then
    
    Create new row in OF with values as follows  (PARENT PART)
    OF Type = variable
    OF SKU = SF ParentPartNumber
    OF Name = SF WebName
    OF Published = 0
    OF Is featured = yes
    OF Description = SF FullDescription

    OF Images = SF ImagePath (when there is more than 1 link in SF replace | with , (comma))
    OF Attribute 1 value(s) = all values from SF Size that belong to the same ParentPartNumber separated with comma
  
    then below that row copy all rows that belong to this ParentPartNumber as follows   ( all the part numbers under parent part number)
    
    OF Type = variation 
    OF SKU = SF PartNumber
    OF Published = 0
    OF Name = SF ShortDescription 
    OF Weight (kg) = SF Weight
    OF Regular price = SF RetailPriceEUR
    OF Parent = SF ParentPartNumber
    OF Is featured = yes
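
The selection condition and the parent-row values in the rules above can be sketched in pandas as follows. This is only an illustration on made-up data: the frame contents are invented, the column name VisibleToCustomer is taken from the rules, and the `sizes` and `images` output names are hypothetical.

```python
import pandas as pd

# Small stand-in for SF; the values are invented for illustration.
SF = pd.DataFrame({
    "PartNumber": [2.5, 2.6, 3.2, 4.1],
    "ParentPartNumber": [2, 2, 3, 4],
    "Value_Size": ["S", "M", "S", "L"],
    "ImagePath": ["a|b", "a|b", "c", "d"],
    "VisibleToCustomer": ["Y", "Y", "Y", "N"],
})

visible = SF[SF["VisibleToCustomer"] == "Y"]
# keep=False marks every member of a duplicated group, not only the repeats
dupes = visible[visible["ParentPartNumber"].duplicated(keep=False)]

# One row per parent: unique sizes joined in order, first image path with | replaced by ,
parents = dupes.groupby("ParentPartNumber").agg(
    sizes=("Value_Size", lambda s: ",".join(dict.fromkeys(s))),
    images=("ImagePath", lambda s: s.iloc[0].replace("|", ",")),
)
print(parents)
```

Parents that occur only once (3 and 4 in this toy data) drop out at the `duplicated(keep=False)` step, so only genuinely repeated parents get a parent row.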

This is how I have tried changing the code:

SFs = SF[SF.VisibleToConsumer == 'Y'] 
SFs = SFs[SFs['ParentPartNumber'].duplicated(keep = False)]

def get_group_by_data(df1, name, parent_cols, child_cols, final_col_names):
    df_dict = {col_name: [] for col_name in final_col_names}  # for the final dataframe
    col_names_map = {
        'Type' : 'Type','SKU': 'SKU','WebName': 'Name','Published': 'Published', 'Isfeatured': 'yes', 'ShortDescription': 'Name','FullDescription' :'Description', 
         'Weight': 'Weight (kg)', 'height' : 'height',
          'RetailPriceEUR': 'Regular price', 
          'ImagePath': 'Images','ParentPartNumber': 'Parent',
         'Size': 'Attribute 1 value(s)',
   
    }  # for mapping the output column names to input col names


    # extra row
    parent_comm_cols_n_elems = dict()
    df_dict['Type'].append('variable')
    df_dict['SKU'].append(str(name))
    df_dict['Published'].append(1)
    df_dict['Is featured?'].append('yes')
    df_dict['Parent'].append("")
    for col in parent_cols:
        parent_col_vals = list(dict.fromkeys(list(df1[col])).keys())  # using dictionary for ignoring the duplicate values and still retaining the order
        parent_comm_cols_n_elems[col] = len(parent_col_vals)
        df_dict[col_names_map[col]].append(",".join(val for val in parent_col_vals if val == val))  # val == val for ignoring nan values
    for col in child_cols:
        df_dict[col_names_map[col]].append("")                      
    
    # for adding all the part numbers under parent part number 
    for idx, row in df1.iterrows():
        df_dict['Type'].append('variation')
        df_dict['SKU'].append(row['PartNumber'])
       
        df_dict['Published'].append(1)
        df_dict['Is featured?'].append(0)
       
        df_dict['Parent'].append(str(name))
       
        for col in parent_cols:
            # in case of S,M,L,XL child rows would have size populated,
            # but in case of 1 elem, like Honeycomb elastic, size not populated in child rows
            if parent_comm_cols_n_elems[col] > 1:
                df_dict[col_names_map[col]].append(row[col])
            else:
                df_dict[col_names_map[col]].append("")
        for col in  child_cols:
            df_dict[col_names_map[col]].append(row[col])
    return pd.DataFrame.from_dict(df_dict)

parent_cols = ['Size', 'FullDescription', 'ImagePath']
#common_cols = ['WebName']
child_cols = ['ShortDescription', 'Weight', 'RetailPriceEUR']

df_append_cols = ['Type', 'SKU', 'Name',  'Published', 'Is featured?',
       'Short description', 'Full Description',
       'Weight (kg)',
        'Height', 'Regular price', 'Images',
       'Parent', 'Attribute 1 value(s)']


df_append = pd.DataFrame(SF.groupby('ParentPartNumber')[['PartNumber'] + parent_cols  + child_cols]
                         .apply(lambda x: get_group_by_data(x, x.name, parent_cols,
                                                            child_cols, df_append_cols)).values,  # x.name is 'ParentPartNumber'
                         columns=df_append_cols)
df_append = df_append.fillna('')

Answer 1

import pandas as pd

def get_group_by_data(df1, name, parent_cols, child_cols, final_col_names):
    df_dict = {col_name: [] for col_name in final_col_names}  # for the final dataframe
    col_names_map = {
        'Type' : 'Type','SKU': 'SKU','WebName': 'Name','Published': 'Published',
        'Isfeatured': 'yes', 'Short Description': 'Name','Full Description' :'Full Description', 
        'Weight': 'Weight (kg)', 'height' : 'height',
        'RetailPriceEUR': 'Regular price', 
        'ImagePath': 'Images','ParentPartNumber': 'Parent',
        'Value_Size': 'Attribute 1 value(s)',
    }  # for mapping the output column names to input col names

    # extra row
#     print(df_dict)
    parent_comm_cols_n_elems = dict()
    df_dict['Type'].append('variable')
    df_dict['SKU'].append(str(name))
    df_dict['Published'].append(1)
    df_dict['Is featured?'].append('yes')
    df_dict['Parent'].append("")
    df_dict['Height'].append("") # added this 
    df_dict['Short Description'].append("") # added this
#     print(f"Parent cols: {parent_cols}")
    for col in parent_cols:
        parent_col_vals = list(dict.fromkeys(list(df1[col])).keys())  # using dictionary for ignoring the duplicate values and still retaining the order
        parent_comm_cols_n_elems[col] = len(parent_col_vals)
#         print(f"parent_cols: {parent_col_vals}")
        df_dict[col_names_map[col]].append(",".join(val for val in parent_col_vals if val == val))  # val == val for ignoring nan values
    for col in child_cols:
        df_dict[col_names_map[col]].append("")                      
    
    # for adding all the part numbers under parent part number 
    for idx, row in df1.iterrows():
        df_dict['Type'].append('variation')
        df_dict['SKU'].append(row['PartNumber'])
        df_dict['Short Description'].append(row['Short Description']) # added this
        df_dict['Published'].append(1)
        df_dict['Is featured?'].append(0)
        df_dict['Height'].append("") # added this
        df_dict['Parent'].append(str(name))
       
        for col in parent_cols:
            # in case of S,M,L,XL child rows would have size populated,
            # but in case of 1 elem, like Honeycomb elastic, size not populated in child rows
            if parent_comm_cols_n_elems[col] > 1:
                df_dict[col_names_map[col]].append(row[col])
            else:
                df_dict[col_names_map[col]].append("")
        for col in  child_cols:
            df_dict[col_names_map[col]].append(row[col])
#     print(df_dict)
    return pd.DataFrame.from_dict(df_dict)

Group by 'ParentPartNumber' and apply the group operation to the columns named in the lists below. The basic idea is that when you apply a function to a group-by, all the rows belonging to a group are passed to it as a dataframe (containing only the columns you selected for the operation).
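
To make that idea concrete, here is a minimal sketch on a toy frame (the data and the helper `inspect_group` are made up for illustration and are not part of the solution). It records which columns each group receives and shows that the group key is available as `group.name`.

```python
import pandas as pd

# Toy data; the values are illustrative only.
toy = pd.DataFrame({
    "ParentPartNumber": [2, 2, 3],
    "PartNumber": [2.5, 2.6, 3.2],
    "Value_Size": ["S", "M", "S"],
})

seen = {}  # group key -> columns that the group dataframe contained

def inspect_group(group):
    # group is a DataFrame holding only the selected columns;
    # group.name is the key of the current group (the ParentPartNumber)
    seen[group.name] = list(group.columns)
    return pd.Series({"n_rows": len(group),
                      "sizes": ",".join(group["Value_Size"])})

summary = (toy.groupby("ParentPartNumber")[["PartNumber", "Value_Size"]]
              .apply(inspect_group))
print(summary)
```

Because the function returns a Series here, pandas assembles one summary row per group. The solution instead returns a whole DataFrame per group, which is why it takes `.values` and rebuilds the frame with explicit column names.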

SF = SF[SF['ParentPartNumber'].duplicated(keep = False)]

parent_cols = ['Value_Size', 'Full Description', 'ImagePath']
#common_cols = ['WebName']
child_cols = ['Short Description', 'Weight', 'RetailPriceEUR']

df_append_cols = ['Type', 'SKU', 'Name',  'Published', 'Is featured?',
       'Short Description', 'Full Description', 'Weight (kg)',
       'Height', 'Regular price', 'Images','Parent', 'Attribute 1 value(s)']
df_append = pd.DataFrame(SF.groupby('ParentPartNumber')[['PartNumber'] + parent_cols  + child_cols]
                         .apply(lambda x: get_group_by_data(x, x.name, parent_cols,
                                                            child_cols, df_append_cols)).values,  # x.name is 'ParentPartNumber'
                         columns=df_append_cols)
df_append = df_append.fillna('')
df_append
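
The code above builds df_append but does not show the final step of adding it to OF. One possible way to do that, assuming the column names of the two frames have already been made to match, is `pd.concat`. The two frames below are small stand-ins invented for illustration, not the real data.

```python
import pandas as pd

# Stand-ins for the real frames; the columns are a guess based on the question.
OF = pd.DataFrame({
    "Type": ["simple", "simple"],
    "SKU": ["4", "8"],
    "Published": [1, 1],
    "Name": ["Bec", "Lin"],
})
df_append = pd.DataFrame({
    "Type": ["variable", "variation"],
    "SKU": ["2", "2.5"],
    "Published": [1, 1],
    "Name": ["", "Honey"],
    "Parent": ["", "2"],
})

# concat aligns on column names; a column missing from one frame is filled
# with NaN for that frame's rows, which fillna("") then blanks out
result = pd.concat([OF, df_append], ignore_index=True, sort=False).fillna("")
print(result)
```

If the names differ between the frames (for example 'Regular price' versus 'Regular Price'), a `rename(columns=...)` on df_append before the concat would be needed; otherwise the mismatched names end up as separate columns.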

Solution 1
1 Accepted 2021-01-01 20:50:12
