
Copy values from one dataframe column to another

I have two data frames, SF and OF.

SF:

  PartNumber  ParentPartNumber  Webname  Brand  Value_Size  Full Description   ImagePath                    Short Description  Weight  RetailPriceEUR
  2.5         2                 Sidi     Si     S           Honeycomb elastic  https://link1,https://link2  Honey              2.3     331
  2.6         2                 Sidi     Si     M           Honeycomb elastic  https://link1,https://link2  Honey              2.3     331
  2.7         2                 Sidi     Si     L           Honeycomb elastic  https://link1,https://link2  Honey              2.3     331
  3.2         3                 Shoei    Sho    S           E.Q.R.S.           https://link3                ERQS               1.5     331
  3.3         3                 Shoei    Sho    M           E.Q.R.S.           https://link3                ERQS               1.5     331
  2.9         2                 Sidi     Si     XL          Honeycomb elastic  https://link1,https://link2  Honey              2.3     331
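To make the question reproducible, the SF table above can be typed in as a DataFrame. This is only a transcription of the sample rows; the dtypes (floats for PartNumber and Weight, integers for ParentPartNumber) are assumptions, and the VisibleToConsumer column that the code below filters on is not shown in the table, so it is left out here.

```python
import pandas as pd

# Transcription of the sample SF table (column names copied as shown).
SF = pd.DataFrame({
    "PartNumber": [2.5, 2.6, 2.7, 3.2, 3.3, 2.9],
    "ParentPartNumber": [2, 2, 2, 3, 3, 2],
    "Webname": ["Sidi", "Sidi", "Sidi", "Shoei", "Shoei", "Sidi"],
    "Brand": ["Si", "Si", "Si", "Sho", "Sho", "Si"],
    "Value_Size": ["S", "M", "L", "S", "M", "XL"],
    "Full Description": ["Honeycomb elastic"] * 3 + ["E.Q.R.S."] * 2 + ["Honeycomb elastic"],
    "ImagePath": ["https://link1,https://link2"] * 3 + ["https://link3"] * 2
                 + ["https://link1,https://link2"],
    "Short Description": ["Honey"] * 3 + ["ERQS"] * 2 + ["Honey"],
    "Weight": [2.3, 2.3, 2.3, 1.5, 1.5, 2.3],
    "RetailPriceEUR": [331] * 6,
})

print(SF)
```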

OF:

  Type    SKU  Published  Name  Parent  Size  Full Description  Image  ShortDescription  Weight (kg)  Regular Price  Isfeatured  height
  simple  4    1          Bec
  simple  8    1          Lin

What I want to do is add an extra row before each group of duplicated rows in SF and append the result to the OF data frame. For example, if the parent numbers contain duplicates such as 2, 2, 3, 3, then both rows with parent 2 need to be copied, and an extra row has to be added before them, filled in with the information described below. So the end result should look like this:
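The first step, keeping only the parents that occur more than once, maps onto `Series.duplicated(keep=False)`, which the code further down also uses. With `keep=False` every occurrence of a repeated value is flagged, not just the second and later ones. A minimal sketch with made-up data (parent 5 is hypothetical, added only to show what gets dropped):

```python
import pandas as pd

# Parents 2 and 3 repeat; parent 5 (hypothetical) appears only once.
df = pd.DataFrame({
    "PartNumber": [2.5, 2.6, 3.2, 3.3, 5.1],
    "ParentPartNumber": [2, 2, 3, 3, 5],
})

# keep=False marks *all* occurrences of a duplicated value as True,
# so every row of parent 2 and parent 3 is kept and the singleton is dropped.
dups = df[df["ParentPartNumber"].duplicated(keep=False)]
print(dups["PartNumber"].tolist())  # [2.5, 2.6, 3.2, 3.3]

# Each remaining group is what gets one extra "variable" row in front of it.
for parent, group in dups.groupby("ParentPartNumber", sort=False):
    print(parent, group["PartNumber"].tolist())
```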

Result (SF rows appended to OF):

  Type       SKU        Published  Name   Parent  Size      Full Description   ImagePath                    ShortDescription  Weight  Regular Price  Isfeatured  height
  simple     4          1          Bec
  simple     8          1          Lin
  variable   2          1          Sidi           S,M,L,XL  Honeycomb elastic  https://link1,https://link2                                           yes
  variation  2.5        0          Honey  2       S                                                         Honey             2.3     331            yes
  variation  2.6        0          Honey  2       M                                                         Honey             2.3     331            yes
  variation  2.7        0          Honey  2       L                                                         Honey             2.3     331            yes
  variation  2.9        0          Honey  2       XL                                                        Honey             2.3     331            yes
  variable   3 (extra)  1          Sho            E41,E42   E.Q.R.S.           https://link3                                                         yes
  variation  3          0          ERQS   3       E41                                                       EQRS              1.5     33             yes
  variation  3          0          ERQS   3       E42                                                       ERQS              1.5     33             yes

Basically, this is what I want to do:

  if SF ParentPartNumber has duplicates (more than one) AND SF VisibleToCustomer == Y then

    Create a new row in OF with the following values (PARENT PART):
    OF Type = variable
    OF SKU = SF ParentPartNumber
    OF Name = SF WebName
    OF Published = 0
    OF Is featured = yes
    OF Description = SF FullDescription

    OF Images = SF ImagePath (when there is more than one link in SF, replace | with , (comma))
    OF Attribute 1 value(s) = all values from SF Size that belong to the same ParentPartNumber, separated by commas

    Then, below that row, copy all rows that belong to this ParentPartNumber (all the part numbers under the parent part number) as follows:

    OF Type = variation
    OF SKU = SF PartNumber
    OF Published = 0
    OF Name = SF ShortDescription
    OF Weight (kg) = SF Weight
    OF Regular price = SF RetailPriceEUR
    OF Parent = SF ParentPartNumber
    OF Is featured = yes
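For comparison, the rule list above can also be written out directly, without the generic column-mapping helper. This is a sketch under several assumptions: `build_of_rows` is a hypothetical name; the OF column names (`Is featured`, `Description`, `Images`, `Attribute 1 value(s)`, ...) are taken from the rules, not from the real OF frame; the visibility column is assumed to be `VisibleToConsumer` holding `'Y'` (the rules say VisibleToCustomer, the code says VisibleToConsumer); and `Published` is set to 0 as in the rules, even though the sample result table shows 1 for the variable rows.

```python
import pandas as pd


def build_of_rows(sf: pd.DataFrame, visible_col: str = "VisibleToConsumer") -> pd.DataFrame:
    """Hypothetical helper: one 'variable' row per duplicated parent, then its 'variation' rows."""
    # Keep visible rows, then only parents that still occur more than once.
    sf = sf[sf[visible_col] == "Y"]
    sf = sf[sf["ParentPartNumber"].duplicated(keep=False)]

    rows = []
    for parent, group in sf.groupby("ParentPartNumber", sort=False):
        first = group.iloc[0]
        # Parent ("variable") row.
        rows.append({
            "Type": "variable",
            "SKU": parent,
            "Name": first["Webname"],
            "Published": 0,
            "Is featured": "yes",
            "Description": first["Full Description"],
            # Multiple links may be separated by "|"; the rules ask for commas.
            "Images": str(first["ImagePath"]).replace("|", ","),
            # Every size under this parent, duplicates removed, original order kept.
            "Attribute 1 value(s)": ",".join(group["Value_Size"].astype(str).drop_duplicates()),
        })
        # Child ("variation") rows, one per PartNumber.
        for _, r in group.iterrows():
            rows.append({
                "Type": "variation",
                "SKU": r["PartNumber"],
                "Name": r["Short Description"],
                "Published": 0,
                "Is featured": "yes",
                # Not in the rule list, but the sample result shows each variation's size.
                "Attribute 1 value(s)": r["Value_Size"],
                "Weight (kg)": r["Weight"],
                "Regular price": r["RetailPriceEUR"],
                "Parent": parent,
            })
    return pd.DataFrame(rows)


# Usage on a cut-down SF; the "|" in ImagePath and the VisibleToConsumer values are assumed.
SF = pd.DataFrame({
    "PartNumber": [2.5, 2.6, 3.2, 3.3],
    "ParentPartNumber": [2, 2, 3, 3],
    "Webname": ["Sidi", "Sidi", "Shoei", "Shoei"],
    "Value_Size": ["S", "M", "S", "M"],
    "Full Description": ["Honeycomb elastic"] * 2 + ["E.Q.R.S."] * 2,
    "ImagePath": ["https://link1|https://link2"] * 2 + ["https://link3"] * 2,
    "Short Description": ["Honey", "Honey", "ERQS", "ERQS"],
    "Weight": [2.3, 2.3, 1.5, 1.5],
    "RetailPriceEUR": [331] * 4,
    "VisibleToConsumer": ["Y"] * 4,
})
out = build_of_rows(SF)
print(out[["Type", "SKU", "Name", "Parent", "Attribute 1 value(s)"]])
```

`pd.DataFrame(rows)` leaves NaN in any column a row does not set; `out.fillna('')` gives blank cells instead, and the result can then be appended to OF.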

This is how I have tried changing the code:

import pandas as pd

SFs = SF[SF.VisibleToConsumer == 'Y']
SFs = SFs[SFs['ParentPartNumber'].duplicated(keep=False)]

def get_group_by_data(df1, name, parent_cols, child_cols, final_col_names):
    df_dict = {col_name: [] for col_name in final_col_names}  # for the final dataframe
    col_names_map = {
        'Type': 'Type', 'SKU': 'SKU', 'WebName': 'Name', 'Published': 'Published', 'Isfeatured': 'yes',
        'ShortDescription': 'Name', 'FullDescription': 'Description',
        'Weight': 'Weight (kg)', 'height': 'height',
        'RetailPriceEUR': 'Regular price',
        'ImagePath': 'Images', 'ParentPartNumber': 'Parent',
        'Size': 'Attribute 1 value(s)',
    }  # maps input (SF) column names to output (OF) column names

    # extra row
    parent_comm_cols_n_elems = dict()
    df_dict['Type'].append('variable')
    df_dict['SKU'].append(str(name))
    df_dict['Published'].append(1)
    df_dict['Is featured?'].append('yes')
    df_dict['Parent'].append("")
    for col in parent_cols:
        parent_col_vals = list(dict.fromkeys(list(df1[col])).keys())  # a dict drops duplicate values while keeping their order
        parent_comm_cols_n_elems[col] = len(parent_col_vals)
        df_dict[col_names_map[col]].append(",".join(val for val in parent_col_vals if val == val))  # val == val is False only for NaN, so NaNs are skipped
    for col in child_cols:
        df_dict[col_names_map[col]].append("")

    # for adding all the part numbers under parent part number
    for idx, row in df1.iterrows():
        df_dict['Type'].append('variation')
        df_dict['SKU'].append(row['PartNumber'])
        df_dict['Published'].append(1)
        df_dict['Is featured?'].append(0)
        df_dict['Parent'].append(str(name))

        for col in parent_cols:
            # in case of S,M,L,XL the child rows would have size populated,
            # but in case of 1 elem, like Honeycomb elastic, size is not populated in child rows
            if parent_comm_cols_n_elems[col] > 1:
                df_dict[col_names_map[col]].append(row[col])
            else:
                df_dict[col_names_map[col]].append("")
        for col in child_cols:
            df_dict[col_names_map[col]].append(row[col])
    return pd.DataFrame.from_dict(df_dict)

parent_cols = ['Size', 'FullDescription', 'ImagePath']
#common_cols = ['WebName']
child_cols = ['ShortDescription', 'Weight', 'RetailPriceEUR']

df_append_cols = ['Type', 'SKU', 'Name', 'Published', 'Is featured?',
                  'Short description', 'Full Description',
                  'Weight (kg)',
                  'Height', 'Regular price', 'Images',
                  'Parent', 'Attribute 1 value(s)']

# group only the visible, duplicated rows (SFs)
df_append = pd.DataFrame(SFs.groupby('ParentPartNumber')[['PartNumber'] + parent_cols + child_cols]
                         .apply(lambda x: get_group_by_data(x, x.name, parent_cols,
                                                            child_cols, df_append_cols)).values,  # x.name is the ParentPartNumber value
                         columns=df_append_cols)
df_append = df_append.fillna('')

import pandas as pd

def get_group_by_data(df1, name, parent_cols, child_cols, final_col_names):
    df_dict = {col_name: [] for col_name in final_col_names}  # for the final dataframe
    col_names_map = {
        'Type': 'Type', 'SKU': 'SKU', 'WebName': 'Name', 'Published': 'Published',
        'Isfeatured': 'yes', 'Short Description': 'Name', 'Full Description': 'Full Description',
        'Weight': 'Weight (kg)', 'height': 'height',
        'RetailPriceEUR': 'Regular price',
        'ImagePath': 'Images', 'ParentPartNumber': 'Parent',
        'Value_Size': 'Attribute 1 value(s)',
    }  # maps input (SF) column names to output (OF) column names

    # extra row
    parent_comm_cols_n_elems = dict()
    df_dict['Type'].append('variable')
    df_dict['SKU'].append(str(name))
    df_dict['Published'].append(1)
    df_dict['Is featured?'].append('yes')
    df_dict['Parent'].append("")
    df_dict['Height'].append("")  # added this
    df_dict['Short Description'].append("")  # added this
    for col in parent_cols:
        parent_col_vals = list(dict.fromkeys(list(df1[col])).keys())  # a dict drops duplicate values while keeping their order
        parent_comm_cols_n_elems[col] = len(parent_col_vals)
        df_dict[col_names_map[col]].append(",".join(val for val in parent_col_vals if val == val))  # val == val is False only for NaN, so NaNs are skipped
    for col in child_cols:
        df_dict[col_names_map[col]].append("")

    # for adding all the part numbers under parent part number
    for idx, row in df1.iterrows():
        df_dict['Type'].append('variation')
        df_dict['SKU'].append(row['PartNumber'])
        df_dict['Short Description'].append(row['Short Description'])  # added this
        df_dict['Published'].append(1)
        df_dict['Is featured?'].append(0)
        df_dict['Height'].append("")  # added this
        df_dict['Parent'].append(str(name))

        for col in parent_cols:
            # in case of S,M,L,XL the child rows would have size populated,
            # but in case of 1 elem, like Honeycomb elastic, size is not populated in child rows
            if parent_comm_cols_n_elems[col] > 1:
                df_dict[col_names_map[col]].append(row[col])
            else:
                df_dict[col_names_map[col]].append("")
        for col in child_cols:
            df_dict[col_names_map[col]].append(row[col])
    return pd.DataFrame.from_dict(df_dict)

This groups by 'ParentPartNumber' and applies the group operation to all the columns in the lists below. The basic idea is that when you use groupby(...).apply(...), all the rows belonging to a group are passed to the function as one DataFrame (containing only the columns you selected for the operation).
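A tiny demonstration of that idea: inside `groupby(...)[cols].apply(func)`, `func` receives one DataFrame per group holding only the selected columns, and the group key is available as the group's `.name` attribute, which is what `x.name` relies on in the lambda. The column names come from the question; the values and the `describe_group` helper are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ParentPartNumber": [2, 2, 3, 3],
    "PartNumber": [2.5, 2.6, 3.2, 3.3],
    "Value_Size": ["S", "M", "S", "M"],
})


def describe_group(g: pd.DataFrame) -> pd.Series:
    # g is one group's sub-DataFrame; g.name is that group's ParentPartNumber.
    return pd.Series({
        "columns": ",".join(g.columns),
        "n_rows": len(g),
        "key": g.name,
    })


# Returning a Series per group gives one result row per ParentPartNumber.
summary = df.groupby("ParentPartNumber")[["PartNumber", "Value_Size"]].apply(describe_group)
print(summary)
```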

SF = SF[SF['ParentPartNumber'].duplicated(keep=False)]

parent_cols = ['Value_Size', 'Full Description', 'ImagePath']
#common_cols = ['WebName']
child_cols = ['Short Description', 'Weight', 'RetailPriceEUR']

df_append_cols = ['Type', 'SKU', 'Name', 'Published', 'Is featured?',
                  'Short Description', 'Full Description', 'Weight (kg)',
                  'Height', 'Regular price', 'Images', 'Parent', 'Attribute 1 value(s)']
df_append = pd.DataFrame(SF.groupby('ParentPartNumber')[['PartNumber'] + parent_cols + child_cols]
                         .apply(lambda x: get_group_by_data(x, x.name, parent_cols,
                                                            child_cols, df_append_cols)).values,  # x.name is the ParentPartNumber value
                         columns=df_append_cols)
df_append = df_append.fillna('')
df_append
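The code above builds `df_append` but stops before the last step the question asks for: appending it to OF. A minimal sketch of that step with `pd.concat`, using the two OF rows from the question and a cut-down stand-in for `df_append` (its values are abbreviated, not real output):

```python
import pandas as pd

# The two OF rows shown in the question (other OF columns omitted).
OF = pd.DataFrame({
    "Type": ["simple", "simple"],
    "SKU": [4, 8],
    "Published": [1, 1],
    "Name": ["Bec", "Lin"],
})

# Hypothetical stand-in for the df_append built above.
df_append = pd.DataFrame({
    "Type": ["variable", "variation"],
    "SKU": ["2", 2.5],
    "Published": [1, 1],
    "Parent": ["", "2"],
})

# concat aligns on column names; a column missing on one side becomes NaN there.
result = pd.concat([OF, df_append], ignore_index=True).fillna("")
print(result)
```

Because `concat` matches columns by name, names that differ only slightly (for example `Is featured?` versus `Isfeatured`) end up as two separate columns, so it helps to rename `df_append` to OF's exact headers first.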
