
How to concatenate a series to a pandas dataframe in python?

I would like to iterate through a dataframe's rows and concatenate each matching row to a different dataframe, basically building up a new dataframe from selected rows.

For example, given the IPCSection and IPCClass dataframes:

allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis = 0)
finalpatentclasses = pd.DataFrame(columns=allcolumns)
for isec, secrow in IPCSection.iterrows():
    for icl, clrow in IPCClass.iterrows():
        if (secrow[0] in clrow[0]):
            pdList = [finalpatentclasses, pd.DataFrame(secrow), pd.DataFrame(clrow)]
            finalpatentclasses = pd.concat(pdList, axis=0, ignore_index=True)
display(finalpatentclasses)

The output is:

I want the NaN values to disappear and all the data to move under the correct columns. I tried axis=1, but that messes up the column names. append does not work either: all values are placed diagonally in the table, again with NaN values.

The problem with the current implementation is that pd.DataFrame(secrow) and pd.DataFrame(clrow) each turn a row Series into a single-column dataframe whose index holds the original column names. Calling pd.concat with axis=0 and ignore_index=True then stacks those values vertically and discards that index, so nothing lines up with the columns of the final dataframe and the gaps are filled with NaN, as shown in the output.
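To see why the values end up misaligned, it helps to compare the shape of the frame that pd.DataFrame(secrow) produces with a transposed one. This is a minimal sketch; the `sections` frame, its column names and its values are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for IPCSection; column names and values are assumptions
sections = pd.DataFrame({"section": ["A"], "section_title": ["Human necessities"]})
secrow = sections.iloc[0]  # a single row comes back as a Series indexed by column name

as_column = pd.DataFrame(secrow)        # one column; the old column names become the index
as_row = secrow.to_frame().transpose()  # one row; the old column names stay as columns

print(as_column.shape, list(as_column.index))
print(as_row.shape, list(as_row.columns))
```

Only the transposed form has the same column labels as the target dataframe, which is what pd.concat with axis=0 needs in order to align the values.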

To solve this problem, build a one-row dataframe that has the same columns as the final dataframe and holds the values from secrow and clrow under the appropriate columns. Then append that one-row dataframe to the final dataframe using pd.concat with axis=0, as before.

Here is a modified version of the code that should produce the desired output:

import numpy as np
import pandas as pd

allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis=0)
finalpatentclasses = pd.DataFrame(columns=allcolumns)
for isec, secrow in IPCSection.iterrows():
    for icl, clrow in IPCClass.iterrows():
        # Use .iloc for positional access; secrow[0] would be treated as a label lookup
        if secrow.iloc[0] in clrow.iloc[0]:
            # Create a one-row dataframe with the same columns as the final dataframe,
            # holding the values from secrow followed by the values from clrow
            newrow = pd.DataFrame([list(secrow.values) + list(clrow.values)], columns=allcolumns)
            # Append the new row to the final dataframe
            finalpatentclasses = pd.concat([finalpatentclasses, newrow], axis=0, ignore_index=True)
display(finalpatentclasses)

This should result in a final dataframe that has the values from secrow and clrow side by side under the correct columns, with no NaN values.

UPDATED SCRIPT:

allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis=0)
finalpatentclasses = pd.DataFrame(columns=allcolumns)
for isec, secrow in IPCSection.iterrows():
    for icl, clrow in IPCClass.iterrows():
        if secrow.iloc[0] in clrow.iloc[0]:
            print("Condition met")
            # Join the two row Series end to end, then turn the result into a single-row dataframe
            newrow = pd.concat([secrow, clrow]).to_frame().transpose()
            finalpatentclasses = pd.concat([finalpatentclasses, newrow], axis=0, ignore_index=True)
display(finalpatentclasses)

Final Update (Efficient for larger datasets):

allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis=0)
finalpatentclasses_list = []
for secrow in IPCSection.itertuples(index=False):
    for clrow in IPCClass.itertuples(index=False):
        if secrow[0] in clrow[0]:
            row = list(secrow) + list(clrow)
            finalpatentclasses_list.append(row)
finalpatentclasses = pd.DataFrame(finalpatentclasses_list, columns=allcolumns)
display(finalpatentclasses)

Note how secrow and clrow are now namedtuples instead of Series, and need to be converted to lists using the list() function before concatenating them with the + operator. Also, the index=False argument is passed to itertuples() to skip the index column in the output.
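The list-building approach can be tried end to end on toy data. In this sketch the contents of IPCSection and IPCClass, including the column names, are hypothetical sample values, since the post does not show the real frames.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real IPCSection and IPCClass are not shown in the post
IPCSection = pd.DataFrame({"section": ["A", "B"],
                           "section_title": ["Human necessities", "Operations"]})
IPCClass = pd.DataFrame({"ipc_class": ["A01", "B01", "B02"],
                         "class_title": ["Agriculture", "Separating", "Crushing"]})

allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis=0)
rows = []
for secrow in IPCSection.itertuples(index=False):
    for clrow in IPCClass.itertuples(index=False):
        # Position 0 is the first data column because index=False drops the index
        if secrow[0] in clrow[0]:
            rows.append(list(secrow) + list(clrow))

# Build the dataframe once, after the loops, instead of concatenating per row
finalpatentclasses = pd.DataFrame(rows, columns=allcolumns)
print(finalpatentclasses)
```

Collecting plain lists and constructing the dataframe a single time avoids copying the growing frame on every match, which is the main cost of calling pd.concat inside the loop.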

Alright, I have figured it out. The idea is to create a new-row dataframe, concatenate the data from both rows into a single array, add that array to the new-row dataframe as one row, and then concatenate it with the final dataframe.

Here is the code:

allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis=0)
finalpatentclasses = pd.DataFrame(columns=allcolumns)
for isec, secrow in IPCSection.iterrows():
    for icl, clrow in IPCClass.iterrows():
        if secrow.iloc[0] in clrow.iloc[0]:
            newrow = pd.DataFrame(columns=allcolumns)
            # Put the values of both rows into one flat array
            values = np.concatenate((secrow.values, clrow.values), axis=0)
            # Add the array as the first (and only) row of newrow
            newrow.loc[len(newrow.index)] = values
            finalpatentclasses = pd.concat([finalpatentclasses, newrow], axis=0)
# Every newrow has index 0, so renumber the rows and discard the old index
finalpatentclasses.reset_index(drop=True, inplace=True)
display(finalpatentclasses)

Update: the code below is more efficient:

allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis=0)
newList = []
# itertuples() yields the index at position 0, so the data columns start at position 1
for secrow in IPCSection.itertuples():
    for clrow in IPCClass.itertuples():
        if secrow[1] in clrow[1]:
            # Two values from each frame, assuming each frame has exactly two columns
            values = [secrow[1], secrow[2], clrow[1], clrow[2]]
            newList.append(values)
finalpatentclasses = pd.DataFrame(newList, columns=allcolumns)
display(finalpatentclasses)
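As a possible alternative that is not part of the answers above, the nested loops can be replaced by a cross join followed by a filter. This is only a sketch: it assumes pandas 1.2 or later (for how="cross"), and the frames, column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for IPCSection and IPCClass
sections = pd.DataFrame({"section": ["A", "B"],
                         "section_title": ["Human necessities", "Operations"]})
classes = pd.DataFrame({"ipc_class": ["A01", "A21", "B01"],
                        "class_title": ["Agriculture", "Baking", "Separating"]})

# Pair every section row with every class row
pairs = sections.merge(classes, how="cross")

# Keep the pairs where the section code occurs inside the class code
mask = pd.Series(
    [sec in cls for sec, cls in zip(pairs["section"], pairs["ipc_class"])],
    index=pairs.index,
)
result = pairs[mask].reset_index(drop=True)
print(result)
```

The cross join materializes every pair before filtering, so it trades memory for brevity; for very large frames the list-building loop may still be the safer choice.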


 