
How to assign pandas dataframe to slice of other dataframe

I have Excel spreadsheets with data, one for each year. Unfortunately, the columns change slightly from year to year. I want one dataframe that holds all the data, with the missing columns filled with a predefined default value. I wrote a small example program to test this.

import numpy as np
import pandas as pd

# Initialize three dataframes
df1 = pd.DataFrame([[1,2], [11,22],[111,222]], columns=['een', 'twee'])
df2 = pd.DataFrame([[3,4], [33,44],[333,444]], columns=['een', 'drie'])
df3 = pd.DataFrame([[5,6], [55,66],[555,666]], columns=['twee', 'vier'])

# Store these in a dictionary and print for verification
d = {'df1': df1, 'df2': df2, 'df3': df3}

for key in d:
    print(d[key])

print()

# Create a list of all columns; order matters, so a set is not used
cols = []

# Count total number of rows
nrows = 0

# Loop through each dataframe to determine the total number of rows and columns
for key in d:
    df = d[key]
    nrows += len(df)

    for col in df.columns:
        if col not in cols:
            cols += [col]

# Create total dataframe, fill with default (zeros)
data = pd.DataFrame(np.zeros((nrows, len(cols))), columns=cols)

# Assign dataframe to each slice
c = 0
for key in d:
    data.loc[c:c+len(d[key])-1, d[key].columns] = d[key]
    c += len(d[key])

print(data)

The dataframes are initialized correctly, but the assignment to the slice of the data dataframe behaves unexpectedly. What I wanted (and expected) is:

     een   twee   drie   vier
0    1.0    2.0    0.0    0.0
1   11.0   22.0    0.0    0.0
2  111.0  222.0    0.0    0.0
3    3.0    0.0    4.0    0.0
4   33.0    0.0   44.0    0.0
5  333.0    0.0  444.0    0.0
6    0.0    5.0    0.0    6.0
7    0.0   55.0    0.0   66.0
8    0.0  555.0    0.0  666.0

But this is what I got:

     een   twee  drie  vier
0    1.0    2.0   0.0   0.0
1   11.0   22.0   0.0   0.0
2  111.0  222.0   0.0   0.0
3    NaN    0.0   NaN   0.0
4    NaN    0.0   NaN   0.0
5    NaN    0.0   NaN   0.0
6    0.0    NaN   0.0   NaN
7    0.0    NaN   0.0   NaN
8    0.0    NaN   0.0   NaN

Both the location and the data of the first dataframe are assigned correctly. The second dataframe, however, lands in the correct location but without its contents: NaN is assigned instead. The same happens for the third dataframe: correct location, missing data. I have tried assigning d[key].loc[0:2, d[key].columns] and some more fanciful expressions to the data slice, but all of them produce NaN. How can I get the contents of each dataframe assigned to data as well?
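The output above looks consistent with pandas aligning the right-hand dataframe on its index labels during a .loc assignment: df2 carries labels 0..2 while the target slice carries labels 3..5, so no labels match and NaN is written. The following is a minimal sketch of that behavior, not a definitive diagnosis. The names aligned and positional are made up for illustration, and converting with to_numpy() is one possible way to assign by position instead of by label.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame([[3, 4], [33, 44], [333, 444]], columns=['een', 'drie'])

# Target frame with row labels 0..5; rows 3..5 are the slice to fill
data = pd.DataFrame(np.zeros((6, 2)), columns=['een', 'drie'])

# Assigning a DataFrame aligns on index labels: df2 has labels 0..2,
# the slice has labels 3..5, so nothing matches and NaN is written
aligned = data.copy()
aligned.loc[3:5, ['een', 'drie']] = df2
print(aligned)

# Assigning the underlying NumPy array skips alignment and copies by position
positional = data.copy()
positional.loc[3:5, ['een', 'drie']] = df2.to_numpy()
print(positional)
```

If this is indeed the cause, replacing d[key] with d[key].to_numpy() on the right-hand side of the assignment in the loop should give the expected result, assuming the column order of each slice matches d[key].columns, as it does here.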

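Per the comments, you can concatenate the dataframes directly; any column missing from a frame is filled with NaN:

pd.concat([df1, df2, df3])

To replace those NaN values with the default of zero, chain fillna:

pd.concat([df1, df2, df3]).fillna(0)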
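As a complete, runnable sketch of the concat approach using the question's sample data: note that plain pd.concat keeps each frame's original row labels (0, 1, 2 repeated three times). Passing ignore_index=True, which is an addition to the answer above, renumbers the rows 0..8 to match the expected output in the question.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [11, 22], [111, 222]], columns=['een', 'twee'])
df2 = pd.DataFrame([[3, 4], [33, 44], [333, 444]], columns=['een', 'drie'])
df3 = pd.DataFrame([[5, 6], [55, 66], [555, 666]], columns=['twee', 'vier'])

# Stack the frames row-wise; columns are the union in order of first appearance.
# ignore_index=True renumbers the rows, sort=False leaves the column order alone.
data = pd.concat([df1, df2, df3], ignore_index=True, sort=False)

# Cells for columns a frame did not have are NaN; replace them with the default
data = data.fillna(0)

print(data)
```

A different default can be used by passing another value to fillna, or a dict that maps column names to per-column defaults.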
