
Replace every nth row in df1 with every row from df2

(Absolute beginner here)

The following code should replace every 9th row of the template df with EVERY row of the data df. However, it replaces every 9th row of template with every 9th row of data instead.

template.iloc[::9, 2] = data['Question (en)']
template.iloc[::9, 3] = data['Correct Answer']
template.iloc[::9, 4] = data['Incorrect Answer 1']
template.iloc[::9, 5] = data['Incorrect Answer 2']

Thank you for your help

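The source of the problem is that the first step of any operation on two pandas objects is their alignment by index. The row of template with index 9 is therefore paired with the row of data that has index 9, not with the second row of data.

To avoid this step, take the underlying NumPy array from one of the DataFrames through its values attribute. Since a NumPy array has no index, pandas cannot perform the alignment and assigns the values by position.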
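To see the alignment in isolation, here is a minimal sketch with made-up frames (the column name q and all values are hypothetical, not from the question). It uses label-based .loc assignment, where a Series is matched to the target rows by index label, and contrasts that with assigning the bare NumPy array:

```python
import pandas as pd

template = pd.DataFrame({"q": ["t0", "t1", "t2", "t3"]})
data = pd.Series(["d0", "d1"])           # index labels 0 and 1

rows = template.index[::2]               # target labels 0 and 2

# Assigning the Series: values are matched by label.
# Label 2 does not exist in data, so that row ends up missing.
aligned = template.copy()
aligned.loc[rows, "q"] = data

# Assigning the array: no index to match on, so values go in by position.
positional = template.copy()
positional.loc[rows, "q"] = data.to_numpy()

print(aligned)
print(positional)
```

Only the second assignment places d1 in the second target row, which is the behaviour the question asks for.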

Two further corrections:

  • take from the second DataFrame only as many rows as are needed, and only the columns that are to be written to the target DataFrame,
  • perform the whole update "in one go" (see the code below).
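Applied to the layout in the question (four columns at positions 2 to 5), the two points above might look like the sketch below. The frames, the id, topic and notes columns, and the names answer_cols, step, n_needed and block are assumptions made up for illustration:

```python
import pandas as pd

answer_cols = ["Question (en)", "Correct Answer",
               "Incorrect Answer 1", "Incorrect Answer 2"]

# Hypothetical template: two leading columns to keep, then the four to overwrite.
template = pd.DataFrame({"id": range(20), "topic": ["t"] * 20})
for col in answer_cols:
    template[col] = "old"

# Hypothetical data with more rows and columns than are needed.
data = pd.DataFrame({col: [f"{col} {i}" for i in range(5)] for col in answer_cols})
data["notes"] = "unused"

step = 9
n_needed = len(range(0, len(template), step))    # target rows 0, 9, 18

# Only the needed rows and columns, as a plain array with no index.
block = data[answer_cols].iloc[:n_needed].to_numpy()

# One assignment covers all four columns of every 9th row.
template.iloc[::step, 2:6] = block

print(template.iloc[[0, 1, 9, 18]])
```

The array must have exactly as many rows as there are target rows, which is why data is trimmed before the assignment.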

To create both test DataFrames, I defined the following function:

import numpy as np
import pandas as pd

def getTestDf(nRows: int, tt: str, valShift=0):
    # Question labels: the prefix tt followed by the row number
    qn = np.array([tt + str(i) for i in range(nRows)])
    # Three consecutive integers per row, offset by valShift
    ans = np.arange(nRows * 3, dtype=int).reshape((-1, 3)) + valShift
    return pd.concat([pd.DataFrame({'Question (en)': qn}), pd.DataFrame(ans,
        columns=['Correct Answer', 'Incorrect Answer 1', 'Incorrect Answer 2'])], axis=1)

and called it:

template = getTestDf(80, 'Question_')
data = getTestDf(9, 'New question ', 1000)

Note that template has 80 rows, so taking every 9th row (indices 0, 9, ..., 72) gives 9 target rows. That is why I created data with exactly 9 rows.
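The count can also be computed instead of worked out by hand. A small sketch, where n_template, step and targets are names chosen here for illustration:

```python
n_template = 80   # rows in template
step = 9          # every 9th row is replaced

# The positions selected by template.iloc[::9]
targets = range(0, n_template, step)
n_needed = len(targets)

print(list(targets))   # [0, 9, 18, 27, 36, 45, 54, 63, 72]
print(n_needed)        # 9
```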

This way the initial part of template contains:

  Question (en)  Correct Answer  Incorrect Answer 1  Incorrect Answer 2
0    Question_0               0                   1                   2
1    Question_1               3                   4                   5
2    Question_2               6                   7                   8
3    Question_3               9                  10                  11
4    Question_4              12                  13                  14
...

and data (in full):

    Question (en)  Correct Answer  Incorrect Answer 1  Incorrect Answer 2
0  New question 0            1000                1001                1002
1  New question 1            1003                1004                1005
2  New question 2            1006                1007                1008
3  New question 3            1009                1010                1011
4  New question 4            1012                1013                1014
5  New question 5            1015                1016                1017
6  New question 6            1018                1019                1020
7  New question 7            1021                1022                1023
8  New question 8            1024                1025                1026

Now, to copy the rows, just run:

template.iloc[::9] = data.values

The initial part of template now contains:

     Question (en)  Correct Answer  Incorrect Answer 1  Incorrect Answer 2
0   New question 0            1000                1001                1002
1       Question_1               3                   4                   5
2       Question_2               6                   7                   8
3       Question_3               9                  10                  11
4       Question_4              12                  13                  14
5       Question_5              15                  16                  17
6       Question_6              18                  19                  20
7       Question_7              21                  22                  23
8       Question_8              24                  25                  26
9   New question 1            1003                1004                1005
10     Question_10              30                  31                  32
11     Question_11              33                  34                  35
12     Question_12              36                  37                  38
13     Question_13              39                  40                  41
14     Question_14              42                  43                  44
15     Question_15              45                  46                  47
16     Question_16              48                  49                  50
17     Question_17              51                  52                  53
18  New question 2            1006                1007                1008
19     Question_19              57                  58                  59
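Rather than reading the printout, the result can be checked programmatically. The sketch below rebuilds equivalent test frames with a compact helper (make_df is a hypothetical stand-in for getTestDf, not the original function) and compares values only:

```python
import numpy as np
import pandas as pd

COLS = ["Correct Answer", "Incorrect Answer 1", "Incorrect Answer 2"]

def make_df(n_rows, prefix, shift=0):
    # Hypothetical compact equivalent of getTestDf from the answer.
    answers = np.arange(n_rows * 3).reshape(-1, 3) + shift
    df = pd.DataFrame(answers, columns=COLS)
    df.insert(0, "Question (en)", [f"{prefix}{i}" for i in range(n_rows)])
    return df

template = make_df(80, "Question_")
data = make_df(9, "New question ", 1000)
before = template.copy()

template.iloc[::9] = data.values

replaced = template.iloc[::9]
untouched = template.drop(index=replaced.index)

# Every 9th row should now hold the rows of data, in order.
rows_match = bool((replaced.to_numpy() == data.to_numpy()).all())
# All other rows should hold the same values as before.
others_same = bool(
    (untouched.to_numpy() == before.drop(index=replaced.index).to_numpy()).all()
)
print(rows_match, others_same)
```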

I am pretty sure that there are simpler/nicer ways, but just off the top of my head:

template_9 = template.iloc[::9, 0:2].copy()
# outer join
template_9['key'] = 0
data['key'] = 0
# merge returns a new DataFrame, so the result has to be assigned
template_9 = template_9.merge(data, how='left') # you don't need left here, but I think it's clearer
template_9.drop('key', axis=1, inplace=True)
template = pd.concat([template, template_9]).drop_duplicates(keep='last')

If you want to keep the index, replace the merge line with:

template_9 = template_9.reset_index().merge(data, how='left').set_index('index')

and then sort by index at the end.

PS: I'm assuming the column names are the same in both data frames, but it should be straightforward to adapt the code if they are not.
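When the column names do match, they can also be used to select the target columns directly, which leaves the index of template untouched. This is a sketch of that idea with made-up frames; it relies on plain .loc assignment rather than the merge above, and the id column and the name rows are assumptions:

```python
import pandas as pd

# Hypothetical template with one column (id) that data does not have.
template = pd.DataFrame(
    {"id": range(20), "Question (en)": "old q", "Correct Answer": "old a"}
)
data = pd.DataFrame(
    {"Question (en)": ["q0", "q1", "q2"], "Correct Answer": ["a0", "a1", "a2"]}
)

rows = template.index[::9]    # labels 0, 9, 18

# Target columns are picked by name; the values are written by position.
template.loc[rows, data.columns] = data.to_numpy()

print(template.loc[[0, 1, 9, 18]])
```

No reset_index or final sort is needed here, because the rows are never moved.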
