
Add row in dataframe with same value as in specific column

I have this dataframe:

    0       1       2         3
0   Frank   48.2    test_1    file_1
1   John    46.7    test_1    file_1
2   Alice   39.3    test_2    file_2
3   Kim     35.6    test_2    file_2
4   Sasha   25.5    test_3    file_3
.... 
2306 rows × 4 columns   

For every distinct value in column 2 (there are 140 of them), I want to add a row to my dataframe before the first row with that value, keeping the file_number value in column 3 (I will need that column to save the dataframe split into different files depending on its value), like this:

    0        1       2       3
0   test_1                   file_1
1   Frank    48.2    test_1  file_1
2   John     46.7    test_1  file_1
3   test_2                   file_2
4   Alice    39.3    test_2  file_2
5   Kim      35.6    test_2  file_2
6   test_3                   file_3
7   Sasha    25.5    test_3  file_3
....

What is the simplest way to achieve this? Thank you for your time!

You can take the first row of each group with drop_duplicates, then concat those rows back onto the original dataframe:

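# One header row per distinct (column 2, column 3) pair; the column-2 label moves into column 0
s = df.drop_duplicates(['2','3']).drop(['0','1'],axis=1).rename({'2':'0'},axis=1)
# A stable sort keeps each header row (listed first in concat) ahead of the data row sharing its index
out = pd.concat([s,df]).sort_index(kind='mergesort').reindex(columns=df.columns)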
out
Out[15]: 
        0     1       2       3
0  test_1   NaN     NaN  file_1
0   Frank  48.2  test_1  file_1
1    John  46.7  test_1  file_1
2  test_2   NaN     NaN  file_2
2   Alice  39.3  test_2  file_2
3     Kim  35.6  test_2  file_2
4  test_3   NaN     NaN  file_3
4   Sasha  25.5  test_3  file_3
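
The output above keeps duplicated index labels (0, 0, 1, 2, 2, ...), whereas the question asks for a fresh 0..n index. A minimal sketch of the same drop_duplicates/concat idea with the index renumbered follows. The five-row sample frame and the string column labels "0" to "3" are assumptions taken from the question's excerpt, and `headers` is just an illustrative name.

```python
import pandas as pd

# Sample data copied from the question's excerpt; string column labels are an assumption.
df = pd.DataFrame({
    "0": ["Frank", "John", "Alice", "Kim", "Sasha"],
    "1": [48.2, 46.7, 39.3, 35.6, 25.5],
    "2": ["test_1", "test_1", "test_2", "test_2", "test_3"],
    "3": ["file_1", "file_1", "file_2", "file_2", "file_3"],
})

# First row of every (column 2, column 3) pair, reduced to the label and the file name.
headers = (
    df.drop_duplicates(["2", "3"])
      .drop(columns=["0", "1"])
      .rename(columns={"2": "0"})
)

out = (
    pd.concat([headers, df])
      .sort_index(kind="mergesort")   # stable: header stays before the row with the same label
      .reindex(columns=df.columns)    # restore the original column order
      .reset_index(drop=True)         # renumber rows 0..n-1
)
print(out)
```

The cells left empty in the header rows are NaN here, as in the output above; they can be replaced afterwards if blank strings are preferred.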

You can filter the rows for each value of column 2, prepend the header row to that sub-DataFrame, and concatenate all the resulting DataFrames into one. For example:

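import pandas as pd

df = <READ_YOUR_DF>  # placeholder: load your DataFrame here
all_df = []
for i in df["2"].unique():
    # rows belonging to the current value of column 2
    filter_df = df[df["2"] == i]
    # header row: the value goes in column 0, the group's file label stays in column 3
    new_df = pd.DataFrame(data={"0": [i], "1": [""], "2": [""], "3": [filter_df["3"].iloc[0]]})
    to_add = pd.concat([new_df, filter_df], ignore_index=True)
    all_df.append(to_add)

result_df = pd.concat(all_df, ignore_index=True)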

If you want to avoid listing all the column names when creating new_df, you can build the dictionary with a dictionary comprehension over df.columns.
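
A rough sketch of that dictionary-comprehension variant is below. It assumes the sample rows from the question and string column labels "0" to "3"; the names `header`, `group` and `pieces` are illustrative only. It also copies the group's column-3 value into the header row, since the question asks for the file label to be kept there.

```python
import pandas as pd

# Sample data copied from the question's excerpt; string column labels are an assumption.
df = pd.DataFrame({
    "0": ["Frank", "John", "Alice", "Kim", "Sasha"],
    "1": [48.2, 46.7, 39.3, 35.6, 25.5],
    "2": ["test_1", "test_1", "test_2", "test_2", "test_3"],
    "3": ["file_1", "file_1", "file_2", "file_2", "file_3"],
})

pieces = []
for value in df["2"].unique():            # unique() keeps order of first appearance
    group = df[df["2"] == value]
    # Start with an empty string in every column, without naming the columns by hand.
    header = {col: [""] for col in df.columns}
    header["0"] = [value]                 # label goes in the first column
    header["3"] = [group["3"].iloc[0]]    # keep the file label of this group
    pieces.append(pd.concat([pd.DataFrame(header), group], ignore_index=True))

result_df = pd.concat(pieces, ignore_index=True)
print(result_df)
```

Because every header row carries its file label in column 3, grouping `result_df` by that column yields one group per output file, and each group can then be written out with `to_csv`.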


 