Hi, I would like to merge records in a Python pandas DataFrame.
Current Dataframe
Date Value Date Description Amount
01/07/2019 01/07/2019 CHEQUE WITHDRAW 1000.00
01/07/2019 01/07/2019 SUNDRY CREDIT 100.00
NaN NaN CAPITAL FUND NaN
NaN NaN FEES NaN
02/07/2019 02/07/2019 CHEQUE WITHDRAW 10.00
02/07/2019 02/07/2019 SUNDRY CREDIT 10.00
NaN NaN FROM HEAD OFFICE NaN
02/07/2019 02/07/2019 CHEQUE WITHDRAW 50.00
Expected dataframe
Date Value Date Description Amount
01/07/2019 01/07/2019 CHEQUE WITHDRAW 1000.00
01/07/2019 01/07/2019 SUNDRY CREDIT CAPITAL FUND FEES 100.00
02/07/2019 02/07/2019 CHEQUE WITHDRAW 10.00
02/07/2019 02/07/2019 SUNDRY CREDIT FROM HEAD OFFICE 10.00
02/07/2019 02/07/2019 CHEQUE WITHDRAW 50.00
I tried to loop through the rows, find the ones where the Amount column is null, merge them into the previous row's description, and then drop them, but this raises KeyError: 26:
for index, row in df.iterrows():
    if (pd.isnull(row[3]) == True):
        df.loc[index-1][2] = str(df.loc[index-1][2]) + ' ' + str(df.loc[index][0])
        df.drop([index],inplace=True)
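A likely cause of the KeyError, sketched on made-up data: iterrows() keeps walking the original row labels, so once a continuation row has been dropped, the next continuation row looks up index-1 and no longer finds it. The three-row frame below is hypothetical, and it assumes the real data contains two continuation rows in a row somewhere. It also uses column names instead of positions, which is not what the loop above does.

```python
import pandas as pd

# Hypothetical three-row frame: one transaction whose description wraps twice.
df = pd.DataFrame({
    'Date': ['01/07/2019', None, None],
    'Description': ['SUNDRY CREDIT', 'CAPITAL FUND', 'FEES'],
    'Amount': [100.0, None, None],
})

error = None
try:
    for index, row in df.iterrows():
        if pd.isnull(row['Amount']):
            # Look up the row directly above; this fails once that row is gone.
            previous = df.loc[index - 1, 'Description']
            df.loc[index - 1, 'Description'] = previous + ' ' + row['Description']
            # Dropping inside the loop removes the label the next row will need.
            df.drop(index, inplace=True)
except KeyError as exc:
    error = exc

print(repr(error))
```

The first continuation row is merged and dropped without trouble. The second one then asks for the label that was just dropped, which is where the KeyError comes from. Separately, an assignment of the form df.loc[index-1][2] = ... is a chained assignment and may write to a temporary copy rather than to df.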
You could merge the wrapped description lines as follows (the test data is at the end of this post):
# create a new aux column "Description new" that will be filled with the
# new description
df['Description new'] = df['Description']
# create an auxiliary data frame copy that will be shifted
# to match the wrapped lines and add another aux column
# that just contains the wrapped and not yet added segments
df_shifted = pd.DataFrame(df, copy=True)
df_shifted['Continued Description'] = df_shifted['Description'].where(df_shifted['Date'].isna(), None)
# it seems you have at most 2 continuation lines per transaction, so
# 2 passes would be enough; the third pass is harmless
for i in range(3):
    # shift the aux df to get the wrapped descriptions in the same line
    df_shifted = df_shifted.shift(-1)
    # concatenate them
    df['Description new'] = df['Description new'].str.cat(df_shifted['Continued Description'].fillna(''), sep=' ').str.strip(' ')
    # delete the added parts from Continued Description in order
    # not to add them to the previous transaction's description
    df_shifted.loc[~df['Date'].isna(), 'Continued Description'] = None
# show the merged descriptions for the rows that start a transaction
df.loc[~df['Date'].isna(), 'Description new']
This returns something like:
0 CHEQUE WITHDRAW
1 SUNDRY CREDIT CAPITAL FUND FEES
4 CHEQUE WITHDRAW
5 SUNDRY CREDIT FROM HEAD OFFICE
7 CHEQUE WITHDRAW
Name: Description new, dtype: object
You can test that with the data produced by the following code:
import io
import pandas as pd
csv="""
Date;Value Date;Description;Amount
01/07/2019;01/07/2019;CHEQUE WITHDRAW;1000.00
01/07/2019;01/07/2019;SUNDRY CREDIT;100.00
;;CAPITAL FUND;
;;FEES;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;10.00
02/07/2019;02/07/2019;SUNDRY CREDIT;10.00
;;FROM HEAD OFFICE;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;50.00
"""
df=pd.read_csv(io.StringIO(csv), sep=';')
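As a shorter alternative to the shifting approach, here is a sketch that is not part of the answer above. Assuming every continuation row has an empty Date, you can number the transactions with a cumulative sum over Date.notna() and then aggregate each group. The names csv_text, group_id and merged are my own choices for illustration.

```python
import io
import pandas as pd

csv_text = """Date;Value Date;Description;Amount
01/07/2019;01/07/2019;CHEQUE WITHDRAW;1000.00
01/07/2019;01/07/2019;SUNDRY CREDIT;100.00
;;CAPITAL FUND;
;;FEES;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;10.00
02/07/2019;02/07/2019;SUNDRY CREDIT;10.00
;;FROM HEAD OFFICE;
02/07/2019;02/07/2019;CHEQUE WITHDRAW;50.00
"""
df = pd.read_csv(io.StringIO(csv_text), sep=';')

# Every row with a Date starts a new transaction; continuation rows
# (assumed to have an empty Date) get the number of the row above them.
group_id = df['Date'].notna().cumsum().rename('group')

merged = (
    df.groupby(group_id)
      .agg({
          'Date': 'first',            # first non-null value in the group
          'Value Date': 'first',
          'Description': ' '.join,    # glue the wrapped pieces together
          'Amount': 'first',
      })
      .reset_index(drop=True)
)

print(merged)
```

This gives one row per transaction with the full description, and it does not depend on how many continuation lines a transaction has.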