Concatenate data in CSV files with overlapping data in columns

Question

I have a couple CSV files that have vaccine data, such as this:

File 1

Entity,Code,Date,people_vaccinated
Wisconsin,,2021-01-12,125895
Wisconsin,,2021-01-13,125895
Wisconsin,,2021-01-14,135841
Wisconsin,,2021-01-15,151387
Wisconsin,,2021-01-19,188144
Wisconsin,,2021-01-20,193461
Wisconsin,,2021-01-21,204746
Wisconsin,,2021-01-22,221067
Wisconsin,,2021-01-23,241512
Wisconsin,,2021-01-24,260664
Wyoming,,2021-01-12,13577
Wyoming,,2021-01-13,14406
Wyoming,,2021-01-14,17310
Wyoming,,2021-01-15,19931
Wyoming,,2021-01-19,24788
Wyoming,,2021-01-20,25841
Wyoming,,2021-01-21,25841
Wyoming,,2021-01-22,29993
Wyoming,,2021-01-23,32746
Wyoming,,2021-01-24,35868

File 2

Entity,Code,Date,people_fully_vaccinated
Wisconsin,,2021-01-12,11343
Wisconsin,,2021-01-13,11343
Wisconsin,,2021-01-15,17108
Wisconsin,,2021-01-19,23641
Wisconsin,,2021-01-20,27312
Wisconsin,,2021-01-21,32268
Wisconsin,,2021-01-22,37901
Wisconsin,,2021-01-23,42229
Wisconsin,,2021-01-24,45641
Wyoming,,2021-01-12,2116
Wyoming,,2021-01-13,2559
Wyoming,,2021-01-15,2803
Wyoming,,2021-01-19,3242
Wyoming,,2021-01-20,3441
Wyoming,,2021-01-21,3441
Wyoming,,2021-01-22,4515
Wyoming,,2021-01-23,4773
Wyoming,,2021-01-24,4895

Not all of the data overlaps (specifically, the dates for each location), but for the rows that do, how would I combine the last columns? I'm guessing pandas would be best, but I don't want to get stuck messing with a bunch of nested loops.

Answer 1

If you want to merge file 2 into file 1, keeping only the records that appear in file 1, use a left merge:

import pandas as pd

# file1_df and file2_df are the DataFrame objects loaded from file 1 and file 2, respectively.
merged_df = pd.merge(file1_df, file2_df, how='left', on=['Entity', 'Code', 'Date'])

Note: if you are familiar with set operations, you can do a right outer join, left join, inner join, or full outer join by changing the how parameter in the function call above. See the pandas.merge documentation for details.
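To make the effect of the how parameter concrete, here is a minimal sketch using a few rows copied from the question. Reading the CSV text through io.StringIO and the names FILE1_CSV, FILE2_CSV, and keys are assumptions made for illustration; with real files you would pass the paths to pd.read_csv instead.

```python
import io
import pandas as pd

# A few rows copied from the question; file 2 has no 2021-01-14 entries.
FILE1_CSV = """Entity,Code,Date,people_vaccinated
Wisconsin,,2021-01-12,125895
Wisconsin,,2021-01-13,125895
Wisconsin,,2021-01-14,135841
Wyoming,,2021-01-12,13577
Wyoming,,2021-01-13,14406
Wyoming,,2021-01-14,17310
"""
FILE2_CSV = """Entity,Code,Date,people_fully_vaccinated
Wisconsin,,2021-01-12,11343
Wisconsin,,2021-01-13,11343
Wyoming,,2021-01-12,2116
Wyoming,,2021-01-13,2559
"""

file1_df = pd.read_csv(io.StringIO(FILE1_CSV))
file2_df = pd.read_csv(io.StringIO(FILE2_CSV))
keys = ['Entity', 'Code', 'Date']

# left: every row of file 1 is kept; unmatched rows get NaN in the new column.
left_df = pd.merge(file1_df, file2_df, how='left', on=keys)
# inner: only rows whose keys appear in both files are kept.
inner_df = pd.merge(file1_df, file2_df, how='inner', on=keys)
# outer: rows from either file are kept.
outer_df = pd.merge(file1_df, file2_df, how='outer', on=keys)

print(left_df)
print(len(left_df), len(inner_df), len(outer_df))
```

The empty Code column is read as NaN in both frames. pandas matches NaN keys against each other when merging, so the empty column does not prevent rows from lining up.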

Answer 2

import pandas as pd

data1 = pd.read_csv('file1.csv')  # path to file 1
data2 = pd.read_csv('file2.csv')  # path to file 2
# Replace NaN in Code with 0, because groupby drops rows whose key is NaN by default.
data1['Code'] = data1['Code'].fillna(0)
data2['Code'] = data2['Code'].fillna(0)
# This assumes both files use the same column names, so one can be stacked under the other.
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
combined_data = pd.concat([data1, data2], ignore_index=True)
# Group by location, code, and date, then sum people_vaccinated within each group.
result = combined_data.groupby(['Entity', 'Code', 'Date'], as_index=False)['people_vaccinated'].sum()
print(result)

    Entity      Code  Date        people_vaccinated
0   Wisconsin   0.0 12-01-2021  137238
1   Wisconsin   0.0 13-01-2021  137238
2   Wisconsin   0.0 14-01-2021  135841
3   Wisconsin   0.0 15-01-2021  168495
4   Wisconsin   0.0 19-01-2021  211785
5   Wisconsin   0.0 20-01-2021  220773
6   Wisconsin   0.0 21-01-2021  237014
7   Wisconsin   0.0 22-01-2021  258968
8   Wisconsin   0.0 23-01-2021  283741
9   Wisconsin   0.0 24-01-2021  306305
10  Wyoming     0.0 12-01-2021  15693
11  Wyoming     0.0 13-01-2021  16965
12  Wyoming     0.0 14-01-2021  17310
13  Wyoming     0.0 15-01-2021  22734
14  Wyoming     0.0 19-01-2021  28030
15  Wyoming     0.0 20-01-2021  29282
16  Wyoming     0.0 21-01-2021  29282
17  Wyoming     0.0 22-01-2021  34508
18  Wyoming     0.0 23-01-2021  37519
19  Wyoming     0.0 24-01-2021  40763
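The files in the question name their last columns differently (people_vaccinated and people_fully_vaccinated), so a column-wise concatenation that keeps both values side by side may be closer to what was asked than a row-wise sum. The following is only a sketch under that assumption, using a few rows from the question; the names VACCINATED_CSV, FULLY_CSV, and side_by_side are made up for illustration.

```python
import io
import pandas as pd

# A few rows copied from the question; the second file has no 2021-01-14 entries.
VACCINATED_CSV = """Entity,Code,Date,people_vaccinated
Wisconsin,,2021-01-12,125895
Wisconsin,,2021-01-13,125895
Wisconsin,,2021-01-14,135841
Wyoming,,2021-01-12,13577
Wyoming,,2021-01-13,14406
Wyoming,,2021-01-14,17310
"""
FULLY_CSV = """Entity,Code,Date,people_fully_vaccinated
Wisconsin,,2021-01-12,11343
Wisconsin,,2021-01-13,11343
Wyoming,,2021-01-12,2116
Wyoming,,2021-01-13,2559
"""

keys = ['Entity', 'Code', 'Date']
frames = []
for text in (VACCINATED_CSV, FULLY_CSV):
    df = pd.read_csv(io.StringIO(text))
    df['Code'] = df['Code'].fillna(0)  # same NaN replacement as in the answer above
    frames.append(df.set_index(keys))  # index on the shared key columns

# axis=1 aligns the frames on their index and places the value columns side by side;
# keys missing from one file get NaN in that file's column.
side_by_side = pd.concat(frames, axis=1).sort_index().reset_index()
print(side_by_side)
```

Each location and date then appears once, with NaN in people_fully_vaccinated for dates that exist only in the first file.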

2 answers

solution1
0 ACCEPTED 2021-01-25 15:08:53

solution2
0 2021-01-25 15:54:41
