I have a couple of CSV files that contain vaccine data, such as these:
File 1
Entity,Code,Date,people_vaccinated
Wisconsin,,2021-01-12,125895
Wisconsin,,2021-01-13,125895
Wisconsin,,2021-01-14,135841
Wisconsin,,2021-01-15,151387
Wisconsin,,2021-01-19,188144
Wisconsin,,2021-01-20,193461
Wisconsin,,2021-01-21,204746
Wisconsin,,2021-01-22,221067
Wisconsin,,2021-01-23,241512
Wisconsin,,2021-01-24,260664
Wyoming,,2021-01-12,13577
Wyoming,,2021-01-13,14406
Wyoming,,2021-01-14,17310
Wyoming,,2021-01-15,19931
Wyoming,,2021-01-19,24788
Wyoming,,2021-01-20,25841
Wyoming,,2021-01-21,25841
Wyoming,,2021-01-22,29993
Wyoming,,2021-01-23,32746
Wyoming,,2021-01-24,35868
File 2
Entity,Code,Date,people_fully_vaccinated
Wisconsin,,2021-01-12,11343
Wisconsin,,2021-01-13,11343
Wisconsin,,2021-01-15,17108
Wisconsin,,2021-01-19,23641
Wisconsin,,2021-01-20,27312
Wisconsin,,2021-01-21,32268
Wisconsin,,2021-01-22,37901
Wisconsin,,2021-01-23,42229
Wisconsin,,2021-01-24,45641
Wyoming,,2021-01-12,2116
Wyoming,,2021-01-13,2559
Wyoming,,2021-01-15,2803
Wyoming,,2021-01-19,3242
Wyoming,,2021-01-20,3441
Wyoming,,2021-01-21,3441
Wyoming,,2021-01-22,4515
Wyoming,,2021-01-23,4773
Wyoming,,2021-01-24,4895
Not all of the data overlaps (specifically, the dates that go with each location), but for the rows that do, how would I combine the last column? I'm guessing pandas would be best, but I don't want to get stuck messing with a bunch of nested loops.
If you are trying to merge file1 with file2 while keeping only the records that appear in file1, then the solution is:
import pandas as pd
# Assume file1_df and file2_df are the DataFrame objects loaded from file1 and file2, respectively.
merged_df = pd.merge(file1_df, file2_df, how='left', on=['Entity', 'Code', 'Date'])
Note: if you are familiar with set operations, you can do a right outer join, left join, inner join, or full outer join by changing the how parameter in the function call above.
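To make the effect of the how parameter concrete, here is a minimal sketch that uses three Wisconsin rows from file 1 and the two matching rows from file 2, inlined as strings so it runs without any files on disk. The variable names (file1_csv, left_df, inner_df, outer_df) are just for illustration, and it relies on pandas matching empty (NaN) key values against each other, which is why the all-empty Code column can stay in the key list.

```python
import io
import pandas as pd

# A few rows copied from the question, inlined so the example is self-contained.
file1_csv = """Entity,Code,Date,people_vaccinated
Wisconsin,,2021-01-13,125895
Wisconsin,,2021-01-14,135841
Wisconsin,,2021-01-15,151387
"""
file2_csv = """Entity,Code,Date,people_fully_vaccinated
Wisconsin,,2021-01-13,11343
Wisconsin,,2021-01-15,17108
"""

file1_df = pd.read_csv(io.StringIO(file1_csv))
file2_df = pd.read_csv(io.StringIO(file2_csv))

keys = ["Entity", "Code", "Date"]

# left: every row of file1 is kept; dates missing from file2 get NaN.
left_df = pd.merge(file1_df, file2_df, how="left", on=keys)

# inner: only the rows whose keys appear in both files are kept.
inner_df = pd.merge(file1_df, file2_df, how="inner", on=keys)

# outer: rows from either file are kept; indicator=True adds a _merge
# column saying where each row came from.
outer_df = pd.merge(file1_df, file2_df, how="outer", on=keys, indicator=True)

print(left_df)
print(inner_df)
print(outer_df)
```

With this sample, the left merge keeps the 2021-01-14 row with an empty people_fully_vaccinated value, while the inner merge drops it, which is probably what you want if you only care about dates present in both files.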
import pandas as pd
data1 = pd.read_csv('file1.csv')  # path of file1
data2 = pd.read_csv('file2.csv')  # path of file2
data1['Code'] = data1['Code'].fillna(0)  # replace NaN with 0 so groupby does not drop these rows
data2['Code'] = data2['Code'].fillna(0)  # replace NaN with 0 so groupby does not drop these rows
# Give both files the same value column so the rows stack into a single column.
data2 = data2.rename(columns={'people_fully_vaccinated': 'people_vaccinated'})
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
combined_data = pd.concat([data1, data2], ignore_index=True)
# Group by location, code, and date, then add up the values that share the same key.
result = combined_data.groupby(['Entity', 'Code', 'Date'], as_index=False)['people_vaccinated'].sum()
print(result)
Entity Code Date people_vaccinated
0 Wisconsin 0.0 12-01-2021 137238
1 Wisconsin 0.0 13-01-2021 137238
2 Wisconsin 0.0 14-01-2021 135841
3 Wisconsin 0.0 15-01-2021 168495
4 Wisconsin 0.0 19-01-2021 211785
5 Wisconsin 0.0 20-01-2021 220773
6 Wisconsin 0.0 21-01-2021 237014
7 Wisconsin 0.0 22-01-2021 258968
8 Wisconsin 0.0 23-01-2021 283741
9 Wisconsin 0.0 24-01-2021 306305
10 Wyoming 0.0 12-01-2021 15693
11 Wyoming 0.0 13-01-2021 16965
12 Wyoming 0.0 14-01-2021 17310
13 Wyoming 0.0 15-01-2021 22734
14 Wyoming 0.0 19-01-2021 28030
15 Wyoming 0.0 20-01-2021 29282
16 Wyoming 0.0 21-01-2021 29282
17 Wyoming 0.0 22-01-2021 34508
18 Wyoming 0.0 23-01-2021 37519
19 Wyoming 0.0 24-01-2021 40763
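Note that the grouped result also contains dates that exist in only one file (the 14th, for example), where the "sum" is just the single value. If only the overlapping rows should be combined, one possible variation is an inner merge followed by adding the two columns. The sketch below uses a few Wyoming rows from the question inlined as strings; the column name combined is hypothetical, it merges on Entity and Date only (dropping the empty Code column from the second file), and whether adding the two counts is the right way to "combine" them is an assumption carried over from the answer above.

```python
import io
import pandas as pd

# A few Wyoming rows copied from the question, inlined for a self-contained demo.
file1_csv = """Entity,Code,Date,people_vaccinated
Wyoming,,2021-01-13,14406
Wyoming,,2021-01-14,17310
Wyoming,,2021-01-15,19931
"""
file2_csv = """Entity,Code,Date,people_fully_vaccinated
Wyoming,,2021-01-13,2559
Wyoming,,2021-01-15,2803
"""

data1 = pd.read_csv(io.StringIO(file1_csv))
data2 = pd.read_csv(io.StringIO(file2_csv))

# Keep only the (Entity, Date) pairs present in both files.
# Code is dropped from data2 so the result has a single Code column.
overlap = pd.merge(
    data1,
    data2.drop(columns="Code"),
    how="inner",
    on=["Entity", "Date"],
)

# "combined" is a hypothetical column name for the sum of the two counts.
overlap["combined"] = overlap["people_vaccinated"] + overlap["people_fully_vaccinated"]

print(overlap[["Entity", "Date", "combined"]])
```

This avoids both explicit loops and the fillna step, because the empty Code column is never used as a key.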