
Concatenating lists and removing duplicates

I have three spreadsheets that all have a month/year column. There is some overlap between the spreadsheets (i.e. one covers 1998 to 2015 and one covers 2012 to 2020). I want a combined list of all the months/years with no duplicates. I have achieved this, but I feel there must be a cleaner way to do it.

The dataframes all look something like this:

Month    VALUE
1998M01  1
1998M02  2

import pandas as pd

unemp8315 = pd.read_csv('Unemployment 19832015.csv')
unemp9821 = pd.read_csv('Unemployment 19982021.csv')
unempcovid = pd.read_csv('Unemployment Covid.csv')

print(unemp8315)
print(unemp9821)
print(unempcovid)

# Collect the Month values from each dataframe into plain lists
monthlist = []

for i in unemp8315['Month']:
    monthlist.append(i)

monthlist2 = []

for b in unemp9821['Month']:
    monthlist2.append(b)

monthlist3 = []

for c in unempcovid['Month']:
    monthlist3.append(c)

# Combine the three lists, then drop the duplicate months
full_month_list = monthlist + monthlist2 + monthlist3

fullpd = pd.DataFrame(data=full_month_list)

clean_month_list = fullpd.drop_duplicates()

print(clean_month_list)

You can do something like this:

import pandas as pd

files = ['Unemployment 19832015.csv',
         'Unemployment 19982021.csv',
         'Unemployment Covid.csv']

# Read only the Month column from each file, then concatenate and de-duplicate
dfs = [pd.read_csv(file)["Month"] for file in files]

clean_month_list = pd.concat(dfs).drop_duplicates()
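
If you also want the result in chronological order as a single-column dataframe, you can sort and rebuild it afterwards. A minimal sketch, assuming the Month strings (e.g. 1998M01) sort chronologically as plain text:

# Follow-up sketch: sort the de-duplicated months and turn the Series back
# into a one-column dataframe. Assumes "YYYYMmm" strings sort correctly as text.
clean_month_list = (
    pd.concat(dfs)
      .drop_duplicates()
      .sort_values()
      .reset_index(drop=True)
      .to_frame(name='Month')
)
print(clean_month_list)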

There's no need to iterate over every single entry. You can simply concatenate the dataframes, select the Month column, and drop the duplicates there:

fullpd = pd.concat([unemp8315, unemp9821, unempcovid], axis=0)
clean_month_list = fullpd['Month'].drop_duplicates()
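
If a plain array of unique months is enough (rather than a pandas Series), Series.unique is an equivalent option; it preserves the order of first appearance. A minimal sketch under that assumption:

# Alternative sketch: .unique() returns a NumPy array of the distinct months,
# in the order they first appear across the concatenated dataframes.
fullpd = pd.concat([unemp8315, unemp9821, unempcovid], axis=0)
unique_months = fullpd['Month'].unique()
print(unique_months)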

Could you load them into a dictionary instead of a list, i.e. dict[month] = value?
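
For what that comment suggests: a dictionary keyed by month de-duplicates automatically, since a later file simply overwrites an earlier entry for the same key. A rough sketch of that idea (the variable names here are just for illustration):

# Rough sketch of the dictionary idea: keys are months, so duplicates collapse;
# a later file's VALUE overwrites an earlier one for the same month.
month_values = {}
for df in (unemp8315, unemp9821, unempcovid):
    for month, value in zip(df['Month'], df['VALUE']):
        month_values[month] = value

clean_month_list = list(month_values)   # unique months, insertion order preserved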
