Convert list of different element types to list of integers
I have a list of different element types (extracted from a column of a dataframe) that I would like to convert to the same element type (integers). The dataframe looks like this:
Because some rows under column "Systemic Banking Crisis (starting date)" only have one year, while others have several, the extracted list ends up looking like this:
[1994, 1990, nan, '1980, 1989, 1995, 2001', 1994, nan, 2008, 1995, 1987, nan, 1995, 2008, nan,...]
The countries that have multiple years (multiple banking crises) are stored as a string, while the countries with only one year are stored as an integer. I would like to turn the data into panel data by looping through each country and making a dummy variable running from 1970 to 2019 that takes the value 1 if there is a banking crisis and 0 if not. To do this I have run the following code:
data_banking = data['Systemic Banking Crisis (starting date)'].to_list()
data_currency = data['Currency Crisis (year)'].to_list()
countries = data['Country'].to_list()

#making lists
years = [1970]
for i in range(1971, 2020):
    years.append(i)

banking_crisis = []
currency_crisis = []
countries_long = []
for i in countries:
    country = [i for x in range(50)]
    countries_long.extend(country)

years_long = []
for i in range(166):
    years_long.extend(years)

for i in data_banking:
    for y in years:
        if y==i:
            banking_crisis.append(1)
        else:
            banking_crisis.append(0)

banking = pd.DataFrame(list(zip(countries_long, years_long, banking_crisis)))
This works for all the countries with only one banking crisis and returns a dataframe that looks like this:
However, for the countries with multiple banking crises, Python doesn't understand the code because the years are in one string. How do I fix this? I have tried to convert the list data_banking to a list of lists, convert all list elements to strings, then split the strings and convert each string element to integers, so that I could loop through each element in each (country) list of the data_banking list, but it won't work. These are the different variations of what I have tried:
def list_of_lists(lst):
    list_1 = [[el] for el in lst]
    #listToStr = ' '.join(map(str, lists))
    return list_1
    #list_1 = listToString(lists)
    #for string in list_values:
    #    list_values = list_1.split(",")
    #    string = int(string)
    #return list_1

data_banking = list_of_lists(data_banking)
for lists in data_banking:
    for item in lists:
        item = float(item)
        # lists = [str(x) for x in lists]
What should I do?
I'd do this entire operation in two steps. (1) First, I iterate over the dataset and store a list of dictionaries containing the country and each individual year it's associated with (dropping NaNs), via some string splitting. (2) I then compile these results into a new data frame, making sure that the year column is numeric. Here's the code:
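import pandas as pd

# Step 1: one dictionary per (country, year) pair, skipping missing values
bank = 'Systemic Banking Crisis (starting date)'
rows = []
for _, row in data.iterrows():
    country = row['Country']
    years = row[bank]
    if pd.isna(years):
        continue
    # str() lets single integer years go through the same split as the strings
    for year in str(years).split(','):
        rows.append({'Country': country, bank: pd.to_numeric(year.strip())})

# Step 2: build the new data frame and make sure the year column is numeric
df = pd.DataFrame(rows)
df[bank] = pd.to_numeric(df[bank])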
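If you would rather keep working with the plain data_banking list from the question, the same splitting idea can be sketched as a small helper that turns every element into a list of integers. This is only a sketch: the name parse_years is made up for illustration, and the sample list is a shortened copy of the one shown in the question.

```python
import math

def parse_years(value):
    """Return a list of ints for one element of the extracted list."""
    # Missing values show up as float NaN once the column is turned into a list.
    if isinstance(value, float) and math.isnan(value):
        return []
    # str() sends integers and strings down the same path; float() tolerates
    # surrounding spaces and values such as '1994.0'.
    return [int(float(part)) for part in str(value).split(',') if part.strip()]

# Shortened sample based on the list in the question.
data_banking = [1994, 1990, float('nan'), '1980, 1989, 1995, 2001', 2008]
parsed = [parse_years(v) for v in data_banking]
print(parsed)  # [[1994], [1990], [], [1980, 1989, 1995, 2001], [2008]]
```

Every element is now a list, so a country with no crisis is an empty list and a country with several crises is a list with several integers.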
Let me know if this doesn't work for you.
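For the 1970 to 2019 dummy described in the question, one possible sketch is shown here. It assumes the crisis years have already been parsed into integers per country; the crisis_years dictionary and the country names in it are hypothetical placeholders, not values from the real dataset.

```python
# Hypothetical, already-parsed input: country -> list of crisis start years.
crisis_years = {
    'Country A': [1994],
    'Country B': [],
    'Country C': [1980, 1989, 1995, 2001],
}
years = list(range(1970, 2020))  # 1970 through 2019 inclusive, 50 years

countries_long, years_long, banking_crisis = [], [], []
for country, starts in crisis_years.items():
    starts = set(starts)  # set membership replaces the `y == i` comparison
    for y in years:
        countries_long.append(country)
        years_long.append(y)
        banking_crisis.append(1 if y in starts else 0)

rows = list(zip(countries_long, years_long, banking_crisis))
print(len(rows))            # 150 rows: 3 countries x 50 years
print(sum(banking_crisis))  # 5 crisis years flagged in total
```

The three lists have the same shape as countries_long, years_long and banking_crisis in the question, so they can be zipped into a DataFrame in the same way.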