Extracting data from a Python list and converting it to a DataFrame
I have the list below. Could you help me extract only the highlighted region and turn it into a DataFrame? Thanks in advance!
list_a = ['', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'Total', '2019', '', '', '', '919.69', '1043.26', '1158.34', '1245.30', '1112.40', '868.93', '513.33', '432.67', '244.26', '9160.74', '2020', '371.35', '463.13', '722.77', '865.92', '1252.37', '468.15', '', '', '', '', '', '', '4143.69', '', '', '', '', '', '', '', '', '', '', '', '', '', '24572.68', 'Mean value', '383.56', '452.64', '736.38', '915.75', '1112.23', '1186.21', '1266.01', '1101.05', '786.72', '568.93', '412.19', '276.59', '9198.28', 'Year portion', '4.17%', '4.92%', '8.01%', '9.96%', '12.09%', '12.90%', '13.76%', '11.97%', '8.55%', '6.19%', '4.48%', '3.01%', '100.00%', 'Yield expectations *', '274.04', '411.06', '668.98', '878.54', '1007.50', '1063.92', '1039.74', '943.02', '741.52', '515.84', '274.04', '241.80', '8060.00']
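Try:
import pandas as pd

n = 14  # each group holds 1 label + 12 months + 1 total
# Split the flat list into groups of n (one row per group), then transpose
df = pd.DataFrame([list_a[i:i + n] for i in range(0, len(list_a), n)]).T
new_header = df.iloc[0]  # the first row holds the column labels
df = df[1:]
df.columns = new_header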
               2019     2020               Mean value  Year portion  Yield expectations *
1   January             371.35             383.56      4.17%         274.04
2   February            463.13             452.64      4.92%         411.06
3   March               722.77             736.38      8.01%         668.98
4   April      919.69   865.92             915.75      9.96%         878.54
5   May        1043.26  1252.37            1112.23     12.09%        1007.50
6   June       1158.34  468.15             1186.21     12.90%        1063.92
7   July       1245.30                     1266.01     13.76%        1039.74
8   August     1112.40                     1101.05     11.97%        943.02
9   September  868.93                      786.72      8.55%         741.52
10  October    513.33                      568.93      6.19%         515.84
11  November   432.67                      412.19      4.48%         274.04
12  December   244.26                      276.59      3.01%         241.80
13  Total      9160.74  4143.69  24572.68  9198.28     100.00%       8060.00
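To see what the chunk-and-transpose step above does, here is a minimal sketch on toy data. The list `flat` and the group size of 4 are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Toy data (hypothetical): a header group followed by two data groups,
# each 4 items long (label, two values, total).
flat = ["", "Jan", "Feb", "Total", "2019", "1", "2", "3", "2020", "4", "5", "9"]
n = 4

# Slice the flat list into consecutive groups of n items.
groups = [flat[i:i + n] for i in range(0, len(flat), n)]

# Each group becomes a row; .T turns the groups into columns.
toy = pd.DataFrame(groups).T

# Promote the first row (the group labels) to column names.
toy.columns = toy.iloc[0]
toy = toy[1:]
```

The same idea scales to the real list with n = 14. Note that the real data ends up with two columns named '', so dropping '' by label removes both of them.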
# Drop the 'Total' row and every column except the two years
df1 = df[:-1].drop(['', 'Mean value', 'Year portion', 'Yield expectations *'], axis=1)
# Long format: one row per (year, month number)
df1 = df1.unstack().reset_index(name='value')
# Build a date index from "<year>-<month number>"
df1.set_index(pd.to_datetime(df1[0].astype(str) + '-' + df1['level_1'].astype(str)), inplace=True)
df1.drop([0, 'level_1'], axis=1, inplace=True)
value
2019-01-01
2019-02-01
2019-03-01
2019-04-01 919.69
2019-05-01 1043.26
2019-06-01 1158.34
2019-07-01 1245.30
2019-08-01 1112.40
2019-09-01 868.93
2019-10-01 513.33
2019-11-01 432.67
2019-12-01 244.26
2020-01-01 371.35
2020-02-01 463.13
2020-03-01 722.77
2020-04-01 865.92
2020-05-01 1252.37
2020-06-01 468.15
2020-07-01
2020-08-01
2020-09-01
2020-10-01
2020-11-01
2020-12-01
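The values in the result above are still strings, and the months without a reading hold an empty string. If numbers are needed, one possible follow-up is `pd.to_numeric` with `errors="coerce"`. The sketch below uses a small made-up frame (`sample`) with the same shape as the result, not the real data.

```python
import pandas as pd

# Hypothetical sample mirroring the shape of the result above:
# a date-indexed frame whose 'value' column holds strings, some of them empty.
sample = pd.DataFrame(
    {"value": ["", "919.69", "1043.26", ""]},
    index=pd.to_datetime(["2019-03-01", "2019-04-01", "2019-05-01", "2020-07-01"]),
)

# errors="coerce" turns anything that is not a number (here: '') into NaN.
sample["value"] = pd.to_numeric(sample["value"], errors="coerce")

# Optionally drop the months that have no reading.
cleaned = sample.dropna(subset=["value"])
```

Whether to keep the NaN rows or drop them depends on whether the gaps matter for later analysis, for example when resampling or plotting.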
This is convoluted, but it does what I wanted.
import datetime

import pandas as pd
from dateutil import relativedelta

# Distribute the flat list over 14 buckets: bucket 0 collects the column
# labels, buckets 1-12 the months and bucket 13 the totals.
# NOTE: the original looped over an undefined name `result`; list_a is assumed here.
count = 0
output = [[] for _ in range(14)]
for item in list_a:
    output[count % 14].append(item)
    count += 1
production_data = pd.DataFrame(output[1:], columns=output[0])
production_data2 = production_data.drop(12)  # drop the 'Total' row
production_data2 = production_data2.drop(['', 'Mean value', 'Year portion', 'Yield expectations *'], axis=1)
# One row per year, then flatten year by year into a single list
production_data3 = production_data2.transpose()
production_data4 = production_data3.values.tolist()
production_data_list = []
for item in production_data4:
    for element in item:
        production_data_list.append(element)
# Build one date per value, starting in January of the first year
start_year = list(production_data2.columns.values)[0]
start_date_str = start_year + "-01-01"
start_date = datetime.datetime.strptime(start_date_str, '%Y-%m-%d')
nummonths = len(production_data_list)
date_list = []
for x in range(0, nummonths):
    date_list.append(start_date.date() + relativedelta.relativedelta(months=x))
combined_dict = dict(zip(date_list, production_data_list))
df = pd.DataFrame(combined_dict, index=[0]).transpose()
df.rename(columns={0: "list_a"}, inplace=True)
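A shorter route to a similar date-indexed result might look like the sketch below. `to_monthly_series` is a hypothetical helper, not part of pandas. It assumes the layout from the question (groups of 14 items: one label, 12 monthly values, one total) and that the requested years are consecutive and given in order.

```python
import pandas as pd

def to_monthly_series(flat, years=("2019", "2020")):
    """Hypothetical helper: reshape a flat table dump into a monthly Series.

    Assumes `flat` consists of consecutive groups of 14 items, each group
    being one label followed by 12 monthly values and a total.
    """
    n = 14
    # Cut the flat list into groups of n items.
    groups = [flat[i:i + n] for i in range(0, len(flat), n)]
    # Keep only the groups labelled with a requested year;
    # the slice [1:13] keeps the 12 months and skips the label and the total.
    by_year = {g[0]: g[1:13] for g in groups if g[0] in years}
    values = [v for year in years for v in by_year[year]]
    # One timestamp per month, starting in January of the first year.
    index = pd.date_range(start=f"{years[0]}-01-01", periods=len(values), freq="MS")
    # Blank cells become NaN; everything else becomes a float.
    series = pd.Series(values, index=index, name="value")
    return pd.to_numeric(series, errors="coerce")
```

Calling `to_monthly_series(list_a)` would return a float Series indexed by month start; `.to_frame()` turns it into a one-column DataFrame if that is preferred.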