
How to take out these elements and put them together into a DataFrame

[{'complete': True, 'volume': 116, 'time': '2020-01-17T19:15:00.000000000Z', 'mid': {'o': '1.10916', 'h': '1.10917', 'l': '1.10906', 'c': '1.10912'}}, {'complete': True, 'volume': 136, 'time': '2020-01-17T19:30:00.000000000Z', 'mid': {'o': '1.10914', 'h': '1.10922', 'l': '1.10908', 'c': '1.10919'}}, {'complete': True, 'volume': 223, 'time': '2020-01-17T19:45:00.000000000Z', 'mid': {'o': '1.10920', 'h': '1.10946', 'l': '1.10920', 'c': '1.10930'}}, {'complete': True, 'volume': 203, 'time': '2020-01-17T20:00:00.000000000Z', 'mid': {'o': '1.10930', 'h': '1.10931', 'l': '1.10919', 'c': '1.10928'}}, {'complete': True, 'volume': 87, 'time': '2020-01-17T20:15:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10934', 'l': '1.10922', 'c': '1.10926'}}, {'complete': True, 'volume': 102, 'time': '2020-01-17T20:30:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10928', 'l': '1.10913', 'c': '1.10920'}}, {'complete': True, 'volume': 277, 'time': '2020-01-17T20:45:00.000000000Z', 'mid': {'o': '1.10918', 'h': '1.10929', 'l': '1.10913', 'c': '1.10928'}}, {'complete': True, 'volume': 103, 'time': '2020-01-17T21:00:00.000000000Z', 'mid': {'o': '1.10927', 'h': '1.10929', 'l': '1.10920', 'c': '1.10924'}}, {'complete': True, 'volume': 54, 'time': '2020-01-17T21:15:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10926', 'l': '1.10910', 'c': '1.10912'}}, {'complete': False, 'volume': 15, 'time': '2020-01-17T21:30:00.000000000Z', 'mid': {'o': '1.10913', 'h': '1.10918', 'l': '1.10912', 'c': '1.10913'}}]

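I am trying to take all of the 'time' and 'mid' values out of this list. Each 'mid' is a dictionary with the keys 'o', 'h', 'l', and 'c'. Is there a way to put 'time' and these dictionaries together into one DataFrame?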


Try:

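import pandas as pd
df = pd.DataFrame(your_data)  # your_data is the list of dicts from the question
# Expand each 'mid' dict into o/h/l/c columns and place them next to 'time'.
df = pd.concat([df['time'], df['mid'].apply(pd.Series)], axis=1)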
                             time        o        h        l        c
0  2020-01-17T19:15:00.000000000Z  1.10916  1.10917  1.10906  1.10912
1  2020-01-17T19:30:00.000000000Z  1.10914  1.10922  1.10908  1.10919
2  2020-01-17T19:45:00.000000000Z  1.10920  1.10946  1.10920  1.10930
3  2020-01-17T20:00:00.000000000Z  1.10930  1.10931  1.10919  1.10928
4  2020-01-17T20:15:00.000000000Z  1.10926  1.10934  1.10922  1.10926
5  2020-01-17T20:30:00.000000000Z  1.10926  1.10928  1.10913  1.10920
6  2020-01-17T20:45:00.000000000Z  1.10918  1.10929  1.10913  1.10928
7  2020-01-17T21:00:00.000000000Z  1.10927  1.10929  1.10920  1.10924
8  2020-01-17T21:15:00.000000000Z  1.10926  1.10926  1.10910  1.10912
9  2020-01-17T21:30:00.000000000Z  1.10913  1.10918  1.10912  1.10913
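
As a possible alternative to `.apply(pd.Series)`, `pd.json_normalize` can flatten the nested 'mid' dictionaries in one call. This is only a sketch: it uses two records copied from the question, and the variable name `flat` and the step that strips the `mid.` prefix are illustrative choices, not part of the answer above.

```python
import pandas as pd

# Two records in the same shape as the sample data in the question.
your_data = [
    {'complete': True, 'volume': 116, 'time': '2020-01-17T19:15:00.000000000Z',
     'mid': {'o': '1.10916', 'h': '1.10917', 'l': '1.10906', 'c': '1.10912'}},
    {'complete': True, 'volume': 136, 'time': '2020-01-17T19:30:00.000000000Z',
     'mid': {'o': '1.10914', 'h': '1.10922', 'l': '1.10908', 'c': '1.10919'}},
]

# json_normalize flattens nested dicts into dotted column names: mid.o, mid.h, ...
flat = pd.json_normalize(your_data)

# Keep only the time and price columns, then drop the "mid." prefix.
flat = flat[['time', 'mid.o', 'mid.h', 'mid.l', 'mid.c']]
flat.columns = [name.replace('mid.', '') for name in flat.columns]
print(flat)
```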

Try the following:

import pandas as pd

l = [{'complete': True, 'volume': 116, 'time': '2020-01-17T19:15:00.000000000Z', 'mid': {'o': '1.10916', 'h': '1.10917', 'l': '1.10906', 'c': '1.10912'}}, {'complete': True, 'volume': 136, 'time': '2020-01-17T19:30:00.000000000Z', 'mid': {'o': '1.10914', 'h': '1.10922', 'l': '1.10908', 'c': '1.10919'}}, {'complete': True, 'volume': 223, 'time': '2020-01-17T19:45:00.000000000Z', 'mid': {'o': '1.10920', 'h': '1.10946', 'l': '1.10920', 'c': '1.10930'}}, {'complete': True, 'volume': 203, 'time': '2020-01-17T20:00:00.000000000Z', 'mid': {'o': '1.10930', 'h': '1.10931', 'l': '1.10919', 'c': '1.10928'}}, {'complete': True, 'volume': 87, 'time': '2020-01-17T20:15:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10934', 'l': '1.10922', 'c': '1.10926'}}, {'complete': True, 'volume': 102, 'time': '2020-01-17T20:30:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10928', 'l': '1.10913', 'c': '1.10920'}}, {'complete': True, 'volume': 277, 'time': '2020-01-17T20:45:00.000000000Z', 'mid': {'o': '1.10918', 'h': '1.10929', 'l': '1.10913', 'c': '1.10928'}}, {'complete': True, 'volume': 103, 'time': '2020-01-17T21:00:00.000000000Z', 'mid': {'o': '1.10927', 'h': '1.10929', 'l': '1.10920', 'c': '1.10924'}}, {'complete': True, 'volume': 54, 'time': '2020-01-17T21:15:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10926', 'l': '1.10910', 'c': '1.10912'}}, {'complete': False, 'volume': 15, 'time': '2020-01-17T21:30:00.000000000Z', 'mid': {'o': '1.10913', 'h': '1.10918', 'l': '1.10912', 'c': '1.10913'}}]

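# DataFrame.append was removed in pandas 2.0, so collect the one-row
# frames in a list and concatenate them once at the end.
frames = []
for ll in l:
    frames.append(pd.DataFrame(ll['mid'], index=[ll['time']]))

df = pd.concat(frames)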

Assuming your sample data is named data:

>>> pd.DataFrame([d['mid'] for d in data], index=[d['time'] for d in data])
                                      o        h        l        c
2020-01-17T19:15:00.000000000Z  1.10916  1.10917  1.10906  1.10912
2020-01-17T19:30:00.000000000Z  1.10914  1.10922  1.10908  1.10919
2020-01-17T19:45:00.000000000Z  1.10920  1.10946  1.10920  1.10930
2020-01-17T20:00:00.000000000Z  1.10930  1.10931  1.10919  1.10928
2020-01-17T20:15:00.000000000Z  1.10926  1.10934  1.10922  1.10926
2020-01-17T20:30:00.000000000Z  1.10926  1.10928  1.10913  1.10920
2020-01-17T20:45:00.000000000Z  1.10918  1.10929  1.10913  1.10928
2020-01-17T21:00:00.000000000Z  1.10927  1.10929  1.10920  1.10924
2020-01-17T21:15:00.000000000Z  1.10926  1.10926  1.10910  1.10912
2020-01-17T21:30:00.000000000Z  1.10913  1.10918  1.10912  1.10913
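
Note that in the sample data both the prices and the timestamps are strings, so the resulting columns and index hold strings too. If numeric prices and a datetime index are wanted, a follow-up conversion along these lines might help. This is a sketch on two records from the question; whether to convert at all, and the index name 'time', are assumptions about what the asker needs.

```python
import pandas as pd

# Two records in the same shape as the sample data in the question.
data = [
    {'complete': True, 'volume': 116, 'time': '2020-01-17T19:15:00.000000000Z',
     'mid': {'o': '1.10916', 'h': '1.10917', 'l': '1.10906', 'c': '1.10912'}},
    {'complete': True, 'volume': 136, 'time': '2020-01-17T19:30:00.000000000Z',
     'mid': {'o': '1.10914', 'h': '1.10922', 'l': '1.10908', 'c': '1.10919'}},
]

df = pd.DataFrame([d['mid'] for d in data], index=[d['time'] for d in data])

# Prices arrive as strings; convert every column to float.
df = df.astype(float)

# Parse the ISO 8601 timestamps into a timezone-aware (UTC) DatetimeIndex.
df.index = pd.to_datetime(df.index, utc=True)
df.index.name = 'time'
print(df.dtypes)
```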

Timings

data *= 1000  # Now list of 10k dictionaries.

%timeit df = pd.DataFrame([d['mid'] for d in data], index=[d['time'] for d in data])
# 13.4 ms ± 361 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
df = pd.DataFrame(data)
df = pd.concat([df['time'], df['mid'].apply(pd.Series)], axis=1)
# 4.52 s ± 494 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
# Note: DataFrame.append was removed in pandas 2.0; this cell needs pandas < 2.0.
df = pd.DataFrame()
for record in data:
     df = df.append(pd.DataFrame(record['mid'], index=[record['time']]))
# 21.4 s ± 2.86 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
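
The `%timeit` and `%%timeit` magics above only work in IPython or Jupyter. Outside of those, a comparison could be sketched with the standard-library `timeit` module as below. The function names, the list size of 1,000, and the three repetitions are arbitrary choices for illustration, and the loop variant uses `pd.concat` because `DataFrame.append` is unavailable in pandas 2.0 and later. Building many one-row frames tends to be slow because each small DataFrame carries its own construction overhead, and the old append-in-a-loop pattern additionally copied all accumulated rows on every iteration.

```python
import timeit

import pandas as pd

# One record from the question, repeated to make a larger list.
record = {'complete': True, 'volume': 116, 'time': '2020-01-17T19:15:00.000000000Z',
          'mid': {'o': '1.10916', 'h': '1.10917', 'l': '1.10906', 'c': '1.10912'}}
data = [record] * 1000  # arbitrary size chosen for this sketch


def build_direct():
    # Single constructor call on a list of dicts.
    return pd.DataFrame([d['mid'] for d in data], index=[d['time'] for d in data])


def build_concat():
    # One small frame per record, concatenated once at the end.
    frames = [pd.DataFrame(d['mid'], index=[d['time']]) for d in data]
    return pd.concat(frames)


for fn in (build_direct, build_concat):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per run")
```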
