How to take out these elements and put them together into a dataframe
[{'complete': True, 'volume': 116, 'time': '2020-01-17T19:15:00.000000000Z', 'mid': {'o': '1.10916', 'h': '1.10917', 'l': '1.10906', 'c': '1.10912'}}, {'complete': True, 'volume': 136, 'time': '2020-01-17T19:30:00.000000000Z', 'mid': {'o': '1.10914', 'h': '1.10922', 'l': '1.10908', 'c': '1.10919'}}, {'complete': True, 'volume': 223, 'time': '2020-01-17T19:45:00.000000000Z', 'mid': {'o': '1.10920', 'h': '1.10946', 'l': '1.10920', 'c': '1.10930'}}, {'complete': True, 'volume': 203, 'time': '2020-01-17T20:00:00.000000000Z', 'mid': {'o': '1.10930', 'h': '1.10931', 'l': '1.10919', 'c': '1.10928'}}, {'complete': True, 'volume': 87, 'time': '2020-01-17T20:15:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10934', 'l': '1.10922', 'c': '1.10926'}}, {'complete': True, 'volume': 102, 'time': '2020-01-17T20:30:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10928', 'l': '1.10913', 'c': '1.10920'}}, {'complete': True, 'volume': 277, 'time': '2020-01-17T20:45:00.000000000Z', 'mid': {'o': '1.10918', 'h': '1.10929', 'l': '1.10913', 'c': '1.10928'}}, {'complete': True, 'volume': 103, 'time': '2020-01-17T21:00:00.000000000Z', 'mid': {'o': '1.10927', 'h': '1.10929', 'l': '1.10920', 'c': '1.10924'}}, {'complete': True, 'volume': 54, 'time': '2020-01-17T21:15:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10926', 'l': '1.10910', 'c': '1.10912'}}, {'complete': False, 'volume': 15, 'time': '2020-01-17T21:30:00.000000000Z', 'mid': {'o': '1.10913', 'h': '1.10918', 'l': '1.10912', 'c': '1.10913'}}]
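I am trying to extract every 'time' and 'mid' value from this list. Each 'mid' is a dictionary with the keys 'o', 'h', 'l', and 'c'. Is there a way to combine 'time' and these dictionaries into a dataframe?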
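Try:
import pandas as pd
# your_data is the list of dictionaries from the question.
df = pd.DataFrame(your_data)
# Expand each 'mid' dict into its own columns and place them next to 'time'.
df = pd.concat([df['time'], df['mid'].apply(pd.Series)], axis=1)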
                             time        o        h        l        c
0  2020-01-17T19:15:00.000000000Z  1.10916  1.10917  1.10906  1.10912
1  2020-01-17T19:30:00.000000000Z  1.10914  1.10922  1.10908  1.10919
2  2020-01-17T19:45:00.000000000Z  1.10920  1.10946  1.10920  1.10930
3  2020-01-17T20:00:00.000000000Z  1.10930  1.10931  1.10919  1.10928
4  2020-01-17T20:15:00.000000000Z  1.10926  1.10934  1.10922  1.10926
5  2020-01-17T20:30:00.000000000Z  1.10926  1.10928  1.10913  1.10920
6  2020-01-17T20:45:00.000000000Z  1.10918  1.10929  1.10913  1.10928
7  2020-01-17T21:00:00.000000000Z  1.10927  1.10929  1.10920  1.10924
8  2020-01-17T21:15:00.000000000Z  1.10926  1.10926  1.10910  1.10912
9  2020-01-17T21:30:00.000000000Z  1.10913  1.10918  1.10912  1.10913
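A possible alternative to `apply(pd.Series)` is `pd.json_normalize`, which flattens the nested 'mid' dictionaries in one call. The following is only a minimal sketch, assuming the records have the shape shown in the question; `sample` is a hypothetical two-record excerpt and is not part of the original answer.

```python
import pandas as pd

# Hypothetical two-record excerpt of the list from the question.
sample = [
    {'complete': True, 'volume': 116, 'time': '2020-01-17T19:15:00.000000000Z',
     'mid': {'o': '1.10916', 'h': '1.10917', 'l': '1.10906', 'c': '1.10912'}},
    {'complete': True, 'volume': 136, 'time': '2020-01-17T19:30:00.000000000Z',
     'mid': {'o': '1.10914', 'h': '1.10922', 'l': '1.10908', 'c': '1.10919'}},
]

# json_normalize flattens nested dicts into columns named 'mid.o', 'mid.h', ...
flat = pd.json_normalize(sample)

# Keep 'time' plus the flattened 'mid' columns, then drop the 'mid.' prefix.
df = flat[['time', 'mid.o', 'mid.h', 'mid.l', 'mid.c']].rename(
    columns=lambda name: name.replace('mid.', '', 1)
)
print(df)
```

The column selection is explicit so that 'complete' and 'volume' are left out, matching the output above.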
Try the following:
import pandas as pd
l = [{'complete': True, 'volume': 116, 'time': '2020-01-17T19:15:00.000000000Z', 'mid': {'o': '1.10916', 'h': '1.10917', 'l': '1.10906', 'c': '1.10912'}}, {'complete': True, 'volume': 136, 'time': '2020-01-17T19:30:00.000000000Z', 'mid': {'o': '1.10914', 'h': '1.10922', 'l': '1.10908', 'c': '1.10919'}}, {'complete': True, 'volume': 223, 'time': '2020-01-17T19:45:00.000000000Z', 'mid': {'o': '1.10920', 'h': '1.10946', 'l': '1.10920', 'c': '1.10930'}}, {'complete': True, 'volume': 203, 'time': '2020-01-17T20:00:00.000000000Z', 'mid': {'o': '1.10930', 'h': '1.10931', 'l': '1.10919', 'c': '1.10928'}}, {'complete': True, 'volume': 87, 'time': '2020-01-17T20:15:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10934', 'l': '1.10922', 'c': '1.10926'}}, {'complete': True, 'volume': 102, 'time': '2020-01-17T20:30:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10928', 'l': '1.10913', 'c': '1.10920'}}, {'complete': True, 'volume': 277, 'time': '2020-01-17T20:45:00.000000000Z', 'mid': {'o': '1.10918', 'h': '1.10929', 'l': '1.10913', 'c': '1.10928'}}, {'complete': True, 'volume': 103, 'time': '2020-01-17T21:00:00.000000000Z', 'mid': {'o': '1.10927', 'h': '1.10929', 'l': '1.10920', 'c': '1.10924'}}, {'complete': True, 'volume': 54, 'time': '2020-01-17T21:15:00.000000000Z', 'mid': {'o': '1.10926', 'h': '1.10926', 'l': '1.10910', 'c': '1.10912'}}, {'complete': False, 'volume': 15, 'time': '2020-01-17T21:30:00.000000000Z', 'mid': {'o': '1.10913', 'h': '1.10918', 'l': '1.10912', 'c': '1.10913'}}]
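df = pd.DataFrame()
for ll in l:
    # Originally written with df.append, which was removed in pandas 2.0;
    # pd.concat performs the same row-wise append.
    df = pd.concat([df, pd.DataFrame(ll['mid'], index=[ll['time']])])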
Assuming your sample data is named data:
>>> pd.DataFrame([d['mid'] for d in data], index=[d['time'] for d in data])
                                      o        h        l        c
2020-01-17T19:15:00.000000000Z  1.10916  1.10917  1.10906  1.10912
2020-01-17T19:30:00.000000000Z  1.10914  1.10922  1.10908  1.10919
2020-01-17T19:45:00.000000000Z  1.10920  1.10946  1.10920  1.10930
2020-01-17T20:00:00.000000000Z  1.10930  1.10931  1.10919  1.10928
2020-01-17T20:15:00.000000000Z  1.10926  1.10934  1.10922  1.10926
2020-01-17T20:30:00.000000000Z  1.10926  1.10928  1.10913  1.10920
2020-01-17T20:45:00.000000000Z  1.10918  1.10929  1.10913  1.10928
2020-01-17T21:00:00.000000000Z  1.10927  1.10929  1.10920  1.10924
2020-01-17T21:15:00.000000000Z  1.10926  1.10926  1.10910  1.10912
2020-01-17T21:30:00.000000000Z  1.10913  1.10918  1.10912  1.10913
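Note that the prices and timestamps in this frame are still strings. If numeric columns and a datetime index are wanted, something like the following sketch may help; the two-record `data` list is a hypothetical excerpt of the question's data, and the conversion steps are an addition, not part of the original answer.

```python
import pandas as pd

# Hypothetical two-record excerpt of the list from the question.
data = [
    {'complete': True, 'volume': 116, 'time': '2020-01-17T19:15:00.000000000Z',
     'mid': {'o': '1.10916', 'h': '1.10917', 'l': '1.10906', 'c': '1.10912'}},
    {'complete': True, 'volume': 136, 'time': '2020-01-17T19:30:00.000000000Z',
     'mid': {'o': '1.10914', 'h': '1.10922', 'l': '1.10908', 'c': '1.10919'}},
]

# Same construction as in the answer: 'mid' dicts as rows, 'time' as index.
df = pd.DataFrame([d['mid'] for d in data], index=[d['time'] for d in data])

# The prices arrive as strings; cast them to floats for arithmetic.
df = df.astype(float)

# Parse the ISO 8601 timestamps; the trailing 'Z' yields a UTC-aware index.
df.index = pd.to_datetime(df.index)
print(df.dtypes)
```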
Timings
data *= 1000 # Now list of 10k dictionaries.
%timeit df = pd.DataFrame([d['mid'] for d in data], index=[d['time'] for d in data])
# 13.4 ms ± 361 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df = pd.DataFrame(data)
df = pd.concat([df['time'], df['mid'].apply(pd.Series)], axis=1)
# 4.52 s ± 494 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
df = pd.DataFrame()
for record in data:
    # Timed with DataFrame.append, which was removed in pandas 2.0.
    df = df.append(pd.DataFrame(record['mid'], index=[record['time']]))
# 21.4 s ± 2.86 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
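The `%timeit` and `%%timeit` commands above only work in IPython or Jupyter. Outside of those, a rough comparison can be sketched with the standard-library `timeit` module. This is an illustrative sketch under assumptions: `record`, `with_comprehension`, and `with_apply` are hypothetical names, the list holds 1,000 identical records rather than the 10k used above, and the numbers will differ from the ones reported here.

```python
import timeit

import pandas as pd

# Hypothetical single record in the shape of the question's data.
record = {'complete': True, 'volume': 116, 'time': '2020-01-17T19:15:00.000000000Z',
          'mid': {'o': '1.10916', 'h': '1.10917', 'l': '1.10906', 'c': '1.10912'}}
data = [record] * 1000  # 1,000 records, smaller than the 10k list timed above


def with_comprehension():
    # List-comprehension approach: 'mid' dicts as rows, 'time' as index.
    return pd.DataFrame([d['mid'] for d in data], index=[d['time'] for d in data])


def with_apply():
    # apply(pd.Series) approach: expand 'mid' and place it next to 'time'.
    df = pd.DataFrame(data)
    return pd.concat([df['time'], df['mid'].apply(pd.Series)], axis=1)


for func in (with_comprehension, with_apply):
    # Average over three calls; timings vary by machine and pandas version.
    seconds = timeit.timeit(func, number=3) / 3
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per call')
```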