Python: Adding data from a list of lists into its own row of a dataframe

The code I have below takes a JSON response and extracts some data that we need from it. The response is odd: sometimes it's a dict, sometimes it's a list. My code accounts for that. I really only need the list response. The problem is that sometimes the list response is a list of lists. In the csv image below, row 4 is an example of this.

My code goes into the third column (C), takes the first list of data, and adds it to a dataframe. There are six values, each gets its own column, and then the code moves to the next row. Row 2 is an example of this: it has one list in column C. The problem comes when I reach a row like row 4, which has multiple lists in column C. The image shows at least two, but it can be any number.

I need to take every dataset found in column C and give it its own row in the new dataframe. So the data from column C in row 2 would get one row of 6 columns in the dataframe, and the data from row 4 would get at least 3 rows, since it shows at least 3 datasets. Rows 5 and 6 would also return 3 rows each to the dataframe, and row 7 would return only 1.

[Image: dataset]

import json, time
from websocket import create_connection
import pandas as pd
   
# start with empty dataframe
df = pd.DataFrame()   
super_x = []
ws = create_connection("wss://ws.kraken.com/")

ws.send(json.dumps({
    "event": "subscribe",
    "pair": ["BTC/USD"],
    "subscription": {"name": "trade"}
}))

timeout = time.time() + 60*.20
while time.time() < timeout:
    js = json.loads(ws.recv())
    if isinstance(js, dict):
        df = pd.concat([df, pd.json_normalize(js)])
        #super_x.append( [super_x, pd.json_normalize(js)])
    elif isinstance(js, list):        
        df = pd.concat([df, pd.json_normalize({"event":"trade",
        #super_x.append([super_x, pd.json_normalize({"event":"trade",
                                               "trade":{                                                                                                     
                                                   "s0":js[1][0][0],
                                                   "s1":js[1][0][1], 
                                                   "s2":js[1][0][2],
                                                   "s3":js[1][0][3],                                                
                                                   "s4":js[1][0][4],
                                                   "s5":js[1][0][5],  
                                                   "pair":js[3]}
                                              })
                       ] ) 
                              
        try:
            fd = ([pd.json_normalize({"trade":{ "s":js[1] }}) ]) 
            if fd:
                print(len(fd))
        
        except:
            print("An exception occurred")


    
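    else:
        print(f"unknown socket data {js}")

    #print(js)
    #time.sleep(1)
# super_x is never filled above, so pd.concat(super_x) would raise ValueError;
# df already holds the concatenated rows.
#df = pd.concat(super_x, axis=0)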
#data filters
df = df[df['event'] != 'systemStatus'] 
df = df[df['event'] != 'subscriptionStatus']
df = df[df['event'] != 'heartbeat']     
#column drop for csv
cols = [0,2,3,4,5,6,7] 
df.drop(df.columns[cols],axis=1,inplace=True)
df.columns =['event','price','volume', 'time', 'side', 'orderType', 'misc', 'pair']
csv_file = "kracktwo-test.csv"
df.to_csv(csv_file, index=False, encoding='utf-8')  

In my code, the elif branch is where I get the data I need. If you look at the line "trade":{ "s0":js[1][0][0]}, there will always be data in js[1][0]. I need to look for data at js[1][1], js[1][2], ... and append it if there is any. I just don't quite understand how I would do that.

Here is an image of a csv that somewhat works correctly. The data I want has been put into rows under their own columns. However, this example shows only data from the first dataset of each row. If a row were really like row four from the first csv image, it would only return the first dataset. That's the problem.

[Image]

New image: [Image: rows that contain multiple lists]

Here is the raw response coming in. It shows everything that I'm currently filtering out.

{'connectionID': 16068280472185995247, 'event': 'systemStatus', 'status': 'online', 'version': '1.7.2'}
{'channelID': 321, 'channelName': 'trade', 'event': 'subscriptionStatus', 'pair': 'XBT/USD', 'status': 'subscribed', 'subscription': {'name': 'trade'}}
[321, [['46720.00000', '0.00110000', '1612842883.462662', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46720.00000', '1.00000000', '1612842885.072037', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['46720.00000', '0.01500000', '1612842885.083810', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46719.90000', '0.03710320', '1612842886.195731', 's', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46720.00000', '0.00100000', '1612842886.966132', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46719.90000', '0.00718180', '1612842887.736970', 's', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46720.00000', '0.20000000', '1612842889.436244', 'b', 'l', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46720.00000', '0.00849880', '1612842889.692922', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46720.00000', '0.05000000', '1612842890.358690', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46720.00000', '0.10702055', '1612842891.977570', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46720.00000', '0.07603446', '1612842892.601437', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['46719.90000', '0.01604475', '1612842893.217442', 's', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46720.00000', '1.16008431', '1612842894.457002', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['46720.00000', '0.01500000', '1612842894.478225', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46720.00000', '0.01000000', '1612842895.156688', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46720.00000', '0.00874369', '1612842897.466145', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46720.00000', '0.32680412', '1612842900.426143', 'b', 'l', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46720.00000', '0.02000000', '1612842900.731235', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46720.00000', '0.02000000', '1612842900.818573', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46720.00000', '0.01510000', '1612842900.904646', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46720.00000', '0.00668944', '1612842901.064427', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46720.00000', '1.57551815', '1612842901.223155', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46720.00000', '0.19335759', '1612842901.465767', 'b', 'l', ''], ['46720.00000', '0.10000000', '1612842901.467930', 'b', 'l', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46725.00000', '0.00200000', '1612842901.772735', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46728.20000', '0.30000000', '1612842901.830095', 'b', 'l', ''], ['46729.60000', '0.00500000', '1612842901.832807', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46730.00000', '0.70000000', '1612842902.123385', 'b', 'l', ''], ['46730.80000', '0.00107000', '1612842902.125857', 'b', 'l', ''], ['46740.00000', '2.00000000', '1612842902.128813', 'b', 'l', ''], ['46740.70000', '0.34406831', '1612842902.131029', 'b', 'l', ''], ['46742.50000', '0.00062959', '1612842902.133150', 'b', 'l', ''], ['46744.60000', '0.20000000', '1612842902.136065', 'b', 'l', ''], ['46750.00000', '0.01851050', '1612842902.138491', 'b', 'l', ''], ['46750.00000', '0.03423252', '1612842902.141181', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46729.90000', '0.00100000', '1612842902.149428', 's', 'm', '']], 'trade', 'XBT/USD']
[321, [['46750.00000', '0.14725698', '1612842902.153561', 'b', 'm', ''], ['46750.00000', '0.10274302', '1612842902.154768', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['46750.00000', '0.50000000', '1612842902.158276', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['46750.00000', '0.05000000', '1612842902.162690', 'b', 'm', ''], ['46750.00000', '0.10000000', '1612842902.166186', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46750.00000', '0.10695187', '1612842903.077553', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['46750.00000', '0.49430511', '1612842903.099799', 'b', 'l', ''], ['46756.00000', '0.00110002', '1612842903.102014', 'b', 'l', ''], ['46756.60000', '0.00079851', '1612842903.103715', 'b', 'l', ''], ['46763.70000', '0.00043351', '1612842903.105738', 'b', 'l', ''], ['46766.10000', '0.15000000', '1612842903.107645', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46774.20000', '0.04480930', '1612842903.128691', 'b', 'm', ''], ['46774.20000', '0.08000000', '1612842903.131908', 'b', 'm', ''], ['46774.20000', '0.15015145', '1612842903.134977', 'b', 'm', ''], ['46774.20000', '0.02503925', '1612842903.138306', 'b', 'm', ''], ['46787.80000', '0.10000000', '1612842903.139867', 'b', 'm', ''], ['46787.80000', '0.03445583', '1612842903.141510', 'b', 'm', ''], ['46787.90000', '0.01000000', '1612842903.143097', 'b', 'm', ''], ['46789.30000', '0.03050492', '1612842903.145436', 'b', 'm', ''], ['46789.30000', '0.02503925', '1612842903.149362', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['46766.30000', '0.01163594', '1612842903.171495', 's', 'm', ''], ['46766.30000', '0.00336406', '1612842903.177694', 's', 'm', '']], 'trade', 'XBT/USD']
[321, [['46766.30000', '0.00044563', '1612842903.183960', 's', 'm', ''], ['46766.30000', '0.00000116', '1612842903.187847', 's', 'm', '']], 'trade', 'XBT/USD']
[321, [['46789.30000', '0.04445583', '1612842903.192119', 'b', 'm', ''], ['46789.30000', '0.04554417', '1612842903.194391', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['46789.30000', '0.01000000', '1612842903.198400', 'b', 'm', ''], ['46789.30000', '0.02500000', '1612842903.201485', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['46766.70000', '0.00079858', '1612842903.223055', 's', 'l', '']], 'trade', 'XBT/USD']
[321, [['46766.90000', '0.00079233', '1612842903.258566', 's', 'l', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46771.00000', '0.00100000', '1612842904.708929', 's', 'm', '']], 'trade', 'XBT/USD']
[321, [['46771.10000', '0.01000000', '1612842904.753285', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46760.90000', '0.00042129', '1612842906.035380', 'b', 'l', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46748.80000', '0.00200000', '1612842907.720085', 's', 'm', '']], 'trade', 'XBT/USD']
[321, [['46746.60000', '0.11226863', '1612842908.307851', 's', 'm', ''], ['46745.30000', '0.42381782', '1612842908.310121', 's', 'm', ''], ['46745.30000', '0.00101472', '1612842908.313349', 's', 'm', ''], ['46745.30000', '0.00000243', '1612842908.315922', 's', 'm', ''], ['46745.30000', '0.00000001', '1612842908.318515', 's', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46745.40000', '0.05952963', '1612842911.490887', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46745.40000', '0.14047037', '1612842915.495179', 'b', 'm', ''], ['46745.40000', '0.02460000', '1612842915.497478', 'b', 'm', ''], ['46745.50000', '0.02460000', '1612842915.499842', 'b', 'm', ''], ['46747.50000', '0.14000000', '1612842915.501424', 'b', 'm', ''], ['46762.60000', '0.08000000', '1612842915.503684', 'b', 'm', ''], ['46764.40000', '0.09032963', '1612842915.505680', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46745.30000', '0.00350000', '1612842916.786464', 's', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46745.40000', '0.00060000', '1612842918.255090', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46745.40000', '0.12000000', '1612842921.591952', 'b', 'm', ''], ['46745.40000', '0.02460000', '1612842921.594207', 'b', 'm', ''], ['46745.40000', '0.04320000', '1612842921.595638', 'b', 'm', ''], ['46745.40000', '0.08000000', '1612842921.596980', 'b', 'm', ''], ['46745.40000', '0.13436069', '1612842921.598368', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46745.40000', '0.00695149', '1612842923.109038', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46745.40000', '0.04706335', '1612842924.637917', 'b', 'l', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46745.30000', '0.04019430', '1612842926.694212', 's', 'l', ''], ['46744.30000', '0.00164674', '1612842926.696474', 's', 'l', '']], 'trade', 'XBT/USD']
[321, [['46723.00000', '0.04457400', '1612842927.166026', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['46723.00000', '0.06542600', '1612842927.503626', 'b', 'm', ''], ['46723.80000', '0.17967021', '1612842927.506574', 'b', 'm', ''], ['46723.90000', '0.17123573', '1612842927.508561', 'b', 'm', ''], ['46724.30000', '0.16162447', '1612842927.510756', 'b', 'm', ''], ['46726.60000', '0.32000000', '1612842927.512988', 'b', 'm', ''], ['46727.20000', '0.20000000', '1612842927.515160', 'b', 'm', ''], ['46729.30000', '0.06510000', '1612842927.517510', 'b', 'm', ''], ['46729.40000', '0.06510000', '1612842927.519568', 'b', 'm', ''], ['46730.00000', '0.08000000', '1612842927.521465', 'b', 'm', ''], ['46738.60000', '0.14184359', '1612842927.523459', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46724.40000', '0.10000000', '1612842928.216970', 'b', 'l', ''], ['46726.50000', '0.06510000', '1612842928.219388', 'b', 'l', ''], ['46729.20000', '0.10000000', '1612842928.222759', 'b', 'l', ''], ['46738.00000', '0.23490000', '1612842928.224467', 'b', 'l', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46721.70000', '0.50363900', '1612842932.573493', 's', 'm', ''], ['46721.70000', '0.42600000', '1612842932.575444', 's', 'm', ''], ['46721.70000', '0.29516000', '1612842932.577342', 's', 'm', ''], ['46721.70000', '0.11232674', '1612842932.578843', 's', 'm', ''], ['46719.90000', '0.03000000', '1612842932.580577', 's', 'm', ''], ['46719.90000', '0.01000000', '1612842932.582029', 's', 'm', ''], ['46719.90000', '0.42500000', '1612842932.584071', 's', 'm', ''], ['46713.10000', '0.10000000', '1612842932.585913', 's', 'm', ''], ['46713.10000', '0.09787426', '1612842932.587333', 's', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['46718.00000', '0.00562852', '1612842933.770955', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['46717.90000', '0.06849619', '1612842941.402657', 's', 'l', '']], 'trade', 'XBT/USD']
[321, [['46717.90000', '0.06849619', '1612842941.426547', 's', 'l', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}

Step 1. Get the data list.

import json, time
from websocket import create_connection
import pandas as pd
   
super_x = []
ws = create_connection("wss://ws.kraken.com/")
ws.send(json.dumps({
    "event": "subscribe",
    "pair": ["BTC/USD"],
    "subscription": {"name": "trade"}
}))
timeout = time.time() + 60*.20

# only keep list type
while time.time() < timeout:
    js = json.loads(ws.recv())
    if isinstance(js, list):
        print(js)
        super_x.append(js)
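
The same filter can be tried without a live socket. The following is a minimal sketch, assuming a few captured frames stored as JSON strings; `raw_frames` and `keep_trade_lists` are hypothetical names introduced only for this illustration, and the frames are shaped like the raw response shown in the question.

```python
import json

# Hypothetical captured frames, shaped like the raw response in the question.
raw_frames = [
    '{"event": "heartbeat"}',
    '[321, [["46720.00000", "0.00110000", "1612842883.462662", "b", "m", ""]], "trade", "XBT/USD"]',
    '{"event": "systemStatus", "status": "online"}',
]


def keep_trade_lists(frames):
    """Decode each JSON frame and keep only the list-type (trade) messages."""
    kept = []
    for frame in frames:
        msg = json.loads(frame)
        if isinstance(msg, list):
            kept.append(msg)
    return kept


super_x = keep_trade_lists(raw_frames)
print(len(super_x))
```

Dict-type frames (heartbeat, systemStatus, subscriptionStatus) are dropped here, so no event filtering should be needed afterwards.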

Step 2. Handle the data.

# parse the data
df = pd.DataFrame(super_x, columns=['channelID', 'trade', 'event', 'pair']).explode('trade')
df[['price', 'volume', 'time', 'side', 'orderType', 'misc']] = pd.DataFrame(df['trade'].tolist()).values
cols = ['event', 'price', 'volume', 'time', 'side', 'orderType', 'misc', 'pair']
dfn = df[cols].copy()

print(dfn)

       event    price      volume               time side orderType misc     pair
    0  trade  46737.2  0.03499059  1612848385.323798    s         m       XBT/USD
    0  trade  46737.2  0.01500941  1612848385.328784    s         m       XBT/USD
    1  trade  46736.8  0.06296629  1612848388.057267    s         m       XBT/USD
    1  trade  46736.8  0.01000000  1612848388.060013    s         m       XBT/USD
    1  trade  46736.8  0.00003371  1612848388.061986    s         m       XBT/USD
    1  trade  46731.3  0.02404310  1612848388.063164    s         m       XBT/USD
    2  trade  46732.6  0.03170000  1612848390.196840    s         l       XBT/USD
    3  trade  46734.7  0.10000000  1612848392.086250    s         m       XBT/USD
    4  trade  46735.9  0.00425878  1612848394.057669    s         m       XBT/USD
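
To see what `explode` does here without connecting to the socket, the sketch below runs the same idea on two messages copied from the raw response in the question. It is a slight variant of the code above, not the answer's exact method: it resets the index after exploding and joins the pieces with `pd.concat`, which is an assumption made to keep the row labels unique.

```python
import pandas as pd

# Two messages copied from the raw response in the question:
# the first carries one trade, the second carries two.
super_x = [
    [321, [['46720.00000', '0.00110000', '1612842883.462662', 'b', 'm', '']],
     'trade', 'XBT/USD'],
    [321, [['46720.00000', '0.19335759', '1612842901.465767', 'b', 'l', ''],
           ['46720.00000', '0.10000000', '1612842901.467930', 'b', 'l', '']],
     'trade', 'XBT/USD'],
]

fields = ['price', 'volume', 'time', 'side', 'orderType', 'misc']

# One row per message; explode() then repeats that row once per inner trade list.
df = pd.DataFrame(super_x, columns=['channelID', 'trade', 'event', 'pair'])
df = df.explode('trade').reset_index(drop=True)

# Spread the six values of each trade into their own columns.
trades = pd.DataFrame(df['trade'].tolist(), columns=fields)
dfn = pd.concat([df[['event']], trades, df[['pair']]], axis=1)

print(dfn)
```

The second message contributes two rows, so the result has three rows in total, one per trade.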

You can iterate over the elements of the JSON list in a dictionary comprehension.

trade = {f"s{i}": val for i, val in enumerate(js[1][0])}
trade["pair"] = js[3]
df = pd.concat([df, pd.json_normalize({"event": "trade", "trade": trade})])
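
The comprehension above reads only `js[1][0]`. A possible extension, sketched below, applies the same comprehension to every inner list in `js[1]` so that each trade becomes its own row. `trade_rows` is a hypothetical helper name, and the keys are kept flat (`s0` to `s5`) rather than nested under `trade`, which is an assumption of this sketch.

```python
def trade_rows(js):
    """Return one flat dict per inner trade list found in js[1]."""
    rows = []
    for trade in js[1]:
        row = {"event": "trade"}
        # Same comprehension as above, applied to each trade in turn.
        row.update({f"s{i}": val for i, val in enumerate(trade)})
        row["pair"] = js[3]
        rows.append(row)
    return rows


# Message copied from the raw response in the question (two trades).
js = [321, [['46728.20000', '0.30000000', '1612842901.830095', 'b', 'l', ''],
            ['46729.60000', '0.00500000', '1612842901.832807', 'b', 'l', '']],
      'trade', 'XBT/USD']

rows = trade_rows(js)
for row in rows:
    print(row)
```

The dicts from all messages could be collected in one list and passed to `pd.DataFrame` once after the loop, instead of calling `pd.concat` for every message.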
