
NaN values when resampling pandas dataframe

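I have a pandas DataFrame with two columns:

  • a datetime index column;
  • a column containing dictionaries.

If I run a custom resampler that returns a new dict as its result, I get a NaN value in the resampled DataFrame.

Is it possible to do a resampling that does not return a number?

Thanks, FB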

EDIT1: Here is a sample of the data:

2017-10-15 06:55:14.237039000,"{'SMA120C': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA115_L': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA121_CT': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA110_4L': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA111': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}}"
2017-10-15 06:55:18.584042000,"{'SMA120C': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA115_L': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA121_CT': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA110_4L': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA111': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}}"
2017-10-15 06:55:22.881817000,"{'SMA120C': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA115_L': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA121_CT': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA110_4L': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA111': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}}"
2017-10-15 06:55:27.234606000,"{'SMA120C': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA115_L': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA121_CT': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA110_4L': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA111': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}}"
2017-10-15 06:55:31.593890000,
2017-10-15 06:55:35.978696000,"{'SMA120C': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA115_L': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA121_CT': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA110_4L': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA111': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}}"
2017-10-15 06:55:40.296786000,"{'SMA120C': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA115_L': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA121_CT': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA110_4L': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA111': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}}"
2017-10-15 06:55:44.655286000,"{'SMA120C': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA115_L': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA121_CT': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA110_4L': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA111': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}}"
2017-10-15 06:55:48.957150000,"{'SMA120C': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA115_L': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA121_CT': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA110_4L': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA111': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}}"
2017-10-15 06:55:53.299944000,

I just filter out the rows whose second column does not contain a string-based dict.

EDIT2:

The resampler function:

def custom_resampler(array_like):
    ref_el = {}
    data = {}
    for element in filter(lambda item: item is not None, array_like):
        for machine in element.keys():
            if not ref_el.get(machine, None):
                ref_el[machine] = element[machine].get('totalProduction', 0) if isinstance(element[machine], dict) else 0
                data[machine] = {
                    '0': [],
                    '1': [],
                    '2': [],
                    '3': [],
                    '4': [],
                    '5': [],
                    '6': [],
                    '7': [],
                    '8': [],
                    '9': [],
                    '10': []
                }
            else:
                status = str(element[machine]['status'])
                total_prod_diff = element[machine].get('totalProduction', 0) - ref_el[machine]
                data[machine][status].append(
                    total_prod_diff
                )
                ref_el[machine] = element[machine].get('totalProduction', 0)

You first need to convert the column of strings to dictionaries:

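import ast

# Parse each string into a dict; missing values become empty dicts.
# 'fetched_data' is the dict column name used in the resampler below.
df['fetched_data'] = df['fetched_data'].fillna('{}').apply(ast.literal_eval)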
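
As a small illustration of why `ast.literal_eval` fits here rather than `json.loads`: the sample values use single quotes, which is valid Python literal syntax but not valid JSON. The string `raw` below is a hypothetical, shortened version of one row, not the real data.

```python
import ast
import json

# Hypothetical, shortened version of one cell from the sample data.
raw = "{'SMA111': {'status': 9, 'totalProduction': 1488}}"

# json.loads rejects the string because JSON requires double-quoted keys.
try:
    json.loads(raw)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# ast.literal_eval safely evaluates Python literals (dicts, lists, numbers, strings).
parsed = ast.literal_eval(raw)

# The '{}' placeholder used for missing rows parses to an empty dict.
empty = ast.literal_eval("{}")

print(json_ok, parsed["SMA111"]["totalProduction"], empty)
```

Because missing rows become empty dicts, the loops in the resampler simply skip them instead of failing on `None` or NaN.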

Then add a return to the end of the function so that it outputs the aggregated dictionary:

def custom_resampler(array_like):
    ref_el = {}  # last seen totalProduction per machine
    data = {}    # per machine: production differences grouped by status
    for element in filter(lambda item: item is not None, array_like['fetched_data']):
        for machine in element.keys():
            if not ref_el.get(machine, None):
                ref_el[machine] = element[machine].get('totalProduction', 0) if isinstance(element[machine], dict) else 0
                data[machine] = {
                    '0': [],
                    '1': [],
                    '2': [],
                    '3': [],
                    '4': [],
                    '5': [],
                    '6': [],
                    '7': [],
                    '8': [],
                    '9': [],
                    '10': []
                }
            else:
                status = str(element[machine]['status'])
                total_prod_diff = element[machine].get('totalProduction', 0) - ref_el[machine]
                data[machine][status].append(
                    total_prod_diff
                )
                ref_el[machine] = element[machine].get('totalProduction', 0)
    # return output dict
    return [ref_el]

# 'T' is the one-minute frequency alias; newer pandas versions prefer 'min'
df1 = df.resample('T').apply(custom_resampler)
print(df1)
                                                          fetched_data
2017-10-15 06:55:00  {'SMA111': 2809, 'SMA121_CT': 2809, 'SMA110_4L...
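
If the resampled output still shows NaN, or pandas tries to reshape the returned dicts, one alternative is to iterate over the resampler yourself and collect one dict per time bin. This is only a sketch and not part of the answer above: the miniature frame and the helper `last_total_production` are hypothetical, and `'1min'` is used as the frequency because newer pandas versions deprecate the `'T'` alias.

```python
import pandas as pd

# Hypothetical miniature of the question's data: already-parsed dicts, one empty row.
df = pd.DataFrame(
    {
        "fetched_data": [
            {"SMA111": {"status": 9, "totalProduction": 1488}},
            {"SMA111": {"status": 7, "totalProduction": 6661}},
            {},
            {"SMA111": {"status": 3, "totalProduction": 8210}},
        ]
    },
    index=pd.to_datetime(
        [
            "2017-10-15 06:55:14",
            "2017-10-15 06:55:18",
            "2017-10-15 06:55:31",
            "2017-10-15 06:56:05",
        ]
    ),
)


def last_total_production(elements):
    """Return {machine: last seen totalProduction} for one time bin."""
    latest = {}
    for element in elements:
        for machine, values in element.items():
            latest[machine] = values.get("totalProduction", 0)
    return latest


# Iterating over the resampler yields (bin_start, sub-frame) pairs, so each
# aggregated dict is built in plain Python and stored as-is.
rows = {
    bin_start: last_total_production(group["fetched_data"])
    for bin_start, group in df.resample("1min")
}
result = pd.Series(rows, name="fetched_data")
print(result)
```

The trade-off is that this loop runs in Python for every bin, which is usually fine for object columns like this one since pandas cannot vectorise dict aggregation anyway.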

