Create nested JSON lines from a pipe-delimited flat file using Python

I have a pipe-delimited text file like the one below. In this file, the same combination of ID, CODE, and NUM can have different INC and INC_DESC values.

ID|CODE|NUM|INC|INC_DESC
"F1"|"W1"|1|1001|"INC1001"
"F1"|"W1"|1|1002|"INC1002"
"F1"|"W1"|1|1003|"INC1003"
"F2"|"W1"|1|1002|"INC1003"
"F2"|"W1"|1|1003|"INC1004"
"F2"|"W2"|1|1003|"INC1003"

We want to create JSON like the following, where the different INC and INC_DESC values appear as an array under each distinct combination of ID, CODE, and NUM.

{"ID":"F1","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1001, "INC_DESC":"INC1001"},{"INC":1002, "INC_DESC":"INC1002"},{"INC":1003, "INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1002, "INC_DESC":"INC1003"},{"INC":1003, "INC_DESC":"INC1004"}]}
{"ID":"F2","CODE":"W2","NUM":1,"INC_DTL":[{"INC":1003, "INC_DESC":"INC1003"}]}

I tried the code below, but it does not produce the nesting I want.

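import pandas as pd

# raw strings keep the backslashes in the Windows paths literal
Input_File = r'V:\input.dat'
df = pd.read_csv(Input_File, sep='|')

json_output = r'V:\outfile.json'
# writes one flat record per input row, with no nesting
df.to_json(json_output, orient='records')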

ANSWER:

import pandas as pd

# raw strings keep the backslashes in the Windows paths literal
Input_File = r'V:\input.dat'
df = pd.read_csv(Input_File, sep='|')

# group by the key columns and collect each group's INC and INC_DESC values into lists
df = df.groupby(['ID', 'CODE', 'NUM']).agg(list).reset_index()
# build the nested column: one {'INC', 'INC_DESC'} dict per original row
df['INC_DTL'] = df.apply(
    lambda x: [{'INC': inc, 'INC_DESC': dsc} for inc, dsc in zip(x['INC'], x['INC_DESC'])], axis=1)
# drop the flat columns now that their values live in INC_DTL
df = df.drop(['INC', 'INC_DESC'], axis=1)

json_output = r'V:\outfile.json'
# lines=True writes one JSON object per line (to_json returns None when given a path)
df.to_json(json_output, orient='records', lines=True)

OUTPUT:

{"ID":"F1","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1001,"INC_DESC":"INC1001"},{"INC":1002,"INC_DESC":"INC1002"},{"INC":1003,"INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1002,"INC_DESC":"INC1003"},{"INC":1003,"INC_DESC":"INC1004"}]}
{"ID":"F2","CODE":"W2","NUM":1,"INC_DTL":[{"INC":1003,"INC_DESC":"INC1003"}]}
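
If pandas is not available, the same grouping can probably be done with only the standard library. The following is a minimal sketch, not part of the original answer: the function name nest_rows and the in-memory sample string are illustrative assumptions, and it assumes NUM and INC always hold integers, as they do in the sample data.

```python
import csv
import io
import json


def nest_rows(lines, key_cols=("ID", "CODE", "NUM"), detail_cols=("INC", "INC_DESC")):
    """Yield one JSON line per distinct key_cols combination.

    `lines` is any iterable of text lines (an open file or io.StringIO).
    """
    # csv strips the double quotes around the string fields for us
    reader = csv.DictReader(lines, delimiter="|")
    groups = {}  # dicts preserve insertion order, so groups come out in first-seen order
    for row in reader:
        # assumption: NUM and INC are integers, as in the sample file
        row["NUM"] = int(row["NUM"])
        row["INC"] = int(row["INC"])
        key = tuple(row[c] for c in key_cols)
        groups.setdefault(key, []).append({c: row[c] for c in detail_cols})
    for key, details in groups.items():
        record = dict(zip(key_cols, key))
        record["INC_DTL"] = details
        # compact separators give output without spaces after ',' and ':'
        yield json.dumps(record, separators=(",", ":"))


# the sample data from the question, held in memory instead of V:\input.dat
sample = '''ID|CODE|NUM|INC|INC_DESC
"F1"|"W1"|1|1001|"INC1001"
"F1"|"W1"|1|1002|"INC1002"
"F1"|"W1"|1|1003|"INC1003"
"F2"|"W1"|1|1002|"INC1003"
"F2"|"W1"|1|1003|"INC1004"
"F2"|"W2"|1|1003|"INC1003"
'''

for json_line in nest_rows(io.StringIO(sample)):
    print(json_line)
```

To read from a real file, pass an open file object instead of io.StringIO, for example open(path, newline=''), and write each yielded line to the output file followed by a newline. Unlike groupby, which sorts the keys, this sketch keeps groups in the order they first appear in the input.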
