How do I format my pandas dataframe through python to get columns in the csv output?
How do I get my output which is json converted in a dataframe format?
I am new to Python and am trying to format the output I get from an API.
The output data is:
**data**
Out[8]: b'[{"date":"2020-01-19","stats":[{"metrics":{"blocks":5,"bounce_drops":6,"bounces":16,"clicks":278,"deferred":8,"delivered":1453,"invalid_emails":6,"opens":2502,"processed":155,"requests":1484,"spam_report_drops":0,"spam_reports":0,"unique_clicks":199,"unique_opens":1013,"unsubscribe_drops":0,"unsubscribes":0}}]}]\n'
I want to turn it into tabular form so that I can export it to CSV.
I tried:
import pandas as pd
merge_HOO = {'blocks': [], 'bounce_drops': [], 'bounces': [], 'clicks': []}
for i, restaurant in enumerate(data):
    for item in restaurant['metrics']:
        merge_HOO['blocks'].append(i)
        merge_HOO['bounce_drops'].append(item['bounce_drops'])
        merge_HOO['bounces'].append(item['bounces'])
        merge_HOO['clicks'].append(item['clicks'])
merge_HOO = pd.DataFrame(merge_HOO,
                         columns=['blocks', 'bounce_drops', 'bounces', 'clicks'])
print(merge_HOO)
Traceback (most recent call last):
  File "<ipython-input-9-ad0eecd65eba>", line 4, in <module>
    for item in restaurant['metrics']:
TypeError: 'int' object is not subscriptable
But I get the error above.
I would like the CSV to look like the following, with each header and the corresponding stats beneath it:
blocks bounce_drops bounces
5 6 16
Here is one approach:
import pandas as pd

# let's say d is your list containing the dict
# (pd.json_normalize is the top-level name since pandas 1.0)
f = pd.json_normalize(d)
# reshape the data
cols = ['blocks', 'bounce_drops', 'bounces']
df = f['stats'].apply(lambda x: pd.Series(x[0]))['metrics'].apply(pd.Series)[cols]
blocks bounce_drops bounces
0 5 6 16
Sample data:
d = [{"date":"2020-01-19","stats":[{"metrics":{"blocks":5,"bounce_drops":6,"bounces":16,"clicks":278,"deferred":8,"delivered":1453,"invalid_emails":6,"opens":2502,"processed":155,"requests":1484,"spam_report_drops":0,"spam_reports":0,"unique_clicks":199,"unique_opens":1013,"unsubscribe_drops":0,"unsubscribes":0}}]}]
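As a variation on the reshaping step above, the nested `metrics` dictionaries can also be collected with a plain list comprehension and handed to `pd.DataFrame` directly. This is only a minimal sketch, assuming `d` has the same shape as the sample data (shortened here); `rows` and `csv_text` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Shortened copy of the sample data; the real payload has more metrics.
d = [{"date": "2020-01-19",
      "stats": [{"metrics": {"blocks": 5, "bounce_drops": 6,
                             "bounces": 16, "clicks": 278}}]}]

# One row per entry in each day's "stats" list.
rows = [stat["metrics"] for day in d for stat in day["stats"]]

cols = ["blocks", "bounce_drops", "bounces"]
df = pd.DataFrame(rows)[cols]

# to_csv returns the CSV text when no path is given.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file path to `to_csv` instead would write the same text to disk.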
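You missed one list: in the line that raises the error, `item` should come from the `stats` list, because `metrics` is nested inside it.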
  File "<ipython-input-9-ad0eecd65eba>", line 4, in <module>
    for item in restaurant['metrics']:
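One more thing worth checking: the `data` shown in the question is a `bytes` object (note the `b'...'` prefix), and iterating over `bytes` in Python 3 yields integers, which on its own would produce `'int' object is not subscriptable`. Below is a hedged sketch of the corrected loop, assuming the payload is parsed with `json.loads` first. The payload is shortened, and `parsed` and `table` are illustrative names.

```python
import json

# Shortened stand-in for the raw API response from the question.
data = (b'[{"date":"2020-01-19","stats":[{"metrics":'
        b'{"blocks":5,"bounce_drops":6,"bounces":16,"clicks":278}}]}]\n')

# Parse the bytes into Python lists and dicts before looping.
parsed = json.loads(data)

table = {"blocks": [], "bounce_drops": [], "bounces": [], "clicks": []}
for day in parsed:
    # "metrics" is nested inside each element of the "stats" list.
    for stat in day["stats"]:
        metrics = stat["metrics"]
        for key in table:
            table[key].append(metrics[key])

print(table)
# {'blocks': [5], 'bounce_drops': [6], 'bounces': [16], 'clicks': [278]}
```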
You don't need pandas just to output CSV; simply use the `csv` module.
Assuming the JSON is a list of dictionaries:
import csv, io
j = [{
"date": "2020-01-19",
"stats": [
{ "metrics":{
"blocks": 5,
"bounce_drops": 6,
"bounces": 16,
"clicks": 278,
"deferred": 8,
"delivered": 1453,
"invalid_emails": 6,
"opens": 2502,
"processed": 155,
"requests": 1484,
"spam_report_drops": 0,
"spam_reports": 0,
"unique_clicks": 199,
"unique_opens": 1013,
"unsubscribe_drops": 0,
"unsubscribes": 0
}
}
]
},
{
"date": "2020-01-18",
"stats": [
{ "metrics":{
"blocks": 5,
"bounce_drops": 6,
"bounces": 16,
"clicks": 278,
"deferred": 8,
"delivered": 1453,
"invalid_emails": 6,
"opens": 2502,
"processed": 155,
"requests": 1484,
"spam_report_drops": 0,
"spam_reports": 0,
"unique_clicks": 199,
"unique_opens": 1013,
"unsubscribe_drops": 0,
"unsubscribes": 0
}
}
]
},
{
"date": "2020-01-17",
"stats": [
{ "metrics":{
"blocks": 5,
"bounce_drops": 6,
"bounces": 16,
"clicks": 278,
"deferred": 8,
"delivered": 1453,
"invalid_emails": 6,
"opens": 2502,
"processed": 155,
"requests": 1484,
"spam_report_drops": 0,
"spam_reports": 0,
"unique_clicks": 199,
"unique_opens": 1013,
"unsubscribe_drops": 0,
"unsubscribes": 0
}
}
]
}]
merged = {'blocks': 0, 'bounce_drops': 0, 'bounces': 0, 'clicks': 0}
for i, d in enumerate(j):
    for lst in d['stats']:
        metrics = lst['metrics']
        merged['blocks'] += metrics['blocks']
        merged['bounce_drops'] += metrics['bounce_drops']
        merged['bounces'] += metrics['bounces']
        merged['clicks'] += metrics['clicks']
print(merged)
# {'blocks': 15, 'bounce_drops': 18, 'bounces': 48, 'clicks': 834}
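If every metric should be totalled rather than just four, the same loop can be written without naming each key. Here is a small sketch using `collections.Counter`, with a shortened two-day payload as assumed input (`sample` and `totals` are illustrative names):

```python
from collections import Counter

# Shortened two-day payload with the same nesting as the JSON above.
sample = [
    {"date": "2020-01-19",
     "stats": [{"metrics": {"blocks": 5, "bounces": 16, "clicks": 278}}]},
    {"date": "2020-01-18",
     "stats": [{"metrics": {"blocks": 5, "bounces": 16, "clicks": 278}}]},
]

totals = Counter()
for day in sample:
    for stat in day["stats"]:
        # Counter.update adds the values key by key.
        totals.update(stat["metrics"])

print(dict(totals))
# {'blocks': 10, 'bounces': 32, 'clicks': 556}
```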
Test with io.StringIO before writing to a file:
result = io.StringIO(initial_value='', newline='\n')
fieldnames = list(merged.keys())
writer = csv.DictWriter(result, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(merged)
print(result.getvalue())
# blocks,bounce_drops,bounces,clicks
# 15,18,48,834
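The example above writes a single summed row. If one row per date is wanted instead, as the desired output in the question suggests, something along these lines may work. It is a sketch under assumptions: the payload is shortened, `metrics_by_day.csv` is a made-up file name, and the file is opened with `newline=''` as the `csv` module documentation recommends.

```python
import csv
import os
import tempfile

# Shortened payload with the same nesting as the JSON above.
days = [
    {"date": "2020-01-19",
     "stats": [{"metrics": {"blocks": 5, "bounce_drops": 6,
                            "bounces": 16, "clicks": 278}}]},
    {"date": "2020-01-18",
     "stats": [{"metrics": {"blocks": 5, "bounce_drops": 6,
                            "bounces": 16, "clicks": 278}}]},
]

fieldnames = ["date", "blocks", "bounce_drops", "bounces", "clicks"]
# Hypothetical output location in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "metrics_by_day.csv")

with open(path, "w", newline="") as fh:
    # extrasaction="ignore" skips metrics not listed in fieldnames.
    writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for day in days:
        for stat in day["stats"]:
            writer.writerow({"date": day["date"], **stat["metrics"]})

# Read the file back to check what was written.
with open(path, newline="") as fh:
    rows = list(csv.reader(fh))
print(rows)
```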
If you insist on using pandas:
import pandas as pd
df = pd.DataFrame(merged, index=[0])
# named csv_text so the csv module imported earlier is not shadowed
csv_text = df.to_csv(index=False)
print(csv_text)
# blocks,bounce_drops,bounces,clicks
# 15,18,48,834