Pandas DataFrame to BigQuery - data column missing

Question

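I'm using Python to run some RFM analysis on my data and have adapted Joao Correia's GitHub code (linked below). The original code outputs the results to a CSV; my additions also load the results into a DataFrame and then post it to a BigQuery table.

It works, but the first column of my data, the "customer" ID, is missing in BigQuery. It is the only string column in my results. The column is in the .csv and in the DataFrame in Python, but not in the BQ results. Can anyone tell me where I'm losing it?

Note: I removed most of the RFM code to keep this post uncluttered; the lines below show my additions.

Update: I ran print(results.keys()) and "customer" does not appear in the list. Is that related to it not showing up in the export?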

Index(['recency', 'frequency', 'monetary_value', 'R_Quartile', 'F_Quartile',
       'M_Quartile', 'RFMClass'],
      dtype='object')

https://github.com/joaolcorreia/RFM-analysis

import sys, getopt
import pandas as pd
from datetime import datetime
from google.cloud import bigquery

.....

   rfmSegmentation['RFMClass'] = rfmSegmentation.R_Quartile.map(str) + rfmSegmentation.F_Quartile.map(str) + rfmSegmentation.M_Quartile.map(str)

   # Output the results as a CSV
   rfmSegmentation.to_csv(outputfile, sep=',')

   # Once the CSV is generated we also drop the results into a DataFrame and output to BigQuery.
   results = pd.DataFrame(rfmSegmentation)
   print(results.head())
   destination_table = 'xxx.RFM'
   project_id = 'xxx'
   results.to_gbq(destination_table, project_id, chunksize=10000, verbose=True, reauth=False, if_exists='replace', private_key='xxx.json')

   print(" ")
   print(" DONE! Check %s" % (outputfile))
   print(" ")

The CSV output of my script includes the "customer" column, while the BigQuery table does not.

Answer 1

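After reading more about DataFrames I was able to solve my problem: it turns out that my "customer" column was the index of the DataFrame. I used reset_index to replace the index with a range of values, which turned "customer" into an ordinary column, and its data was then exported to BigQuery as required.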
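
The behaviour described above can be reproduced without BigQuery. The following is a minimal sketch, assuming a small made-up frame (the customer IDs and values are hypothetical, and `rfm` merely stands in for `rfmSegmentation`). It shows that an index named "customer" is absent from keys() and becomes a regular column after reset_index:

```python
import pandas as pd

# Hypothetical stand-in for rfmSegmentation: "customer" is the index,
# which is the situation described in the question.
rfm = pd.DataFrame(
    {
        "customer": ["A001", "B002", "C003"],
        "recency": [10, 45, 200],
        "frequency": [12, 3, 1],
        "monetary_value": [540.0, 80.5, 15.0],
    }
).set_index("customer")

# The index is not one of the columns, so keys() does not list it.
print(list(rfm.keys()))
print(rfm.index.name)

# reset_index() moves the index into an ordinary column and installs
# a default integer range as the new index.
results = rfm.reset_index()
print(list(results.columns))
```

The `results` frame produced this way could then be passed to the to_gbq call from the question unchanged. The CSV was not affected because to_csv writes the index by default.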

Solution 1
0 Accepted 2018-02-16 12:19:02