How to fix TypeError: string indices must be integers

Question

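I am passing a DataFrame from a mapInPandas function in PySpark. I need all values of the ID column joined into one comma-separated string, with each value in single quotes, like 'H57R6HU87','A1924334','496A4806'.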

x1['ID'] looks like this:

H57R6HU87
A1924334
496A4806

This is my code for getting the unique IDs. It raises TypeError: string indices must be integers.

# batch_iter= cust.toPandas()
  
for x1 in batch_iter:
   IDs= ','.join(f"'{i}'" for i in x1['ID'].unique())

Answer 1

You probably don't need the loop. Try:

batch_iter = cust.toPandas()
IDs = ','.join(f"'{i}'" for i in batch_iter['ID'].unique())
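One likely cause of the error, assuming batch_iter was the pandas DataFrame from the commented-out line in the question: iterating over a DataFrame yields its column labels, so x1 is the string 'ID', and indexing a string with 'ID' raises the TypeError. A minimal sketch with made-up sample data (the `batch` frame below is hypothetical, standing in for cust.toPandas()):

```python
import pandas as pd

# Hypothetical sample data standing in for cust.toPandas()
batch = pd.DataFrame({"ID": ["H57R6HU87", "A1924334", "496A4806", "A1924334"]})

# Iterating over a DataFrame yields its column labels, not rows or batches
labels = [x1 for x1 in batch]

# Indexing a label (a str) with 'ID' reproduces the error from the question
try:
    labels[0]["ID"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Indexing the DataFrame itself works and keeps first-seen order
ids = ",".join(f"'{i}'" for i in batch["ID"].unique())
print(labels)
print(error_message)
print(ids)
```

Inside a real mapInPandas function the argument is an iterator of DataFrames, so the loop is correct there; it only fails when the same loop runs over a single DataFrame.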

Or you can do it using only Spark functions:

from pyspark.sql import functions as F
df2 = df.select(F.concat_ws(',', F.collect_set('ID')).alias('ID'))

If you want to use mapInPandas:

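import pandas as pd

def pandas_func(batches):
    # mapInPandas passes an iterator of pandas DataFrames, one per batch
    for x1 in batches:
        IDs = ','.join(f"'{i}'" for i in x1['ID'].unique())
        yield pd.DataFrame({'ID': IDs}, index=[0])

# mapInPandas requires the output schema as its second argument
df.mapInPandas(pandas_func, schema='ID string')
# But I suspect you want to do this instead:
# df.repartition(1).mapInPandas(pandas_func, schema='ID string')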
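To see why the repartition(1) variant may be what you want, here is a rough pandas-only sketch that feeds the generator two hand-made batches instead of running Spark. The `batches` list is hypothetical; in Spark the batches come from the partitions of df.

```python
import pandas as pd

def pandas_func(batches):
    # Same logic as above: one output row per incoming batch
    for x1 in batches:
        ids = ",".join(f"'{i}'" for i in x1["ID"].unique())
        yield pd.DataFrame({"ID": ids}, index=[0])

# Hypothetical batches standing in for what Spark would pass in
batches = [
    pd.DataFrame({"ID": ["H57R6HU87", "A1924334"]}),
    pd.DataFrame({"ID": ["496A4806", "A1924334"]}),
]

# Collect the joined string produced for each batch
results = [out["ID"].iloc[0] for out in pandas_func(iter(batches))]
for row in results:
    print(row)
```

Each batch produces its own row, and unique() only removes duplicates within a batch, so an ID that appears in two batches shows up twice. Putting all rows in one partition reduces the number of batches the function sees.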

Solution 1
0 2021-03-02 12:37:57
