

how to create schema of a delta table in databricks by using column names from text file

I have a text file which stores all the column names. For example, the text file contains the following data:

table1=['column1','2_column2','3_column3']
table2=['column4','5_column5','6_column6']

I need to fetch all the values and create a schema for each table mentioned in the text file. Also, some of the column names start with a number, as shown above. Output needed:

table1 = StructType([
        StructField("column1", StringType(), True),
        StructField("2_column2", StringType(), True),
        StructField("3_column3", StringType(), True)
    ])

table2 = StructType([
        StructField("column4", StringType(), True),
        StructField("5_column5", StringType(), True),
        StructField("6_column6", StringType(), True)
    ])

All of the columns will be of string type.

How can I achieve this using python/pyspark?

from pyspark.sql.types import StructType, StructField, StringType

table1 = ['column1', '2_column2', '3_column3']
fields = []
for i in table1:
    # build real StructField objects, not string representations of them
    fields.append(StructField(i, StringType(), True))
table1_schema = StructType(fields)
print(table1_schema)
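The text file itself still needs to be parsed before the loop above can run. A minimal sketch, assuming each line of the file has the exact form shown in the question (`name=['col1','col2',...]`); `ast.literal_eval` safely evaluates the right-hand side into a Python list, and the helper name `parse_table_columns` is hypothetical:

```python
import ast

def parse_table_columns(text):
    """Parse lines like table1=['column1','2_column2'] into a dict
    mapping table name -> list of column names."""
    tables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or '=' not in line:
            continue  # skip blank or malformed lines
        name, _, cols = line.partition('=')
        # literal_eval turns the string "['column1','2_column2']" into a list
        tables[name.strip()] = ast.literal_eval(cols.strip())
    return tables

# stand-in for reading the actual file, e.g. open('tables.txt').read()
text = """table1=['column1','2_column2','3_column3']
table2=['column4','5_column5','6_column6']"""

tables = parse_table_columns(text)
```

Each resulting list can then be fed into the StructField loop above, or built in one line per table: `StructType([StructField(c, StringType(), True) for c in cols])`. Column names beginning with a digit are valid in a StructField; they only need backtick quoting when referenced in Spark SQL expressions.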

