
AttributeError: 'str' object has no attribute 'name' PySpark

I have created a list and am trying to pass it to StructType(), but I get this error:

AttributeError: 'str' object has no attribute 'name'

My code:

from pyspark.sql import SparkSession
import logging
from pyspark.sql.types import *
from pyspark.sql.functions import to_timestamp
from pyspark.sql.functions import udf
from pyspark.sql.functions import lit
from pyspark.sql.functions import year, month, dayofmonth
from pyspark.context import SparkContext
from pyspark.sql import SQLContext
import argparse

logging.basicConfig(level=logging.INFO,filename = 'parquet.log')
logger = logging.getLogger(__name__)

parser = argparse.ArgumentParser()
parser.add_argument('--schema_py', '--list', nargs='+', required=True, dest='schema_py', help='Schema def')

args = parser.parse_args()

schemaField = args.schema_py
print(type(schemaField))   #It will print <class 'list'>

schema = StructType(schemaField) # On this line facing issue
print(type(schema))

Output

$ python tst.py --schema_py 'StructField('col1', StringType(), True),StructField('col2', StringType(), True),StructField('col3', StringType(), True),StructField('col4', StringType(), True),'

<class 'list'>
Traceback (most recent call last):
  File "brrConvertParquet.py", line 41, in <module>
    schema = StructType(schemaField)
  File "/home/sysbrrd/anaconda3/lib/python3.6/site-packages/pyspark/sql/types.py", line 484, in __init__
    self.names = [f.name for f in fields]
  File "/home/sysbrrd/anaconda3/lib/python3.6/site-packages/pyspark/sql/types.py", line 484, in <listcomp>
    self.names = [f.name for f in fields]
AttributeError: 'str' object has no attribute 'name'

Please help me to understand what's going wrong here.

The problems I see are:

  1. With nargs='+', args.schema_py is a list of strings, not a list of StructField objects, so StructType() receives something like ["StructField('col1', StringType(), True)", "StructField('col2', StringType(), True)", "StructField('col3', StringType(), True)", "StructField('col4', StringType(), True)"]. Your print(type(schemaField)) shows a list, but every element in it is still a str.
  2. If you really want to receive the fields as a command-line argument, you should validate the argument and convert it into the desired Python type. You can look into json, pickle, eval or exec. Keep in mind that pickle, eval and exec can execute arbitrary code, so avoid them for input you do not trust.
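One way to do that conversion is sketched below, assuming you are free to change the command-line format to name:type pairs such as col1:string. The helper names parse_field_specs and build_schema are hypothetical, and every field is assumed to be nullable. The dict follows the layout that PySpark's StructType.fromJson() reads; pyspark is imported inside build_schema so the parsing step can be checked on its own.

```python
def parse_field_specs(specs):
    """Turn ["col1:string", "col2:integer"] into a dict in Spark's JSON schema layout.

    Hypothetical helper: the name:type format is an assumption, not part of the question.
    """
    fields = []
    for spec in specs:
        name, sep, type_name = spec.partition(":")
        if not sep or not name or not type_name:
            raise ValueError(f"expected name:type, got {spec!r}")
        fields.append({
            "name": name,
            "type": type_name,   # e.g. "string", "integer", "double"
            "nullable": True,    # assumption: all columns are nullable
            "metadata": {},
        })
    return {"type": "struct", "fields": fields}


def build_schema(specs):
    """Build a real StructType from the parsed specs (requires pyspark)."""
    from pyspark.sql.types import StructType  # imported lazily on purpose
    return StructType.fromJson(parse_field_specs(specs))


# Intended use inside the script from the question:
#   schema = build_schema(args.schema_py)
# invoked as:
#   python tst.py --schema_py col1:string col2:string col3:string col4:string
```

This keeps eval out of the picture: the script only accepts column names and type names, and an unknown type name is rejected by PySpark when the schema is built.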

Aside from that, everything else should work.

self.names = [f.name for f in fields] breaks because each f in fields is a str rather than a StructField. If fields were a list of StructField objects, as StructType() expects, the f.name access would work just fine :-)
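The failure can be reproduced without Spark at all. The snippet below is a small illustration, not the PySpark source: it feeds argparse a simplified stand-in for the argument from the question and then runs the same list comprehension that StructType.__init__ runs.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--schema_py', nargs='+', required=True)

# Simplified stand-in for the command line used in the question.
args = parser.parse_args(['--schema_py', "StructField(col1, StringType(), True)"])

fields = args.schema_py
print(type(fields), type(fields[0]))  # a list whose elements are str

try:
    names = [f.name for f in fields]  # same expression as in pyspark/sql/types.py
except AttributeError as exc:
    message = str(exc)
    print(message)
```

argparse never evaluates its input, so the text StructField(...) stays plain text until your own code converts it.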

I hope this helps.
