
AttributeError: 'int' object has no attribute 'dtype'

I am trying to run a script to obtain data for a number of stocks. Part of the data I am trying to obtain is a liquidity measure (the Amihud liquidity measure). I automated the script, but when running the automated version I get an error after roughly 15-20 successful returns. How can I fix this issue?

File "script.py", line 23, in <module>
return_data = function.get_data(row[1], row[0])
File "C:\Users\leon_\function.py", line 39, in get_data
print(np.nanmean(illiq))
File "D:\Anaconda3\lib\site-packages\numpy\lib\nanfunctions.py", line 916, in nanmean
avg = _divide_by_count(tot, cnt, out=out)
File "D:\Anaconda3\lib\site-packages\numpy\lib\nanfunctions.py", line 190, in _divide_by_count
return a.dtype.type(a / b)
AttributeError: 'int' object has no attribute 'dtype'

The part of the code that handles the illiquidity measure:

    # Amihud's liquidity measure
    liquidity_pricing_date = date_1 + datetime.timedelta(days=-20)
    liquidity_pricing_date2 = date_1 + datetime.timedelta(days=-120)
    stock_data = quandl.get(stock_ticker, start_date=liquidity_pricing_date2, end_date=liquidity_pricing_date)
    p = np.array(stock_data['Adj. Close'])
    returns = np.array(stock_data['Adj. Close'].pct_change())
    dollar_volume = np.array(stock_data['Volume'] * p)
    illiq = (np.divide(returns, dollar_volume))
    print(np.nanmean(illiq))
    illiquidity_measure = np.nanmean(illiq, dtype=float) * (10 ** 6)  # multiply by 10^6 for expositional purposes
    return [stock_vola, stock_price_average, illiquidity_measure]

Does anyone have any idea how to solve this?

EDIT: This is the script file

# Open File Dialog

root = tk.Tk()
root.withdraw()

file_path = filedialog.askopenfilename()

# Load Spreadsheet data
f = open(file_path)

csv_f = csv.reader(f)
next(csv_f)

result_data = []

# Iterate
for row in csv_f:
    return_data = function.get_data(row[1], row[0])
    if len(return_data) != 0:
        # print(return_data)
        result_data_loc = [row[1], row[0]]
        result_data_loc.extend(return_data)
        result_data.append(result_data_loc)

if result_data:
    with open('resuls.csv', mode='w', newline='') as result_file:
        csv_writer = csv.writer(result_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for result in result_data:
            # print(result)
            csv_writer.writerow(result)
else:
    print("No results found!")

[I would post this as a comment, but it is too long.] I don't think there is enough information here to solve the issue outright. In your place, I would wrap the call in a try/except so that you can see which rows make the code fail while the rest of the run still completes. That way you can investigate the rows that failed and correct your script while still getting results.

root = tk.Tk()
root.withdraw()

file_path = filedialog.askopenfilename()

# Load Spreadsheet data
f = open(file_path)

csv_f = csv.reader(f)
next(csv_f)

result_data = []

# Iterate
for row in csv_f:
    try:
        return_data = function.get_data(row[1], row[0])
        if len(return_data) != 0:
            # print(return_data)
            result_data_loc = [row[1], row[0]]
            result_data_loc.extend(return_data)
            result_data.append(result_data_loc)
    except AttributeError:
        # Report the row that failed, then move on to the next one
        print(row[0])
        print('\n\n')
        print(row[1])
        continue

if result_data:
    with open('resuls.csv', mode='w', newline='') as result_file:
        csv_writer = csv.writer(result_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for result in result_data:
            # print(result)
            csv_writer.writerow(result)
else:
    print("No results found!")

So according to the traceback (thankfully we didn't have to ask for that), the error occurs in:

np.nanmean(illiq)

where it's trying to adjust the return value to match the dtype of an input, probably illiq. At this point in nanmean (looking at its code) it has summed the input (after removing nan values) into tot, and counted the elements in cnt. The 'int' in the message means that tot came back as a plain Python int rather than a numpy scalar. The function is written assuming illiq is a numeric numpy array (preferably float dtype, since it has to deal with the float np.nan).

So it works most of the time, but in some cases fails. What is different about illiq in those cases?

p = np.array(stock_data['Adj. Close'])
returns = np.array(stock_data['Adj. Close'].pct_change())
dollar_volume = np.array(stock_data['Volume'] * p)
illiq = (np.divide(returns, dollar_volume))

Looks like stock_data is a DataFrame, and the inputs are arrays derived from individual Series. I believe stock_data[name].to_numpy() is the preferred way of getting an array from a Series, though np.array(...) may work most of the time. stock_data[name].values has also been used.

I'd suggest applying some tests to illiq before this call. Check shape and dtype at least. Try to identify what's different in the problem case.
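A minimal sketch of such a check, assuming nothing about the real data: describe_array is a hypothetical helper (not part of numpy), and the two arrays are made-up stand-ins for a working case and a suspected failing case.

```python
import numpy as np

def describe_array(name, arr):
    """Hypothetical helper: summarise an array's shape, dtype and size."""
    arr = np.asarray(arr)
    return f"{name}: shape={arr.shape}, dtype={arr.dtype}, size={arr.size}"

# Made-up stand-in for a case that works: a float array with a nan in it
good = np.array([0.01, np.nan, -0.02])
# Made-up stand-in for a suspected problem case: an object-dtype array
bad = np.array([0, 3], dtype=object)

print(describe_array("good", good))
print(describe_array("bad", bad))
```

Calling something like describe_array("illiq", illiq) just before np.nanmean(illiq) in get_data should make the failing rows stand out, since a dtype of object (or an unexpected shape or size) would show up in the printed line.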

Here's a simple case that produces this error:

In [117]: np.nanmean(np.array([0,3],object))                                                                 
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-117-26ab42d92ec9> in <module>
----> 1 np.nanmean(np.array([0,3],object))

<__array_function__ internals> in nanmean(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/numpy/lib/nanfunctions.py in nanmean(a, axis, dtype, out, keepdims)
    949     cnt = np.sum(~mask, axis=axis, dtype=np.intp, keepdims=keepdims)
    950     tot = np.sum(arr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
--> 951     avg = _divide_by_count(tot, cnt, out=out)
    952 
    953     isbad = (cnt == 0)

/usr/local/lib/python3.6/dist-packages/numpy/lib/nanfunctions.py in _divide_by_count(a, b, out)
    216         else:
    217             if out is None:
--> 218                 return a.dtype.type(a / b)
    219             else:
    220                 # This is questionable, but currently a numpy scalar can

AttributeError: 'int' object has no attribute 'dtype'

pandas often creates object dtype Series when one or more values are not valid numbers. That can include strings and None values.
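As a small illustration with made-up values (not the asker's data), a single non-numeric entry is enough to turn a Series into object dtype, and np.array(...) carries that dtype over:

```python
import numpy as np
import pandas as pd

clean = pd.Series([1.5, 2.5, 3.5])
mixed = pd.Series([1.5, "n/a", 3.5])   # one non-numeric entry (made up)

print(clean.dtype)              # float64
print(mixed.dtype)              # object
print(np.array(mixed).dtype)    # object
```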

The simple answer is that your data does not have a numeric numpy dtype. This is likely because the column isn't fully numeric (i.e. it contains None or something similar).

The short solution:

print(np.nanmean(pd.to_numeric(illiq)))

The quickest way to solve this is to simply coerce the data to a numeric type that numpy likes. This can be done via pandas' to_numeric function.
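A hedged sketch of that coercion on a made-up object array (the values are assumptions, not the asker's data). It passes errors="coerce", which is an addition to the one-liner above: with the default errors="raise", a string such as "n/a" would raise instead of becoming NaN.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a problematic illiq array
illiq = np.array([0.5, None, "n/a", 1.5], dtype=object)

# Unparseable entries become NaN, and the result has float64 dtype
cleaned = pd.to_numeric(illiq, errors="coerce")
result = np.nanmean(cleaned)

print(cleaned.dtype)   # float64
print(result)          # 1.0
```

Note that coercing silently hides whatever the bad values were, so it may still be worth printing them once to see why they appear.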


 