
Appending DataFrame to List in Pandas, Python

I have a file of data and want to select a specific State. From there I need to return the counts as a list, but some years in the range will have no data, so I need to fill in the missing values.

I am having an issue with my code; something is likely slightly off in my for loop:

def stateCountAsList(filepath,state):
    import pandas as pd 
    pd.set_option('display.width',200)

    import numpy as np 

    dataFrame = pd.read_csv(filepath,header=0,sep='\t')
    df = dataFrame.iloc[0:638,:]

    dfState = df[df['State'] == state]
    yearList = range(1999,2012)
    countsList = []

    for dfState['Year'] in yearList: 
        countsList = dfState['Count']
    else: 
        countsList.append(np.nan)
    return countsList
    print countsList.tolist() 


stateCountAsList(filepath, state)
state = 'California'

Traceback:

C:\Users\Michael\workspace\UCIIntrotoPythonDA\src\Michael_Madani_week3.py:59: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  for dfState['Year'] in yearList:
Traceback (most recent call last):
  File "C:\Users\Michael\workspace\UCIIntrotoPythonDA\src\Michael_Madani_week3.py", line 67, in <module>
    stateCountAsList(filepath, state)
  File "C:\Users\Michael\workspace\UCIIntrotoPythonDA\src\Michael_Madani_week3.py", line 62, in stateCountAsList
    countsList.append(np.nan)
  File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\core\series.py", line 1466, in append
    verify_integrity=verify_integrity)
  File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\tools\merge.py", line 754, in concat
    copy=copy)
  File "C:\Users\Michael\Anaconda\lib\site-packages\pandas\tools\merge.py", line 805, in __init__
    raise TypeError("cannot concatenate a non-NDFrame object")
TypeError: cannot concatenate a non-NDFrame object

You have at least two different issues in your code:

The warning

A value is trying to be set on a copy of a slice from a DataFrame. 

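is triggered by for dfState['Year'] in yearList (line 59 in your code). You intend to loop over a range of years (1999 through 2011, since range excludes its end value), but a for statement assigns each value to its loop target, so this line implicitly assigns every year to dfState['Year']. pandas warns because dfState was produced by indexing another DataFrame (df = dataFrame.iloc[0:638,:], followed by the State filter), so it may be a "view" rather than an independent copy ( http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy ) and the assignment may not do what you expect.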

But as mentioned earlier, you don't want to assign a value to the DataFrame here, only loop over years. So the for-loop should look like:

for year in range(1999,2012):
    ...
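To see why the original loop header assigns instead of just iterating, here is a minimal sketch. It uses a plain dict in place of the DataFrame (an assumption made only to keep the example small); the name record is hypothetical.

```python
# A plain dict stands in for the DataFrame here (illustrative assumption).
record = {'Year': None}

# Any assignable expression can be a for-loop target, so this header
# overwrites record['Year'] on every iteration.
for record['Year'] in range(1999, 2002):
    pass

final_year = record['Year']  # holds the last value the loop assigned

# Using a plain name as the target leaves the container untouched.
years_seen = []
for year in range(1999, 2002):
    years_seen.append(year)
```

The same thing happens with dfState['Year'] as the target: each pass overwrites the whole column with a single year.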

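The second issue is in line 62. Here, you try to append np.nan to your "list" countsList, but countsList is not a list anymore. It is a pandas Series!

Two lines before, you assign a pd.Series ( countsList = dfState['Count'] ), effectively changing the type. Series.append expects another pandas object rather than a scalar, which gives you the TypeError: cannot concatenate a non-NDFrame object. Note also that the else belongs to the for statement, so it runs once after the loop finishes, not once for every year that has no data.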

With this information you should be able to correct your loop.
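For reference, one possible shape of a corrected loop is sketched below. It is not the only fix. It assumes the columns are named State, Year and Count as in the question, takes an already loaded DataFrame instead of a file path so it can run on its own, and uses a hypothetical function name and made-up sample data.

```python
import numpy as np
import pandas as pd


def state_counts_as_list(df, state, years=range(1999, 2012)):
    """Return one count per year for `state`, with NaN for missing years."""
    df_state = df[df['State'] == state]
    counts = []  # stays a plain Python list for the whole loop
    for year in years:
        # Select the Count values recorded for this year, if any.
        match = df_state.loc[df_state['Year'] == year, 'Count']
        if match.empty:
            counts.append(np.nan)  # no row for this year
        else:
            counts.append(match.iloc[0])  # first matching row
    return counts


# Made-up sample data, only for illustration.
sample = pd.DataFrame({
    'State': ['California', 'California', 'Texas'],
    'Year': [1999, 2001, 1999],
    'Count': [10, 30, 5],
})
result = state_counts_as_list(sample, 'California', years=range(1999, 2003))
print(result)
```

The if/else sits inside the loop body, so the NaN is appended once per missing year, and counts is never rebound to a Series.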

As an alternative, you can select the rows you need with the pandas query method ( http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-query-method-experimental ):

def stateCountAsList(filepath,state):
    import pandas as pd 
    import numpy as np 

    dataFrame = pd.read_csv(filepath,header=0,sep='\t')
    df = dataFrame.iloc[0:638,:]

    # Each bound on Year needs its own comparison, and the parentheses must balance
    stateList = df.query("(State == @state) & (Year > 1999) & (Year < 2005)").Count.tolist()

    return stateList
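The query above returns only the years that are present in the data. If the missing years should appear as NaN, one option is to reindex on the full year range. The sketch below assumes at most one row per State and Year (reindex raises an error on duplicate index labels), takes a loaded DataFrame rather than a file path, and uses a hypothetical function name and made-up sample data.

```python
import pandas as pd


def state_counts_reindexed(df, state, years=range(1999, 2012)):
    """Return counts for `state` aligned to `years`; absent years become NaN."""
    counts = (
        df.query("State == @state")   # keep only rows for the chosen state
          .set_index('Year')['Count']  # Series of counts indexed by year
          .reindex(list(years))        # align to the full range of years
    )
    return counts.tolist()


# Made-up sample data, only for illustration.
sample = pd.DataFrame({
    'State': ['California', 'California', 'Texas'],
    'Year': [1999, 2001, 1999],
    'Count': [10, 30, 5],
})
filled = state_counts_reindexed(sample, 'California', years=range(1999, 2003))
print(filled)
```

Because NaN is a float, the counts come back as floats once any year is missing.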
