
Python won't take a value from a .csv file

I have this assignment question and wrote the following code for it, but Python keeps telling me that "Reservoir" is not in the dataframe even though it is. How do I fix this? Here is a link to the .CSV file if needed: https://drive.google.com/file/d/1SZ639cUA3DdrlI_lG2Hq0vs6HiT8OAU3/view?usp=sharing

  1. Create and show a Bar Chart showing the number of wells by county:
  • Category: County
  • Y Axis: Total Clearfork Wells (Named "Reservoir" in file)

My code is below:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('CF Around Lubbock Production Table.CSV')

By_County = df.groupby(['County/Parish']).sum().Reservoir

x = By_County.index
y = By_County.values

plt.figure(figsize=(10, 8))

plt.bar(x,y)


for i, j in zip(x,y):
    plt.text(i, j+10, int(j), ha = 'center')


plt.xlabel('County', fontsize = 20)
plt.ylabel('Total Clearfork Wells', fontsize = 20)

plt.xticks(fontsize = 12)
plt.yticks(fontsize = 15)

plt.show()

The Reservoir column appears to be of type object (its values are strings). When you call sum() on the whole grouped dataframe, pandas does not sum columns that hold strings, so Reservoir is left out of the result and the .Reservoir lookup fails.

What you can try:

By_County = df.groupby(['County/Parish'])['Reservoir'].sum()

This works because the column is selected first, so sum() runs on a single Series. But do you really want concatenated strings?

County/Parish
CROSBY (TX)     CLEAR FORKCLEAR FORKCLEAR FORKCLEAR FORKCLEAR ...
GARZA (TX)      CLEARFORKCLEARFORKCLEARFORKCLEARFORKCLEARFORKC...
HALE (TX)       CLEARFORKCLEARFORKCLEARFORKCLEARFORKCLEARFORKC...
HOCKLEY (TX)    CLEARFORKCLEAR FORKCLEARFORKCLEAR FORKCLEAR FO...
LAMB (TX)       CLEARFORKCLEARFORKCLEARFORKCLEARFORKCLEARFORKC...
Name: Reservoir, dtype: object
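
To make the difference concrete, here is a minimal sketch on a tiny made-up frame. The three rows are invented for illustration and are not taken from the CSV, and the column is cast to object only so that it matches the dtype described above.

```python
import pandas as pd

# Hypothetical rows standing in for the real CSV
toy = pd.DataFrame({
    "County/Parish": ["HALE (TX)", "HALE (TX)", "LYNN (TX)"],
    "Reservoir": ["CLEARFORK", "CLEAR FORK", "CLEARFORK"],
})
# Assumption: force object dtype so the column matches the one in the question
toy["Reservoir"] = toy["Reservoir"].astype(object)

# The column holds text, so it is not a numeric dtype
is_numeric = pd.api.types.is_numeric_dtype(toy["Reservoir"])

# Summing strings joins them end to end within each county
concatenated = toy.groupby("County/Parish")["Reservoir"].sum()

# Counting the rows in each county gives the number of wells instead
wells = toy.groupby("County/Parish")["Reservoir"].count()

print(concatenated)
print(wells)
```

If each row in the file is one well, count() is probably closer to what the assignment asks for than sum().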

Are you looking for something like this? (Here data is the dataframe read from the CSV, called df in the question.)

df_grouped=data.groupby(['County/Parish','Reservoir'])['Reservoir'].count()

Output:

County/Parish  Reservoir      
CROSBY (TX)    CLEAR FORK         1837
               CLEARFORK             2
GARZA (TX)     CLEAR FORK           22
               CLEARFORK            32
HALE (TX)      CLEAR FORK            2
               CLEARFORK           441
HOCKLEY (TX)   CLEAR FORK          485
               CLEARFORK           218
               CLEARFORK, LO         1
               L. CLEARFORK          1
               LOWER CLEARFORK      26
               UPPER CLEARFORK      13
LAMB (TX)      CLEAR FORK            3
               CLEARFORK           108
               L. CLEARFORK          1
               LOWER CLEARFORK      12
LUBBOCK (TX)   CLEAR FORK          726
               CLEARFORK           300
               CLEARFORK, LO        60
               CLEARFORK, LO.        4
               L. CLEARFORK          2
               LOWER CLEARFORK       1
               UPPER CLEARFORK       9
LYNN (TX)      CLEARFORK             1
TERRY (TX)     CLEAR FORK            3
               CLEARFORK             1
               CLEARFORK, LO         2
               CLEARFORK, LO.        2
               LOWER CLEARFORK       1
Name: Reservoir, dtype: int64
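
The output above splits each county across several spellings of the reservoir name. If the goal is one total per county, the per-spelling counts can be added back up. The sketch below assumes every row is one Clearfork well whatever the spelling, and uses invented rows rather than the real file.

```python
import pandas as pd

# Hypothetical rows; the real file has many more columns and rows
data = pd.DataFrame({
    "County/Parish": ["CROSBY (TX)", "CROSBY (TX)", "CROSBY (TX)",
                      "GARZA (TX)", "GARZA (TX)"],
    "Reservoir": ["CLEAR FORK", "CLEAR FORK", "CLEARFORK",
                  "CLEARFORK", "LOWER CLEARFORK"],
})

# Rows per (county, spelling) pair, like the table above
per_variant = data.groupby(["County/Parish", "Reservoir"]).size()

# Collapse the spelling level to get one total per county
per_county = per_variant.groupby(level="County/Parish").sum()

print(per_county)
```

The resulting Series is indexed by county, so its index and values can be passed to plt.bar in the same way as in the question's code.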

The code below gets the count for one specific group:

df_grouped=data.groupby(['County/Parish','Reservoir'])
CROSBY_TX_CLEAR_FORK_count= df_grouped.get_group(('CROSBY (TX)', 'CLEAR FORK'))['Reservoir'].count()

CROSBY_TX_CLEAR_FORK_count

Change the tuple passed to get_group to get the count for whichever group you want.

This plots a bar chart of the number of 'CLEAR FORK' rows for every County/Parish value.

# df_grouped is the groupby object created above
CLEAR_FORK_Count={}

for cat in data['County/Parish'].unique():
    try:
        count = df_grouped.get_group((cat, 'CLEAR FORK'))['Reservoir'].count()
    except KeyError:
        # this county has no rows labelled 'CLEAR FORK'
        count=0

    CLEAR_FORK_Count[cat]=count

plt.bar(list(CLEAR_FORK_Count.keys()), list(CLEAR_FORK_Count.values()))
plt.xticks(rotation=30)
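
As an alternative to the loop, the same per-county counts can be built with a filter and value_counts. This is only a sketch on made-up rows; the reindex step is one assumed way to keep counties that have no 'CLEAR FORK' rows, filling them with 0.

```python
import pandas as pd

# Hypothetical rows standing in for the real CSV
data = pd.DataFrame({
    "County/Parish": ["CROSBY (TX)", "CROSBY (TX)", "GARZA (TX)", "HALE (TX)"],
    "Reservoir": ["CLEAR FORK", "CLEAR FORK", "CLEARFORK", "CLEAR FORK"],
})

counties = data["County/Parish"].unique()

# Keep only rows spelled exactly 'CLEAR FORK', then count them per county
clear_fork = (
    data.loc[data["Reservoir"] == "CLEAR FORK", "County/Parish"]
    .value_counts()
    .reindex(counties, fill_value=0)  # counties with no match get 0
)

print(clear_fork)
```

Calling clear_fork.plot.bar() on the result should give a chart comparable to the one produced by the loop.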

Solution:

def getUniqueReservoirs(x):
    return x.nunique()

rs=data.groupby(['County/Parish','Reservoir']).agg({'Entity ID':'count',
                                                    'Reservoir':getUniqueReservoirs
                                     })
rs

Plotting the graph:

import matplotlib.pyplot as plt

rs.plot()
plt.xticks(rotation=90)
plt.show()
