I have this assignment question and I wrote the following code for it. But Python keeps telling me that "Reservoir" is not in the dataframe even though it is. How do I fix this? Here is a link to the .CSV file if needed: https://drive.google.com/file/d/1SZ639cUA3DdrlI_lG2Hq0vs6HiT8OAU3/view?usp=sharing
My code is below:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('CF Around Lubbock Production Table.CSV')
By_County = df.groupby(['County/Parish']).sum().Reservoir
x = By_County.index
y = By_County.values
plt.figure(figsize=(10, 8))
plt.bar(x, y)
for i, j in zip(x, y):
    plt.text(i, j + 10, int(j), ha='center')
plt.xlabel('County', fontsize=20)
plt.ylabel('Total Clearfork Wells', fontsize=20)
plt.xticks(fontsize=12)
plt.yticks(fontsize=15)
plt.show()
Column Reservoir appears to be of type object (its values are strings in your case). When you aggregate on the whole dataframe, pandas won't sum columns with string values, so the column is left out of the result.
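To see which columns a whole-frame sum can treat as numbers, you can check the dtypes first. This is only a minimal sketch on a made-up frame: the 'Daily Oil' column and all values are invented for illustration and are not taken from the CSV.

```python
import pandas as pd

# Toy stand-in for the production table; all values are invented.
toy = pd.DataFrame({
    "County/Parish": ["CROSBY (TX)", "CROSBY (TX)", "HALE (TX)"],
    "Reservoir": ["CLEAR FORK", "CLEARFORK", "CLEARFORK"],
    "Daily Oil": [10.0, 5.0, 2.5],  # hypothetical numeric column
})

# Text columns are not numeric, so a numeric sum cannot include them.
numeric_cols = [c for c in toy.columns if pd.api.types.is_numeric_dtype(toy[c])]
print(numeric_cols)

# Restricting the aggregation to numeric columns makes the behaviour explicit.
totals = toy.groupby("County/Parish").sum(numeric_only=True)
print(list(totals.columns))
```

Passing numeric_only=True spells out the intent, so the result does not depend on how a given pandas version treats string columns by default.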
What you can try:
By_County = df.groupby(['County/Parish'])['Reservoir'].sum()
This works because sum() is applied to the Reservoir Series directly. But do you really want concatenated strings?
County/Parish
CROSBY (TX)     CLEAR FORKCLEAR FORKCLEAR FORKCLEAR FORKCLEAR ...
GARZA (TX)      CLEARFORKCLEARFORKCLEARFORKCLEARFORKCLEARFORKC...
HALE (TX)       CLEARFORKCLEARFORKCLEARFORKCLEARFORKCLEARFORKC...
HOCKLEY (TX)    CLEARFORKCLEAR FORKCLEARFORKCLEAR FORKCLEAR FO...
LAMB (TX)       CLEARFORKCLEARFORKCLEARFORKCLEARFORKCLEARFORKC...
Name: Reservoir, dtype: object
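If the goal is the number of wells per county, as the y-axis label in the question suggests, counting rows is probably closer to what you want than summing strings. A sketch on invented data, assuming the table has one row per well:

```python
import pandas as pd

# Toy stand-in for the production table; all values are invented.
toy = pd.DataFrame({
    "County/Parish": ["CROSBY (TX)", "CROSBY (TX)", "HALE (TX)", "LYNN (TX)"],
    "Reservoir": ["CLEAR FORK", "CLEARFORK", "CLEARFORK", "CLEARFORK"],
})

# count() tallies the non-missing Reservoir values in each county.
wells_per_county = toy.groupby("County/Parish")["Reservoir"].count()

x = wells_per_county.index   # county names
y = wells_per_county.values  # integer counts
print(wells_per_county.to_dict())
```

Here x and y can be passed to plt.bar(x, y) just like in the question, and int(j) in the label loop then works because the values are integers.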
Are you looking for something like this? (Here data is the dataframe read from the CSV, called df in the question.)
df_grouped = data.groupby(['County/Parish', 'Reservoir'])['Reservoir'].count()
Output:
County/Parish  Reservoir
CROSBY (TX)    CLEAR FORK       1837
               CLEARFORK           2
GARZA (TX)     CLEAR FORK         22
               CLEARFORK          32
HALE (TX)      CLEAR FORK          2
               CLEARFORK         441
HOCKLEY (TX)   CLEAR FORK        485
               CLEARFORK         218
               CLEARFORK, LO       1
               L. CLEARFORK        1
               LOWER CLEARFORK    26
               UPPER CLEARFORK    13
LAMB (TX)      CLEAR FORK          3
               CLEARFORK         108
               L. CLEARFORK        1
               LOWER CLEARFORK    12
LUBBOCK (TX)   CLEAR FORK        726
               CLEARFORK         300
               CLEARFORK, LO      60
               CLEARFORK, LO.      4
               L. CLEARFORK        2
               LOWER CLEARFORK     1
               UPPER CLEARFORK     9
LYNN (TX)      CLEARFORK           1
TERRY (TX)     CLEAR FORK          3
               CLEARFORK           1
               CLEARFORK, LO       2
               CLEARFORK, LO.      2
               LOWER CLEARFORK     1
Name: Reservoir, dtype: int64
The code below gets the count for one specific group:
df_grouped=data.groupby(['County/Parish','Reservoir'])
CROSBY_TX_CLEAR_FORK_count= df_grouped.get_group(('CROSBY (TX)', 'CLEAR FORK'))['Reservoir'].count()
CROSBY_TX_CLEAR_FORK_count
You can change the arguments passed to get_group to get the count for whichever group you want.
The following plots a bar graph of the 'CLEAR FORK' reservoir count for every County/Parish value.
CLEAR_FORK_Count = {}
for cat in data['County/Parish'].unique():
    try:
        count = df_grouped.get_group((cat, 'CLEAR FORK'))['Reservoir'].count()
    except KeyError:
        # this county has no 'CLEAR FORK' rows
        count = 0
    CLEAR_FORK_Count[cat] = count
plt.bar(list(CLEAR_FORK_Count.keys()), list(CLEAR_FORK_Count.values()))
plt.xticks(rotation=30)
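The same per-county count can probably be had without the loop and the try/except, by comparing the column to the reservoir name and summing the resulting booleans per county. A hedged sketch on invented data (this is an alternative to the loop, not what the answer itself does):

```python
import pandas as pd

# Toy stand-in for the production table; all values are invented.
toy = pd.DataFrame({
    "County/Parish": ["CROSBY (TX)", "CROSBY (TX)", "HALE (TX)", "LYNN (TX)"],
    "Reservoir": ["CLEAR FORK", "CLEAR FORK", "CLEAR FORK", "CLEARFORK"],
})

# True where the row belongs to the reservoir of interest.
is_clear_fork = toy["Reservoir"] == "CLEAR FORK"

# Summing booleans per county counts the matches; counties with none get 0.
clear_fork_count = is_clear_fork.groupby(toy["County/Parish"]).sum()
print(clear_fork_count.to_dict())
```

Counties with no matching rows still show up with a count of 0, which is what the except branch in the loop was handling.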
Solution:
def getUniqueReservoirs(x):
    return x.nunique()

rs = data.groupby(['County/Parish', 'Reservoir']).agg({'Entity ID': 'count',
                                                      'Reservoir': getUniqueReservoirs})
rs
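Note that with Reservoir as one of the grouping keys, each group holds a single reservoir name, so the unique count per group would be expected to be 1. If what you want is one row per county, grouping by County/Parish alone may be closer. A sketch on invented data, using named aggregation; the output column names wells and unique_reservoirs are made up for this example:

```python
import pandas as pd

# Toy stand-in for the production table; all values are invented.
toy = pd.DataFrame({
    "County/Parish": ["CROSBY (TX)", "CROSBY (TX)", "HALE (TX)"],
    "Reservoir": ["CLEAR FORK", "CLEARFORK", "CLEARFORK"],
    "Entity ID": [101, 102, 103],  # invented IDs
})

# One row per county: number of entities and number of distinct reservoir names.
summary = toy.groupby("County/Parish").agg(
    wells=("Entity ID", "count"),
    unique_reservoirs=("Reservoir", "nunique"),
)
print(summary)
```

The wells column of such a summary could then be plotted with summary["wells"].plot(kind="bar").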
Plotting the graph:
import matplotlib.pyplot as plt
rs.plot()
plt.xticks(rotation=90)
plt.show()