I am trying to split a csv file of temperature data into smaller dictionaries so I can calculate the mean temperature of each month. The csv file is of the format below:
AirTemperature AirHumidity SoilTemperature SoilMoisture LightIntensity WindSpeed Year Month Day Hour Minute Second TimeStamp MonthCategorical
12 68 19 65 60 2 2016 1 1 0 1 1 10100 January
18 34 14 42 19 0 2016 1 1 1 1 1 10101 January
19 98 14 41 30 4 2016 1 1 2 1 1 10102 January
16 88 16 68 54 4 2016 1 1 3 1 1 10103 January
16 44 20 41 10 1 2016 1 1 4 1 1 10104 January
22 54 18 65 94 0 2016 1 1 5 1 1 10105 January
18 84 17 41 40 4 2016 1 1 6 1 1 10106 January
20 88 22 92 31 0 2016 1 1 7 1 1 10107 January
23 1 22 59 3 0 2016 1 1 8 1 1 10108 January
23 3 22 72 41 4 2016 1 1 9 1 1 10109 January
24 63 23 83 85 0 2016 1 1 10 1 1 10110 January
29 73 27 50 1 4 2016 1 1 11 1 1 10111 January
28 37 30 46 29 3 2016 1 1 12 1 1 10112 January
30 99 32 78 73 4 2016 1 1 13 1 1 10113 January
32 72 31 80 80 1 2016 1 1 14 1 1 10114 January
Where there are 24 readings per day over a 6 month period.
I can get half way there with the following code:
for row in df['AirTemperature']:
for equivalentRow in df['MonthCategorical']:
if equivalentRow == "January":
JanuaryAirTemperatures.append(row)
But the output of this has every AirTemp value duplicated by the number of rows containing the value January. Ie rather than 12,18,19 etc it goes 12, 12, 12, 12, 12, 18, 18, 18, 18, 18, 19, 19, 19, 19
I tried the following:
for row in df['AirTemperature']:
if df['MonthCategorical'] == "January":
JanuaryAirTemperatures.append(row)
But I get the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I assume because it is trying to look at the whole column rather than the equivalent row.
IIUC, you can groupby by month and get the mean value of the Air Temperature per month with:
g = df.groupby('MonthCategorical')['AirTemperature'].mean().reset_index(name='MeanAirTemperature')
this returns:
MonthCategorical MeanAirTemperature
0 January 22
Then you can choose on what columns you want to groupby (ie instead of MonthCategorical
you can group by Month
only...).
EDIT: You can also use transform
to get a new column to append to the original dataframe with:
df['MeanAirTemperature'] = df.groupby('MonthCategorical')['AirTemperature'].transform('mean')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.