I want to extract specific information from this csv file .
I need make a list of days with lowest visibility and give an overview of other time parameters for those days in tabular form.
I have tried to use
data = pandas.read_csv('Weather_2012.csv')
data.nsmallest(5, 'Visibility(km)')
but it returns several values for the same day. I don't know if I'm doing it correctly, since I need the list of days.
I also need the total number of foggy days. I have filtered all rows containing fog:
data.loc[data['Weather'].str.contains('Fog')]['Weather']
but I don't know how to count the number of days, I can only count the number of rows.
You're looking for DataFrame.resample
. Based on a specific column, it will group the rows of the dataframe by a specific time interval.
First you need to do this, if you haven't already:
data['Date/Time'] = pd.to_datetime(data['Date/Time'])
Get the lowest 5 days of visibility:
>>> df.resample(rule='D', on='Date/Time')['Visibility (km)'].mean().nsmallest(5)
Date/Time
2012-03-01 2.791667
2012-03-14 5.350000
2012-12-27 6.104167
2012-01-17 6.433333
2012-02-01 6.795833
Name: Visibility (km), dtype: float64
Basically what that does is this:
Visibility (km)
items for that dayCount the number of foggy days
>>> df.resample(rule='D', on='Date/Time').apply(lambda x: x['Weather'].str.contains('Fog').any()).sum()
78
Basically what that does is this:
True
if any row inside that day contains 'Fog'
in the Weather
column, False
otherwiseTrue
's there were, and thus the number of foggy days.This will get you an array of all unique foggy days. you can use the shape method to get its dimension
df[df["Weather"].apply(lambda x : "Fog" in x)]["Date/Time"].unique()
I need make a list of days with lowest visibility and give an overview of other time parameters for those days in tabular form.
Since your Date/Time
column represents a particular hour , you'll need to do some grouping to get the minimum visibility for a particular day . The following will find the 5 least-visible days.
# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date
# Group on the new "Date" column and get the minimum values of
# each column for each group.
>>> min_by_day = data.groupby("Date").min()
# Now we can use nsmallest, since 1 row == 1 day in min_by_day.
# Since `nsmallest` returns a pandas.Series with "Date" as the index,
# we have to use `.index` to pull the date objects from the result.
>>> least_visible_days = min_by_day.nsmallest(5, "Visibility (km)").index
Then you can limit your original dataset to the least-visible days with
data[data["Date"].isin(least_visible_days)]
I also need the total number of foggy days.
We can use the extracted date in this case too:
# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date
# Filter on hours which have foggy weather
>>> foggy = data[data["Weather"].str.contains("Fog")]
# Count number of unique days
>>> len(foggy["Date"].unique())
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.