How to get specific information from a csv file using pandas?

Question

I want to extract specific information from this csv file .

I need make a list of days with lowest visibility and give an overview of other time parameters for those days in tabular form.

I have tried to use

data = pandas.read_csv('Weather_2012.csv')
data.nsmallest(5, 'Visibility(km)')

but it returns several values for the same day. I don't know if I'm doing it correctly, since I need the list of days.

I also need the total number of foggy days. I have filtered all rows containing fog:

data.loc[data['Weather'].str.contains('Fog')]['Weather']

but I don't know how to count the number of days, I can only count the number of rows.

Answer 1

You're looking for DataFrame.resample . Based on a specific column, it will group the rows of the dataframe by a specific time interval.

First you need to do this, if you haven't already:

data['Date/Time'] = pd.to_datetime(data['Date/Time'])

Get the lowest 5 days of visibility:

>>> df.resample(rule='D', on='Date/Time')['Visibility (km)'].mean().nsmallest(5)
Date/Time
2012-03-01    2.791667
2012-03-14    5.350000
2012-12-27    6.104167
2012-01-17    6.433333
2012-02-01    6.795833
Name: Visibility (km), dtype: float64

Basically what that does is this:

Groups all the rows by day
Converts each group to the average value of all the Visibility (km) items for that day
Returns the 5 smallest

Count the number of foggy days

>>> df.resample(rule='D', on='Date/Time').apply(lambda x: x['Weather'].str.contains('Fog').any()).sum()
78

Basically what that does is this:

Groups all the rows by day
For each day, adds a True if any row inside that day contains 'Fog' in the Weather column, False otherwise
Counts how many True 's there were, and thus the number of foggy days.

Answer 2

This will get you an array of all unique foggy days. you can use the shape method to get its dimension

 df[df["Weather"].apply(lambda x : "Fog" in x)]["Date/Time"].unique()

Answer 3

I need make a list of days with lowest visibility and give an overview of other time parameters for those days in tabular form.

Since your Date/Time column represents a particular hour , you'll need to do some grouping to get the minimum visibility for a particular day . The following will find the 5 least-visible days.

# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date

# Group on the new "Date" column and get the minimum values of
# each column for each group.
>>> min_by_day = data.groupby("Date").min()

# Now we can use nsmallest, since 1 row == 1 day in min_by_day.
# Since `nsmallest` returns a pandas.Series with "Date" as the index,
# we have to use `.index` to pull the date objects from the result.
>>> least_visible_days = min_by_day.nsmallest(5, "Visibility (km)").index

Then you can limit your original dataset to the least-visible days with

data[data["Date"].isin(least_visible_days)]

I also need the total number of foggy days.

We can use the extracted date in this case too:

# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date

# Filter on hours which have foggy weather
>>> foggy = data[data["Weather"].str.contains("Fog")]

# Count number of unique days
>>> len(foggy["Date"].unique())

How to get specific information from a csv file using pandas?

Question

3 answers

solution1
1 2021-11-24 23:46:07

solution2
0 2021-11-25 00:20:25

solution3
0 2021-11-25 00:59:00

How to get specific information from a csv file using pandas?

Question

3 answers

solution1 1 2021-11-24 23:46:07

solution2 0 2021-11-25 00:20:25

solution3 0 2021-11-25 00:59:00

solution1
1 2021-11-24 23:46:07

solution2
0 2021-11-25 00:20:25

solution3
0 2021-11-25 00:59:00