
How to get specific information from a csv file using pandas?

I want to extract specific information from this csv file.

I need to make a list of the days with the lowest visibility and give an overview of the other time parameters for those days in tabular form.

I have tried to use

data = pandas.read_csv('Weather_2012.csv')
data.nsmallest(5, 'Visibility(km)')

but it returns several values for the same day. I don't know if I'm doing it correctly, since I need the list of days.

I also need the total number of foggy days. I have filtered all rows containing fog:

data.loc[data['Weather'].str.contains('Fog')]['Weather']

but I don't know how to count the number of days; I can only count the number of rows.

You're looking for DataFrame.resample. Based on a specific datetime column, it groups the rows of the dataframe by a specific time interval.

First you need to do this, if you haven't already:

data['Date/Time'] = pd.to_datetime(data['Date/Time'])

Get the 5 days with the lowest average visibility:

>>> data.resample(rule='D', on='Date/Time')['Visibility (km)'].mean().nsmallest(5)
Date/Time
2012-03-01    2.791667
2012-03-14    5.350000
2012-12-27    6.104167
2012-01-17    6.433333
2012-02-01    6.795833
Name: Visibility (km), dtype: float64

Basically what that does is this:

  1. Groups all the rows by day
  2. Converts each group to the average value of all the Visibility (km) items for that day
  3. Returns the 5 smallest
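
The three steps above can be tried on a small synthetic frame. This is only a minimal sketch: the `sample` frame and its values are invented for illustration, and only the column names `Date/Time` and `Visibility (km)` come from the answer.

```python
import pandas as pd

# Hypothetical readings for three days (values are made up, not from the csv).
sample = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2012-01-01 00:00", "2012-01-01 12:00",
        "2012-01-02 00:00", "2012-01-02 12:00",
        "2012-01-03 00:00", "2012-01-03 12:00",
    ]),
    "Visibility (km)": [8.0, 4.0, 25.0, 15.0, 1.0, 2.0],
})

# Steps 1 and 2: one entry per calendar day, holding that day's mean visibility.
daily_mean = sample.resample(rule="D", on="Date/Time")["Visibility (km)"].mean()

# Step 3: the two days with the lowest mean visibility.
lowest = daily_mean.nsmallest(2)
print(lowest)
```

Because the result is indexed by day, each day can appear only once, which is what the row-level `nsmallest` call in the question could not guarantee.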

Count the number of foggy days:

>>> data.resample(rule='D', on='Date/Time').apply(lambda x: x['Weather'].str.contains('Fog').any()).sum()
78

Basically what that does is this:

  1. Groups all the rows by day
  2. For each day, yields True if any row inside that day contains 'Fog' in the Weather column, False otherwise
  3. Counts how many True values there were, and thus the number of foggy days.
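
The difference between counting foggy rows and counting foggy days can be seen on a small made-up frame. The sketch below is an assumption-laden illustration, not the answer's exact call: it groups on the calendar date with `groupby` rather than using `resample(...).apply(...)`, and the `sample` data is invented.

```python
import pandas as pd

# Hypothetical hourly observations (invented values).
sample = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2012-01-01 00:00", "2012-01-01 01:00",  # two foggy hours, same day
        "2012-01-02 00:00", "2012-01-02 01:00",  # no fog
        "2012-01-03 00:00", "2012-01-03 01:00",  # one foggy hour
    ]),
    "Weather": ["Fog", "Freezing Fog", "Clear", "Cloudy", "Rain,Fog", "Rain"],
})

# Flag each row, then ask per calendar day whether any row was flagged.
is_foggy_hour = sample["Weather"].str.contains("Fog")
foggy_by_day = is_foggy_hour.groupby(sample["Date/Time"].dt.date).any()

foggy_rows = int(is_foggy_hour.sum())  # counts rows (hours)
foggy_days = int(foggy_by_day.sum())   # counts days
print(foggy_rows, foggy_days)
```

Summing a boolean Series counts its True values, so the second sum is the number of days with at least one foggy row.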

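This will get you an array of all unique foggy Date/Time values. You can use the array's shape attribute to get its size. Note that if Date/Time holds hourly timestamps, the values have to be reduced to dates first for this to count days rather than hours.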

 df[df["Weather"].apply(lambda x : "Fog" in x)]["Date/Time"].unique()
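
If `Date/Time` holds hourly timestamps, `unique()` on that column yields one entry per foggy hour. A minimal sketch of reducing the timestamps to days first, assuming the column has already been converted with `pd.to_datetime` (the `df` contents here are invented):

```python
import pandas as pd

# Hypothetical frame: two foggy hours on the same day, one clear hour.
df = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2012-01-01 00:00", "2012-01-01 01:00", "2012-01-02 00:00",
    ]),
    "Weather": ["Fog", "Fog", "Clear"],
})

foggy = df[df["Weather"].apply(lambda x: "Fog" in x)]

# Unique timestamps: one entry per foggy hour.
foggy_hours = foggy["Date/Time"].unique()

# Truncate each timestamp to midnight, then take unique values: one entry per day.
foggy_days = foggy["Date/Time"].dt.normalize().unique()

print(len(foggy_hours), len(foggy_days))
```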

I need to make a list of the days with the lowest visibility and give an overview of the other time parameters for those days in tabular form.

Since your Date/Time column represents a particular hour, you'll need to do some grouping to get the minimum visibility for a particular day. The following will find the 5 least-visible days.

# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date

# Group on the new "Date" column and get the minimum values of
# each column for each group.
>>> min_by_day = data.groupby("Date").min()

# Now we can use nsmallest, since 1 row == 1 day in min_by_day.
# Since `nsmallest` returns a pandas.DataFrame with "Date" as the index,
# we have to use `.index` to pull the date objects from the result.
>>> least_visible_days = min_by_day.nsmallest(5, "Visibility (km)").index

Then you can limit your original dataset to the least-visible days with

data[data["Date"].isin(least_visible_days)]
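
To turn the filtered rows into a one-row-per-day overview, one option is a named aggregation on the subset. The following is a sketch under assumptions: the data is invented, the `Temp (C)` column name is hypothetical, and the choice of summary statistics is only an example.

```python
import pandas as pd

# Hypothetical data; "Temp (C)" is an assumed column name, values are made up.
data = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2012-01-01 00:00", "2012-01-01 12:00",
        "2012-01-02 00:00", "2012-01-02 12:00",
        "2012-01-03 00:00", "2012-01-03 12:00",
    ]),
    "Visibility (km)": [8.0, 4.0, 25.0, 15.0, 1.0, 2.0],
    "Temp (C)": [2.0, 4.0, 0.0, 2.0, -2.0, -4.0],
})
data["Date"] = data["Date/Time"].dt.date

# Minimum visibility per day, then the two least-visible days.
min_by_day = data.groupby("Date")["Visibility (km)"].min()
least_visible_days = min_by_day.nsmallest(2).index

# Keep only the rows for those days and summarise them, one row per day.
subset = data[data["Date"].isin(least_visible_days)]
overview = subset.groupby("Date").agg(
    min_visibility=("Visibility (km)", "min"),
    mean_temp=("Temp (C)", "mean"),
    readings=("Date/Time", "count"),
)
print(overview)
```

The resulting table is sorted by date; sorting it by `min_visibility` instead would list the least-visible day first.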

I also need the total number of foggy days.

We can use the extracted date in this case too:

# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date

# Filter on hours which have foggy weather
>>> foggy = data[data["Weather"].str.contains("Fog")]

# Count number of unique days
>>> len(foggy["Date"].unique())

