
Apply row logic on date while extracting only multiple columns of a dataframe

I am loading a DataFrame with pandas and want to keep only the rows where the date is after a value held in a variable.

I can do this in multiple steps, but I would like to know whether all of the logic can be applied in a single call, as a matter of best practice.

Here is my code

        import pandas as pd


        self.min_date = "2020-05-01"

        #Extract DF from URL
        self.df = pd.read_html("https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList")[0]

        #Here is where the error lies, I want to extract the columns ["Subject","Reference","Date of case"] but where the date is after min_date.
        self.df = self.df.loc[["Date of case" < self.min_date], ["Subject","Reference","Date of case"]]

        return(self.df)

I keep getting the error: "IndexError: Boolean index has wrong length: 1 instead of 100"

I cannot find the solution online because every answer is too specific to the scenario of the person that asked the question.

E.g. this solution only works if you are selecting a single column: How to select rows from a DataFrame based on column values?

I appreciate any help.

Replace this:

["Date of case" < self.min_date]

with this:

self.df["Date of case"] > self.min_date

Note that the operator is > rather than <, since you want the dates after min_date.

That is:

self.df = self.df.loc[self.df["Date of case"] > self.min_date,
                      ["Subject","Reference","Date of case"]]
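To see why the original expression raises the IndexError, here is a minimal sketch. It uses a small made-up DataFrame in place of the scraped table (the rows and the extra "Country" column are assumptions for illustration only), and ISO-formatted date strings so that plain string comparison happens to sort correctly.

```python
import pandas as pd

# Made-up rows standing in for the scraped table (not real RASFF data).
df = pd.DataFrame({
    "Subject": ["a", "b", "c"],
    "Reference": ["2020.0001", "2020.0002", "2020.0003"],
    "Date of case": ["2020-04-15", "2020-05-10", "2020-05-22"],
    "Country": ["DE", "FR", "IT"],
})
min_date = "2020-05-01"

# Two plain Python strings are compared here, so the result is a single
# bool wrapped in a list of length 1, not one value per row.
broken_mask = ["Date of case" < min_date]

# Comparing the column itself yields a boolean Series with one entry per row.
mask = df["Date of case"] > min_date

# .loc takes the row mask first and the list of columns second.
result = df.loc[mask, ["Subject", "Reference", "Date of case"]]
print(result)
```

A list of length 1 cannot index a frame with 100 rows, which matches the "1 instead of 100" in the error message.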

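The comparison has to be made against the column, df["Date of case"], not against the column name as a plain string. It is also best practice to convert string dates into pandas datetime objects using pd.to_datetime, so that they are compared as dates rather than as text.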

import pandas as pd

min_date = pd.to_datetime("2020-05-01")

# Extract the first table on the page as a DataFrame
df = pd.read_html("https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList")[0]

# Parse the date column, then keep the three columns for rows dated after min_date
df['Date of case'] = pd.to_datetime(df['Date of case'])
df = df.loc[df["Date of case"] > min_date, ["Subject", "Reference", "Date of case"]]

Output:

                                             Subject  Reference Date of case
0  Salmonella enterica ser. Enteritidis (presence...  2020.2145   2020-05-22
1  migration of primary aromatic amines (0.4737 m...  2020.2131   2020-05-22
2  celery undeclared on green juice drink from Ge...  2020.2118   2020-05-22
3  aflatoxins (B1 = 29.4 µg/kg - ppb) in shelled ...  2020.2146   2020-05-22
4  too high content of E 200 - sorbic acid (1772 ...  2020.2125   2020-05-22
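If you want the conversion and the filter in a single expression, one possible sketch is to chain assign and .loc with callables. The rows below are made up, and the day-first date format ("%d/%m/%Y") is an assumption chosen to show why comparing dates as text can go wrong; check the format the page actually returns before relying on it.

```python
import pandas as pd

# Made-up rows; the day-first date strings are an assumption for illustration.
raw = pd.DataFrame({
    "Subject": ["a", "b", "c"],
    "Reference": ["2020.0001", "2020.0002", "2020.0003"],
    "Date of case": ["15/04/2020", "02/05/2020", "22/05/2020"],
})
min_date = pd.to_datetime("2020-05-01")
cols = ["Subject", "Reference", "Date of case"]

# Comparing the raw text is misleading: "15/04/2020" > "01/05/2020" is True
# character by character, even though 15 April is before 1 May.
string_mask = raw["Date of case"] > "01/05/2020"

# Parse the dates and filter in one chained expression. The ** unpacking is
# needed because the column name contains spaces.
parse = lambda d: pd.to_datetime(d["Date of case"], format="%d/%m/%Y")
filtered = (
    raw.assign(**{"Date of case": parse})
       .loc[lambda d: d["Date of case"] > min_date, cols]
)
print(filtered)
```

With real data, raw would be replaced by the frame returned from pd.read_html, and the rest of the chain stays the same.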


 