Dropping empty DataFrames with pandas

Question

I have written the following code to request pages and use regex to look for strings that resemble interest rates. The overall code works; however, it creates multiple empty DataFrames, and I can't get it to drop the empty frames to clean up my output. I have tried .dropna, .drop, and .empty to get rid of them, but the output remains unchanged and keeps printing the empty DataFrames alongside the information I already have. Is there a method I am not aware of that could get rid of these empty frames? Code and output below:

import datetime
import re

import pandas as pd
import requests as r
from bs4 import BeautifulSoup as bs

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                'https://www.marcus.com/us/en/personal-loans',
                'https://www.discover.com/personal-loans/']

# cycle through the links and look for APR rate ranges (fixed or variable) using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    # every text node that contains a digit followed by a percent sign
    paragraph = soup.find_all(text=re.compile(r'[0-9]%'))
    for n in paragraph:
        matches = []
        # ranges such as "5.99% to 24.99%" or "6%-12%"
        matches.extend(re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string))
        sint = pd.Series(matches)
        qdate = pd.Series([datetime.datetime.now()]*len(sint))
        slink = pd.Series([link]*len(sint))
        df = pd.concat([qdate,sint,slink],axis=1)
        df.columns = ['Date','Interest Rate', 'URL']
        print(df)

Output:

  ...
0 ...
1 ...

[2 rows x 3 columns]
 ...
0 ...

[1 rows x 3 columns]
 ...
0 ...
1 ...
2 ...
3 ...

[4 rows x 3 columns]
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
  ...
0 ...

[1 rows x 3 columns]
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []

Answer 1

How about just not printing or using the empty ones?

if df.empty:
  continue

Or

if not df.empty:
  print(df)
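
To put the check in context, here is a minimal sketch of the loop body with the empty case skipped and the remaining frames combined into one. The helper name `rates_frame`, the sample strings, and the example.com URL are hypothetical stand-ins for the scraped pages; the regex is the one from the question.

```python
import datetime
import re

import pandas as pd

# Regex from the question: matches ranges such as "5.99% to 24.99%" or "6%-12%".
RATE_RANGE = re.compile(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%')


def rates_frame(texts, link):
    """Build one DataFrame from all texts, skipping texts with no match.

    `rates_frame`, `texts`, and `link` are illustrative names, not part of
    the original code.
    """
    frames = []
    for text in texts:
        matches = RATE_RANGE.findall(text)
        df = pd.DataFrame({
            'Date': [datetime.datetime.now()] * len(matches),
            'Interest Rate': matches,
            'URL': [link] * len(matches),
        })
        if df.empty:  # no matches in this text, so nothing to keep
            continue
        frames.append(df)
    if not frames:
        # Nothing matched anywhere: return an empty frame with the same columns.
        return pd.DataFrame(columns=['Date', 'Interest Rate', 'URL'])
    return pd.concat(frames, ignore_index=True)


sample_texts = [
    'Rates from 5.99% to 24.99% APR',   # one match
    'Save 10% now',                     # contains "%" but no rate range
    'Fixed APR 6.99%-19.99%',           # one match
]
result = rates_frame(sample_texts, 'https://example.com/loans')
print(result[['Interest Rate', 'URL']])
```

Collecting the non-empty frames in a list and calling pd.concat once at the end also gives a single table instead of one printout per text node, which may be closer to the cleaned-up output the question asks for.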

Answer 2

if df.dropna(how='all').empty:
    continue

As per https://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.Series.empty.html, a DataFrame that contains only NaNs returns False for .empty, so if that matters it is good to call dropna first. Use how='any' if a single NaN is enough to drop a row/column, or how='all' if you only want to drop a row/column when all of its values are NaN (probably what you want).
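
The difference between the two checks can be seen on a small frame whose cells are all NaN. This is a sketch with made-up data; the column names are borrowed from the question, and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# A frame with two rows, but every cell is NaN.
nan_df = pd.DataFrame({'Interest Rate': [np.nan, np.nan],
                       'URL': [np.nan, np.nan]})

# .empty only checks whether an axis has length 0, so NaN rows still count.
has_rows = not nan_df.empty

# Dropping rows in which every value is NaN leaves nothing behind.
empty_after_drop = nan_df.dropna(how='all').empty

# A frame with no rows at all is already empty without any dropna.
no_rows = pd.DataFrame(columns=['Date', 'Interest Rate', 'URL'])

print(has_rows, empty_after_drop, no_rows.empty)  # True True True
```

In the question's loop the frames are built from an empty list of matches, so they have no rows at all and a plain .empty check is enough; the dropna variant only matters if all-NaN rows can appear.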

2 answers

Answer 1: 3, accepted, 2018-06-26 23:13:03

Answer 2: 0, 2018-06-27 00:43:22
