Using a user-defined function inside a for loop in Python 2.7

Question

Is there something wrong with my Python script?

from BeautifulSoup import BeautifulSoup
import requests
import re
from collections import defaultdict
import itertools
import pandas as pd

def wego(weburl,annot):
    print 'Go Term: ', weburl.split('=')[-1]
    html=requests.get(weburl).text
    soup=BeautifulSoup(html)
    desc=r"desc=\".*\""
    print "GO leave 2 term:",(re.findall(desc,str(soup))[0].split('"')[1])
    pattern=r"Unigene.*A"
    idDF = pd.DataFrame(columns=['GeneID']) # creates a new dataframe
    idDF['GeneID'] = pd.Series(re.findall(pattern,str(soup))).unique()
    print "Total Go term is :",idDF.shape[0]
    old=pd.read_csv(annot,usecols=[0,7,8])
    getset=pd.merge(left=idDF,right=old,left_on=idDF.columns[0],\
    right_on=old.columns[0])
    updown=getset.groupby(getset.columns[1]).count()
    print updown
    print "Max P-value: ","{:.3e}".format(getset['P-value'].max())

with open("gourl.txt") as ur:
    d=[]
    for url in ur:
        we=wego(url,annot="file.csv")
        d.append(we)

My gourl.txt file contains a list of URLs, one per line:

http://stackoverflow.com/questions=1
http://stackoverflow.com/questions=2

My question is: why does the script work when gourl.txt contains only one URL, but fail when it contains more than one?

The error is:

IndexError: list index out of range
IndexError                                Traceback (most recent call last)
<ipython-input-79-a852fe95d69c> in <module>()
  2     d=[]
  3     for url in ur:
----> 4         we=wego(url,annot="file.csv")
  5         d.append(we)
<ipython-input-4-9fdf25e75434> in wego(weburl, annot)
  5     soup=BeautifulSoup(html)
  6     desc=r"desc=\".*\""
----> 7     print "GO leave 2 term:",(re.findall(desc,str(soup))[0].split('"')[1])
  8     pattern=r"Unigene.*A"
  9     idDF = pd.DataFrame(columns=['GeneID']) #creates a new dataframe 
 IndexError: list index out of range

Answer 1

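If you look at the stack trace you posted, the answer is right there. The last line says you are trying to access a list element that does not exist ("out of range"):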

print "GO leave 2 term:",(re.findall(desc,str(soup))[0].split('"')[1])

You do two list accesses on this line: one to get the first match of the pattern, and another to get the second item produced by split('"').

So the page at the second URL probably does not contain the pattern you expect.
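A minimal sketch of the two list accesses on that line, run against made-up HTML fragments (page_with_desc and page_without_desc are hypothetical stand-ins, not the real pages behind the URLs). Because the pattern itself requires two quote characters, any match splits into at least three pieces, so [1] after split('"') cannot fail once there is a match; the access that raises IndexError is re.findall(...)[0] on a page with no match. The sketch runs under both Python 2.7 and Python 3.

```python
import re

desc = r"desc=\".*\""  # same pattern as in the question

# Hypothetical page fragments, for illustration only.
page_with_desc = '<term desc="biological process">'
page_without_desc = '<html><body>no match here</body></html>'

# Access 1: findall returns a list of matches; [0] takes the first one.
matches = re.findall(desc, page_with_desc)
print(matches)  # ['desc="biological process"']

# Access 2: split('"') cuts on quotes; [1] is the text inside the first pair.
term = matches[0].split('"')[1]
print(term)  # biological process

# With no match, findall returns an empty list, so [0] raises IndexError.
empty = re.findall(desc, page_without_desc)
try:
    empty[0]
except IndexError as exc:
    print("findall(...)[0] failed: %s" % exc)
```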

You could guard both accesses like this:

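matches = re.findall(desc, str(soup))   # list of desc="..." matches, possibly empty
tokens = []
if matches:
    tokens = matches[0].split('"')      # only split when there is a first match
if len(tokens) > 1:
    print "GO leave 2 term:", tokens[1] # Python 2 print statement, as in the question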
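The same guard can be packaged as a small function that returns None when the pattern is missing, so the caller can skip that URL instead of crashing. This is a sketch only: extract_desc is a hypothetical helper name, and the HTML strings in the usage loop are invented examples. It runs under both Python 2.7 and Python 3.

```python
import re

DESC_PATTERN = r"desc=\".*\""  # pattern from the question


def extract_desc(text):
    """Return the first desc="..." value in text, or None if there is none.

    extract_desc is a hypothetical helper used only for this sketch.
    """
    matches = re.findall(DESC_PATTERN, text)
    if not matches:
        return None  # no match: avoid indexing an empty list
    tokens = matches[0].split('"')
    if len(tokens) < 2:
        return None  # defensive only; a match always contains quotes
    return tokens[1]


# Usage: the caller decides what to do when the pattern is missing.
for page in ['<t desc="cell part">', '<p>nothing</p>']:
    found = extract_desc(page)
    if found is None:
        print("no desc attribute found, skipping")
    else:
        print("GO leave 2 term: %s" % found)
```

Returning None rather than raising keeps the loop over many URLs running; if a missing description should stop the run instead, raising a descriptive exception would be the other reasonable choice.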

Answer 2

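Problem solved, and I'm glad it was simple. The problem was in my gourl.txt file: the trailing \n on each line was being read as part of the URL. Here is what I see: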

>>> with open("wegourl.txt") as ur:
...     d=[]
...     for url in ur:
...         print url
...         

http://stackoverflow.com/questions=1

http://stackoverflow.com/questions=2
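Printing with repr() makes the hidden newline visible instead of showing it as a blank line. A minimal sketch, assuming io.StringIO as a stand-in for open("gourl.txt") and assuming the file's last line has no trailing newline (both are illustration choices, not facts from the post):

```python
import io

# Simulated gourl.txt: two URLs, the last line without a trailing newline.
fake_file = io.StringIO(u"http://stackoverflow.com/questions=1\n"
                        u"http://stackoverflow.com/questions=2")

lines = list(fake_file)
for line in lines:
    # repr() shows the '\n' that a plain print turns into an empty line.
    print(repr(line))

# The newline also leaks into what wego() prints as the "Go Term".
go_term = lines[0].split('=')[-1]
print(repr(go_term))
```

Iterating over a file keeps each line's newline, except possibly on the final line. That would explain why a file holding a single URL with no newline after it works while a multi-line file does not.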

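Clearly, the newline character at the end of each line (which is why print shows an empty line after every URL) makes the string an invalid URL, so it breaks the script. I fixed it by stripping the newline from each line as the file is read: url=url.strip('\n')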
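A slightly more defensive version of that fix, as a sketch: read_urls is a hypothetical helper, and io.StringIO again stands in for the real file. Calling strip() with no argument also removes a Windows "\r" and stray spaces, and the empty-string check skips blank lines that would otherwise be requested as URLs. It runs under both Python 2.7 and Python 3.

```python
import io


def read_urls(lines):
    """Yield cleaned, non-empty URLs from an iterable of lines.

    read_urls is a hypothetical helper for this sketch.
    """
    for line in lines:
        url = line.strip()  # drops '\n', a Windows '\r', and surrounding spaces
        if url:             # skip blank lines entirely
            yield url


# Stand-in for open("gourl.txt"), with a CRLF line and a trailing blank line.
fake_file = io.StringIO(u"http://stackoverflow.com/questions=1\r\n"
                        u"http://stackoverflow.com/questions=2\n"
                        u"\n")
urls = list(read_urls(fake_file))
print(urls)

# In the question's script, the loop could then read:
# with open("gourl.txt") as ur:
#     for url in read_urls(ur):
#         wego(url, annot="file.csv")
```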

2 solutions

Solution 1
0 2017-03-17 12:32:22

Solution 2
0 2017-03-18 04:29:52
