
Extract the last n characters of a string in a list

I have a list of links; some of the pages have page numbers and some don't. I'm scraping the website to get the page number, but the text comes back in this format: '\n\n\n\n\n\n\n\n«\nPrevious\n\n\n\n\n\n\n1\n\n2\n\n3\n\n\x85\n\n23'

Can someone help me extract just the last 2 characters of the string in the list?

Here's the code I'm using and the output I'm getting:

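import requests
from bs4 import BeautifulSoup

# links (list of URLs) and cookies (dict) are defined earlier in the script
for url in links:
    response = requests.get(url, cookies=cookies)  # the 2nd positional arg is params, so pass cookies by keyword
    soup = BeautifulSoup(response.content, 'html.parser')
    pr = [f.text for f in soup.find_all(class_='lia-paging-full-wrapper lia-paging-pager lia-paging-full-left-position lia-discussion-page-message-pager lia-forum-topic-page-gte-5-pager lia-component-message-pager')]
    # keep only the text before the "Next »" link
    ed = [text.split('\n\n\n\n\n\nNext\n»\n\n\n\n', 1)[0] for text in pr]
    print(ed)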

The output I'm getting is this:

['\n\n\n\n\n\n\n\n«\nPrevious\n\n\n\n\n\n\n1\n\n2\n\n3\n\n\x85\n\n23']
[]
['\n\n\n\n\n\n\n\n«\nPrevious\n\n\n\n\n\n\n1\n\n2\n\n3']
['\n\n\n\n\n\n\n\n«\nPrevious\n\n\n\n\n\n\n1\n\n2\n\n3']
[]
[]
[]
[]
['\n\n\n\n\n\n\n\n«\nPrevious\n\n\n\n\n\n\n1\n\n2']
['\n\n\n\n\n\n\n\n«\nPrevious\n\n\n\n\n\n\n1\n\n2\n\n3\n\n\x85\n\n16']
[]
[]
[]
['\n\n\n\n\n\n\n\n«\nPrevious\n\n\n\n\n\n\n1\n\n2']
[]

How can I get just the last 2-3 characters, since those represent the page numbers?

You could do ed[0][-2:], but I noticed you have both 1- and 2-digit page numbers. There are many ways to handle that; one is to use a regex to find the number at the end of the string:

import re

import requests
from bs4 import BeautifulSoup

# one or more digits at the very end of the string
pattern = re.compile(r'\d+$')

for url in links:
    response = requests.get(url, cookies=cookies)
    soup = BeautifulSoup(response.content, 'html.parser')
    pr = [f.text for f in soup.find_all(class_='lia-paging-full-wrapper lia-paging-pager lia-paging-full-left-position lia-discussion-page-message-pager lia-forum-topic-page-gte-5-pager lia-component-message-pager')]
    ed = [text.split('\n\n\n\n\n\nNext\n»\n\n\n\n', 1)[0] for text in pr]
    if ed:
        page_count = pattern.findall(ed[0])
        print(page_count[0])
    else:
        # no pager element was found on this page
        print('ed is empty!')

OUTPUT:

23
ed is empty!
3
3
ed is empty!
ed is empty!
ed is empty!
ed is empty!
2
16
ed is empty!
ed is empty!
ed is empty!
2
ed is empty!
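To try the extraction without requesting the site, the regex step can be pulled into a small function and run on strings copied from the question's output. This is a sketch: last_page and PAGE_AT_END are hypothetical names, and it swaps findall for re.search so that a string with no trailing digits gives None instead of raising an IndexError. The last line compares it with a plain [-2:] slice to show why slicing is fragile for one-digit page counts.

```python
import re

# One or more digits at the end of the string, allowing trailing whitespace.
PAGE_AT_END = re.compile(r'(\d+)\s*$')


def last_page(ed):
    """Return the trailing page number in ed[0] as an int, or None.

    ed is the list built in the loop above: empty when the page has no
    pager element, otherwise holding one pager string.
    """
    if not ed:
        return None
    match = PAGE_AT_END.search(ed[0])
    return int(match.group(1)) if match else None


# Strings copied from the question's output.
two_digits = ['\n\n\n\n\n\n\n\n«\nPrevious\n\n\n\n\n\n\n1\n\n2\n\n3\n\n\x85\n\n23']
one_digit = ['\n\n\n\n\n\n\n\n«\nPrevious\n\n\n\n\n\n\n1\n\n2']

print(last_page(two_digits))    # 23
print(last_page(one_digit))     # 2
print(last_page([]))            # None
print(repr(one_digit[0][-2:]))  # '\n2': a fixed slice picks up the newline
```

Both this and the findall version above handle one- and two-digit page counts; the only difference is how a string without trailing digits is reported.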
