简体   繁体   English

字符串关键字在Python中搜索

[英]String Keyword Searching in Python

I am trying to find a keyword in any index in a list and grab that index. 我试图在列表中的任何索引中找到一个关键字并获取该索引。 I have created a small web scraper using BeautifulSoup4 to scrape fanfiction data. 我使用BeautifulSoup4创建了一个小型的Web刮刀来刮掉小说数据。

Since not all fanfictions have genres or characters listed, or even an update date(if they are newly published) all of the info will be in different indexes. 由于并非所有的粉丝都列出了类型或字符,甚至更新日期(如果它们是新发布的),所有信息都将在不同的索引中。

I therefore need to search for, lets say, 'Words: ' and get the index for the whole string, ie 'Words: 1,854' == list[3], or something like that, and save it as the variable words = list[3] to call on later, in order to put it into an excel file later in the correct columns. 因此,我需要搜索,然后说'Words:'并获取整个字符串的索引,即'Words:1,854'== list [3],或类似的东西,并将其保存为变量words = list [3]稍后调用,以便在以后的正确列中将其放入excel文件中。 Here is my current scraper, it is only set to scrape one page at the moment, just lessen the original value of "u" to add more pages to be scraped. 这是我当前的刮刀,此刻只设置刮一页,只是减少“u”的原始值以添加更多要刮的页面。

import requests
from bs4 import BeautifulSoup
# import time
# from random import randint
# import xlsxwriter
# import urllib3
# from tinydb import TinyDB, Query

total = 0
u = int(1127)

while u < 2000:
    u = u+1
    url = 'https://www.fanfiction.net/Naruto-Crossovers/1402/0/?&srt=1&lan=1&r=10&p=' + str(u)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')

    raw = soup.find_all('div', class_='z-indent z-padtop')
    for n in range(len(raw)):
        stats = raw[n]
        info = stats.div
        text = info.text
        formatted = text.split(' - ')
        print(formatted[1:(len(formatted))])

Then the solution could be like this (check function find_keyword ) 然后解决方案可能是这样的(检查函数find_keyword

import requests
from bs4 import BeautifulSoup
# import time
# from random import randint
# import xlsxwriter
# import urllib3
# from tinydb import TinyDB, Query

total = 0
u = int(1127)

results = []
while u < 1130: #decreased u due to testing time
    u = u+1
    url = 'https://www.fanfiction.net/Naruto-Crossovers/1402/0/?&srt=1&lan=1&r=10&p=' + str(u)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')

    raw = soup.find_all('div', class_='z-indent z-padtop')
    for n in range(len(raw)):
        stats = raw[n]
        info = stats.div
        text = info.text
        formatted = text.split(' - ')
        if formatted:
            results.append(formatted)
print(results)

# function to search for a keyword
def find_keyword(list, keyword):
    results = []
    for element in list:
        value = ''
        for tag in element:
            if tag.find(keyword) >= 0:
                value = tag
        results.append(value)

    return(results)

words_list = find_keyword(results, 'Words') #example of how to search and build list for keyword
print(words_list)
This is the code I came up with, it wordks wonderfully. The find function was essential.

# For later use, searches for keywords and adds them to the specified list
    def assign_stats(keyword, stat_list):
        k = 13
        b = 0
        t = 0
        while k >= 1:
            if t == len(formatted):
                t = 0
            check = formatted[t]
            value = check.find(keyword)
            if value != -1:
                # values = formatted[t]
                stat_list.append(check)
                b = 1
            elif k < 2 and b == 0:
                stat_list.append('')

            t = t + 1
            k = k - 1


    # For later use, searches for keywords and adds them to the specified list
    def assign_stats_status(keyword, stat_list):
        k = 13
        b = 0
        t = 0
        while k >= 1:
            if t == len(formatted):
                t = 0
            check = formatted[t]
            value = check.find(keyword)
            if value != -1:
                # values = formatted[t]
                stat_list.append(check)
                b = 1
            elif k < 2 and b == 0:
                stat_list.append('In-Progress')
            t = t + 1
            k = k - 1


    # For later use, searches for specified indexes of story data lists and adds them to specified list
    def assign_stats_concrete(index, stat_list):
        stat_list.append(formatted[index])

    # Searches for keywords/indexes for the specified story stat lists
    assign_stats('Words', words)
    assign_stats_concrete(2, rating)
    assign_stats('English', language)
    assign_stats('Chapters', chapters)
    assign_stats('Reviews', reviews)
    assign_stats('Favs', favorites)
    assign_stats('Follows', follows)
    assign_stats('Updated', updated)
    assign_stats_status('Complete', status)
    assign_stats('Published', published)
    assign_stats_concrete(1, crossover)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM