
(PRAW) Get a value from a comment, then reply with another column's data that's on the same row as the first piece of data

Here's the CSV code:

import praw
import time
import csv
import codecs
import re
from collections import defaultdict 

def read_csv():
    # Read the benchmark CSV once with pandas (the csv.reader loop is not needed)
    import pandas as pd
    df = pd.read_csv('CPU-Bench.csv')
    saved_column = df.URL  # you can also use df['column_name']
    print(saved_column)
    return df


model_url_dict = read_csv()

The CSV file is kind of like this:

Type,Part Number,Brand,Model,Rank,Benchmark,Samples,URL
CPU,BX80671I76950X,Intel,Core i7-6950X,1,117,25,http://cpu.userbenchmark.com/Intel-Core-i7-6950X/Rating/3604
CPU,BX80671I76900K,Intel,Core i7-6900K,2,112,28,http://cpu.userbenchmark.com/Intel-Core-i7-6900K/Rating/3605
CPU,BX80671I76850K,Intel,Core i7-6850K,3,102,55,http://cpu.userbenchmark.com/Intel-Core-i7-6850K/Rating/3606
CPU,BX80648I75960X,Intel,Core i7-5960X,4,102,1651,http://cpu.userbenchmark.com/Intel-Core-i7-5960X/Rating/2580
CPU,BX80662I76700K,Intel,Core i7-6700K,5,98.5,21550,http://cpu.userbenchmark.com/Intel-Core-i7-6700K/Rating/3502
CPU,BX80671I76800K,Intel,Core i7-6800K,6,97,103,http://cpu.userbenchmark.com/Intel-Core-i7-6800K/Rating/3607

I'd like to make it so that if a user says "!benchmark i7 6950x", "!benchmark i7-6950x", "!benchmark Core i7-6950x", or "!benchmark Intel Core i7-6950x", it takes the string after !benchmark (in this case, i7 6950x), finds it in the CSV, looks at the URL column, and replies with "Here's some benchmarks for ..."

But in place of the first blank is the chosen CPU (again, in this case, i7 6950x),

and in place of url_column is the URL of that CPU (in this case, http://cpu.userbenchmark.com/Intel-Core-i7-6950X/Rating/3604 ).

Sorry if that's confusing, but how do I do this?

You definitely need to set up the reddit agent, authenticate it as a reddit application in the settings, and do all the other stuff you would do for a normal bot. Oh, and it would also be nice to parse the dataframe into a dictionary for easy lookup (since you only do it once when the bot starts, and lookups are then really fast).

So we will need a dictionary like {'Intel Core i7-6950x': 'http://cpu.userbenchmark.com/Intel-Core-i7-6950X/Rating/3604', ...}, which is pretty trivial considering you already have the CSV read in; a minimal sketch is below.
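
For example, a minimal sketch of building that dictionary from the CSV with pandas (the function name is just for illustration, and it assumes the column names shown in the question):

import pandas as pd

def build_model_url_dict(csv_path='CPU-Bench.csv'):
    # Map "Brand Model" (e.g. "Intel Core i7-6950X") to its benchmark URL.
    df = pd.read_csv(csv_path)
    return {
        '{} {}'.format(row['Brand'], row['Model']): row['URL']
        for _, row in df.iterrows()
    }

This is the kind of thing that would fill the d = {} placeholder in the main routine below.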

We also need a way to parse what the user actually wants to benchmark. So parse_models_from_comment(comment) would take a praw comment as an argument, with the guarantee that there is at least one occurrence of !benchmark. It would probably have to regex match with "\\!benchmark (.{0,10}) (i7|i5|i3)-(\\d)(X|K)" or something like that; I can't write anything more specific without seeing the possible data. This function would obviously return model names in the proper syntax (just as they are written in the dictionary); a rough sketch follows.
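
As a rough sketch (the regex and the normalization here are assumptions, since the exact comment data isn't known; d is the model -> URL dictionary built at startup):

import re

BENCHMARK_RE = re.compile(r'!benchmark\s+([\w \-]+)', re.IGNORECASE)

def parse_models_from_comment(comment):
    # Return dictionary keys (proper model names) matched by the text after !benchmark.
    models = []
    for raw in BENCHMARK_RE.findall(comment.body):
        query = raw.strip().lower().replace('-', ' ')
        for name in d:
            if query in name.lower().replace('-', ' '):
                models.append(name)
                break
    return models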

Now with this set up, the main routine could look like this:

reddit_client = praw.Reddit(user_agent='<your user agent>')
#OAuth2 shenanigans here
d = {} #filled dictionary
answeredComments = []
while True:
    for comment in reddit_client.get_comments('subreddit'):
        if '!benchmark' not in comment.body.lower():
            continue
        if comment in answeredComments:
            continue
        models = parse_models_from_comment(comment)
        if len(models)==0:
            response = 'I was unable to find a benchmark for the given query'
        else:
            response = 'Here are some benchmarks for:\n\n'
            for model in models:
                response += str(model) + ' ' + str(d[model])+'\n\n'
        answeredComments.append(comment)
        save_this_comment(comment)  # also persist to a file/database
        comment.reply(response)
    time.sleep(900) #15 min break

So, on top of the stuff that I've written earlier, here is the code explanation.

reddit_client.get_comments('subreddit') obviously returns the most recent comments, with a default limit of 25 and a maximum limit of 1000 (an API limitation).
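
For example, if the default is not enough, the limit can be passed explicitly (assuming the PRAW 3-style call above accepts a limit keyword):

# fetch up to 100 of the newest comments instead of the default 25
recent_comments = reddit_client.get_comments('subreddit', limit=100)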

The first if checks whether the comment is even relevant to the bot; if there is not a single !benchmark, it just skips over it. The second check makes sure the comment has not yet been answered. For the purpose of this snippet it is only a local list, but a cache file of answered comments, or even a single-table database (with sqlite or something equally easy), will prove necessary. This is used so that the bot remembers which comments it has already answered; otherwise it will spam endlessly. A minimal sketch of that persistence is below.
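
A minimal sketch of such persistence with a plain text file (the file name and load_answered_comments are assumptions; save_this_comment matches the call in the main routine above):

ANSWERED_FILE = 'answered_comments.txt'

def load_answered_comments():
    # Return the set of comment ids the bot has already replied to.
    try:
        with open(ANSWERED_FILE) as f:
            return set(line.strip() for line in f)
    except IOError:
        return set()

def save_this_comment(comment):
    # Remember the comment id so it is skipped on later passes.
    with open(ANSWERED_FILE, 'a') as f:
        f.write(comment.id + '\n')

If you persist ids like this, load_answered_comments() can seed the answered set when the bot starts, so restarts don't cause repeat replies (in that case the loop would track comment.id rather than the comment object).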

Now we have a comment and a guarantee that the bot should be able to find some models. So we run our parse_models_from_comment(comment) function to get the list of CPU models the user wants benchmarks for.

If len(models)==0, it means the user specified a CPU that is not in the CSV, made a typo, used the wrong format, the regex didn't catch it, or whatever else. Either way we don't have any models to work with, so we should prepare a reply saying the bot's search failed.

Otherwise, we have something to work with, and thanks to the pre-prepared lookup dictionary we can quickly build a response with any number of links (we could also use markdown syntax for that; see the small variation below).
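
For instance, each line of the response could use reddit's markdown link syntax instead of a bare URL (a small variation on the loop in the main routine):

response = 'Here are some benchmarks for:\n\n'
for model in models:
    # render each hit as a markdown bullet with a clickable link
    response += '* [{}]({})\n'.format(model, d[model])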

When the response is prepared, we first need to log to the file/database that this comment has been answered, so that the next time praw sends it our way we skip it.

Lastly, we post a reply to the given comment with the prepared response (whether positive or negative). Remember to first log the comment to the database, then post the reply, and then optionally check whether the API request to reply was successful. The other way around might prove buggy.
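
A minimal sketch of that ordering (catching a generic exception here for brevity; in practice you would catch PRAW's specific API / rate-limit exceptions):

save_this_comment(comment)  # log first, so the comment can never be answered twice
try:
    comment.reply(response)
except Exception as e:  # in practice, catch PRAW's API exception types
    print('Failed to reply to comment {}: {}'.format(comment.id, e))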
