Storing a List into Python Sqlite3
I am trying to scrape form field IDs using Beautiful Soup, like this:
for link in BeautifulSoup(content, 'html.parser', parse_only=SoupStrainer('input')):
    if link.has_attr('id'):
        print(link['id'])
Let us assume that it returns something like:
username
email
password
passwordagain
terms
button_register
I would like to write this into a SQLite3 DB.
What I will be doing down the line in my application is use these form fields' IDs and try to do a POST, maybe. The problem is that there are plenty of sites like this whose form field IDs I have scraped. So the relation is like this:
Domain1 - First list of Form Fields for this Domain1
Domain2 - Second list of Form Fields for this Domain2
.. and so on
What I am unsure about here is how I should design my columns for this kind of purpose. Will it be OK if I just create a table with two columns, say:
COL 1 - Domain URL (as TEXT)
COL 2 - List of Form Field IDs (as TEXT)
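If the whole list does go into COL 2 as TEXT, one way to do it (a sketch, not the only option; the table name and values here are made up) is to serialize the list as JSON on the way in and parse it on the way out:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE forms (domain TEXT PRIMARY KEY, field_ids TEXT)')

# Pack the Python list into a single TEXT column as a JSON string
ids = ['username', 'email', 'password']
conn.execute('INSERT INTO forms VALUES (?, ?)',
             ('http://somedomain.com', json.dumps(ids)))
conn.commit()

# Unpack the list again when it is read back
row = conn.execute('SELECT field_ids FROM forms WHERE domain = ?',
                   ('http://somedomain.com',)).fetchone()
fields = json.loads(row[0])
```

The trade-off is that SQL can no longer query individual field IDs; the database only sees an opaque string.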
One thing to remember is that down the line my application will need to do something like this:
Pseudocode:
If Domain is "http://somedomain.com":
    For every item in COL 2 (the list of form field IDs):
        Assign some set of values to each of the form fields, then make a POST request
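The pseudocode above could be sketched in Python roughly like this (a minimal sketch; the helper name, URL, and placeholder value are all assumptions, and a real run would of course need a live server):

```python
import urllib.parse
import urllib.request

def build_post(url, field_ids, value='test'):
    """Assign the same placeholder value to every scraped field id
    and prepare a POST request (hypothetical helper, not from the question)."""
    payload = {fid: value for fid in field_ids}
    data = urllib.parse.urlencode(payload).encode()
    # Supplying `data` makes urllib issue a POST rather than a GET
    return urllib.request.Request(url, data=data)

req = build_post('http://somedomain.com/register',
                 ['username', 'email', 'password'])
```

The request would then be sent with `urllib.request.urlopen(req)`.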
Can anyone guide me, please?
EDITED on 22/07/2011 - Is my database design below correct?
I have decided on a solution like this. What do you think?
I will have three tables, like below:
Table 1
Key Column (Auto Generated Integer) - Primary Key
Domain as TEXT
Sample data would be something like:
1 http://url1.com
2 http://url2.com
3 http://url3.com
Table 2
Domain (Here I will be using the Key Number from Table 1)
RegLink - This will have the registration link (as TEXT)
Form Fields (as Text)
Sample data would be something like:
1 http://url1.com/register field1
1 http://url1.com/register field2
1 http://url1.com/register field3
2 http://url2.com/register field1
2 http://url2.com/register field2
2 http://url2.com/register field3
3 http://url3.com/register field1
3 http://url3.com/register field2
3 http://url3.com/register field3
Table 3
Domain (Here I will be using the Key Number from Table 1)
Status (as TEXT)
User (as TEXT)
Pass (as TEXT)
Sample data would be something like:
1 Pass user1 pass1
2 Fail user2 pass2
3 Pass user3 pass3
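Tables 1 and 2 of the design above could be set up in SQLite roughly like this (a sketch only; the table and column names are my assumptions based on the sample data, and SQLite only enforces foreign keys when the pragma is turned on):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')  # off by default in SQLite
conn.executescript('''
CREATE TABLE domains (
    id     INTEGER PRIMARY KEY,          -- auto-generated key column
    domain TEXT
);
CREATE TABLE reg_forms (
    domain_id INTEGER REFERENCES domains (id),
    reg_link  TEXT,
    field_id  TEXT                       -- one row per form field
);
''')

# Load the sample data for url1.com
conn.execute("INSERT INTO domains VALUES (1, 'http://url1.com')")
conn.executemany(
    'INSERT INTO reg_forms VALUES (?, ?, ?)',
    [(1, 'http://url1.com/register', f) for f in ('field1', 'field2', 'field3')])

# Join the key number back to the domain text
rows = conn.execute('''
    SELECT d.domain, r.field_id
    FROM domains d JOIN reg_forms r ON r.domain_id = d.id
    ORDER BY r.field_id''').fetchall()
```

Table 3 would follow the same pattern, keyed on the same `domain_id`.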
Do you think this table design is good? Or are there any improvements that can be made?
There is a normalization problem in your table.
Using two tables:
TABLE domains
    int  id primary key
    text name

TABLE field_ids
    int  id primary key
    int  domain_id foreign key ref domains
    text value
is a better solution.
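That two-table schema could be written out in SQLite like this (a sketch; remember that SQLite only enforces the foreign key when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')
conn.executescript('''
CREATE TABLE domains (
    id   INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE field_ids (
    id        INTEGER PRIMARY KEY,
    domain_id INTEGER REFERENCES domains (id),
    value     TEXT
);
''')

# One row in domains, one row in field_ids per scraped field
conn.execute("INSERT INTO domains (name) VALUES ('http://somedomain.com')")
domain_id = conn.execute(
    "SELECT id FROM domains WHERE name = 'http://somedomain.com'").fetchone()[0]
conn.executemany('INSERT INTO field_ids (domain_id, value) VALUES (?, ?)',
                 [(domain_id, v) for v in ('username', 'email', 'password')])

# Recover the list of field IDs for one domain
values = [row[0] for row in conn.execute(
    'SELECT value FROM field_ids WHERE domain_id = ? ORDER BY id',
    (domain_id,))]
```

Each field ID is now its own row, so individual fields can be queried, updated, or deleted with plain SQL.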
Proper database design would suggest you have a table of URLs and a table of fields, each referencing a URL record. But depending on what you want to do with them, you could pack lists into a single column. See the docs for how to go about that.
Is sqlite a requirement? It might not be the best way to store the data. E.g. if you need random-access lookups by URL, the shelve module might be a better bet. If you just need to record them and iterate over the sites, it might be simpler to store them as CSV.
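A minimal shelve sketch of that random-access-by-URL idea (the file path here is a throwaway temp location; a real app would pick a fixed one):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'fields')

with shelve.open(path) as db:          # write: URL -> list of field IDs
    db['http://somedomain.com'] = ['username', 'email', 'password']

with shelve.open(path) as db:          # later: random-access lookup by URL
    fields = db['http://somedomain.com']
```

A shelf behaves like a persistent dict, so no schema or SQL is needed at all.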
Try this to get the IDs:
ids = (link['id'] for link in
       BeautifulSoup(content, 'html.parser', parse_only=SoupStrainer('input'))
       if link.has_attr('id'))
And this should show you how to save them, load them, and do something with each. It uses a single table and just inserts one row per field per domain. It's the simplest solution, and perfectly adequate for a relatively small number of rows of data.
import sqlite3
from itertools import repeat

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table domains
             (domain text, linkid text)''')

domain_to_insert = 'domain_name'
ids = ['id1', 'id2']
c.executemany('''insert into domains
                 values (?, ?)''', zip(repeat(domain_to_insert), ids))
conn.commit()

domain_to_select = 'domain_name'
c.execute('select * from domains where domain=?', (domain_to_select,))

# this is just an example
def some_function_of_row(row):
    return row[1] + ' value'

fields = {row[1]: some_function_of_row(row) for row in c}
print(fields)
c.close()