Storing a List into Python Sqlite3
I am trying to scrape form field IDs using Beautiful Soup, like this:
for link in BeautifulSoup(content, 'html.parser', parse_only=SoupStrainer('input')):
    if link.has_attr('id'):
        print(link['id'])
Let us assume that it returns something like:
username
email
password
passwordagain
terms
button_register
I would like to write this into a SQLite3 DB.
What I will be doing down the line in my application is use these form fields' IDs and try to do a POST, maybe. The problem is that there are plenty of sites like this whose form field IDs I have scraped. So the relation is like this:
Domain1 - First list of Form Fields for this Domain1
Domain2 - Second list of Form Fields for this Domain2
.. and so on
What I am unsure about here is how I should design my columns for this kind of purpose. Will it be OK if I just create a table with two columns, say:
COL 1 - Domain URL (as TEXT)
COL 2 - List of Form Field IDs (as TEXT)
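If the whole list does go into COL 2 as TEXT, one way to do it (a sketch, not the only option; the table name and values here are made up) is to serialize the list as JSON on the way in and parse it on the way out:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE forms (domain TEXT PRIMARY KEY, field_ids TEXT)')

# Pack the Python list into a single TEXT column as a JSON string
ids = ['username', 'email', 'password']
conn.execute('INSERT INTO forms VALUES (?, ?)',
             ('http://somedomain.com', json.dumps(ids)))
conn.commit()

# Unpack the list again when it is read back
row = conn.execute('SELECT field_ids FROM forms WHERE domain = ?',
                   ('http://somedomain.com',)).fetchone()
fields = json.loads(row[0])
```

The trade-off is that SQL can no longer query individual field IDs; the database only sees an opaque string.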
One thing to remember is that down the line my application will need to do something like this:
Pseudocode:
If Domain is "http://somedomain.com":
    For every item in COL 2 (the list of form field IDs):
        Assign some set of values to each of the form fields, then make a POST request
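The pseudocode above could be sketched in Python roughly like this (a minimal sketch; the helper name, URL, and placeholder value are all assumptions, and a real run would of course need a live server):

```python
import urllib.parse
import urllib.request

def build_post(url, field_ids, value='test'):
    """Assign the same placeholder value to every scraped field id
    and prepare a POST request (hypothetical helper, not from the question)."""
    payload = {fid: value for fid in field_ids}
    data = urllib.parse.urlencode(payload).encode()
    # Supplying `data` makes urllib issue a POST rather than a GET
    return urllib.request.Request(url, data=data)

req = build_post('http://somedomain.com/register',
                 ['username', 'email', 'password'])
```

The request would then be sent with `urllib.request.urlopen(req)`.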
Can anyone guide me, please?
EDITED on 22/07/2011 - Is my database design below correct?
I have decided on a solution like this. What do you think?
I will have three tables, like below:
Table 1
Key Column (Auto Generated Integer) - Primary Key
Domain as TEXT
Sample data would be something like:
1 http://url1.com
2 http://url2.com
3 http://url3.com
Table 2
Domain (Here I will be using the Key Number from Table 1)
RegLink - This will have the registration link (as TEXT)
Form Fields (as Text)
Sample data would be something like:
1 http://url1.com/register field1
1 http://url1.com/register field2
1 http://url1.com/register field3
2 http://url2.com/register field1
2 http://url2.com/register field2
2 http://url2.com/register field3
3 http://url3.com/register field1
3 http://url3.com/register field2
3 http://url3.com/register field3
Table 3
Domain (Here I will be using the Key Number from Table 1)
Status (as TEXT)
User (as TEXT)
Pass (as TEXT)
Sample data would be something like:
1 Pass user1 pass1
2 Fail user2 pass2
3 Pass user3 pass3
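Tables 1 and 2 of the design above could be set up in SQLite roughly like this (a sketch only; the table and column names are my assumptions based on the sample data, and SQLite only enforces foreign keys when the pragma is turned on):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')  # off by default in SQLite
conn.executescript('''
CREATE TABLE domains (
    id     INTEGER PRIMARY KEY,          -- auto-generated key column
    domain TEXT
);
CREATE TABLE reg_forms (
    domain_id INTEGER REFERENCES domains (id),
    reg_link  TEXT,
    field_id  TEXT                       -- one row per form field
);
''')

# Load the sample data for url1.com
conn.execute("INSERT INTO domains VALUES (1, 'http://url1.com')")
conn.executemany(
    'INSERT INTO reg_forms VALUES (?, ?, ?)',
    [(1, 'http://url1.com/register', f) for f in ('field1', 'field2', 'field3')])

# Join the key number back to the domain text
rows = conn.execute('''
    SELECT d.domain, r.field_id
    FROM domains d JOIN reg_forms r ON r.domain_id = d.id
    ORDER BY r.field_id''').fetchall()
```

Table 3 would follow the same pattern, keyed on the same `domain_id`.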
Do you think this table design is good? Or are there any improvements that can be made?
There is a normalization problem in your table.
Using two tables:
TABLE domains
    int  id primary key
    text name

TABLE field_ids
    int  id primary key
    int  domain_id foreign key ref domains
    text value
is a better solution.
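That two-table schema could be written out in SQLite like this (a sketch; remember that SQLite only enforces the foreign key when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')
conn.executescript('''
CREATE TABLE domains (
    id   INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE field_ids (
    id        INTEGER PRIMARY KEY,
    domain_id INTEGER REFERENCES domains (id),
    value     TEXT
);
''')

# One row in domains, one row in field_ids per scraped field
conn.execute("INSERT INTO domains (name) VALUES ('http://somedomain.com')")
domain_id = conn.execute(
    "SELECT id FROM domains WHERE name = 'http://somedomain.com'").fetchone()[0]
conn.executemany('INSERT INTO field_ids (domain_id, value) VALUES (?, ?)',
                 [(domain_id, v) for v in ('username', 'email', 'password')])

# Recover the list of field IDs for one domain
values = [row[0] for row in conn.execute(
    'SELECT value FROM field_ids WHERE domain_id = ? ORDER BY id',
    (domain_id,))]
```

Each field ID is now its own row, so individual fields can be queried, updated, or deleted with plain SQL.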
Proper database design would suggest you have a table of URLs and a table of fields, each referencing a URL record. But depending on what you want to do with them, you could pack lists into a single column. See the docs for how to go about that.
Is sqlite a requirement? It might not be the best way to store the data. E.g. if you need random-access lookups by URL, the shelve module might be a better bet. If you just need to record them and iterate over the sites, it might be simpler to store them as CSV.
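A minimal shelve sketch of that random-access-by-URL idea (the file path here is a throwaway temp location; a real app would pick a fixed one):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'fields')

with shelve.open(path) as db:          # write: URL -> list of field IDs
    db['http://somedomain.com'] = ['username', 'email', 'password']

with shelve.open(path) as db:          # later: random-access lookup by URL
    fields = db['http://somedomain.com']
```

A shelf behaves like a persistent dict, so no schema or SQL is needed at all.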
Try this to get the IDs:
ids = (link['id'] for link in
       BeautifulSoup(content, 'html.parser', parse_only=SoupStrainer('input'))
       if link.has_attr('id'))
And this should show you how to save them, load them, and do something with each. It uses a single table and just inserts one row per field per domain. It's the simplest solution, and perfectly adequate for a relatively small number of rows of data.
import sqlite3
from itertools import repeat

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table domains
             (domain text, linkid text)''')

domain_to_insert = 'domain_name'
ids = ['id1', 'id2']
c.executemany('''insert into domains
                 values (?, ?)''', zip(repeat(domain_to_insert), ids))
conn.commit()

domain_to_select = 'domain_name'
c.execute('select * from domains where domain=?', (domain_to_select,))

# this is just an example
def some_function_of_row(row):
    return row[1] + ' value'

fields = {row[1]: some_function_of_row(row) for row in c}
print(fields)
c.close()