
Storing a List into Python Sqlite3

I am trying to scrape form field IDs using Beautiful Soup, like this:

from BeautifulSoup import BeautifulSoup, SoupStrainer

# 'content' holds the HTML of the page being scraped
for link in BeautifulSoup(content, parseOnlyThese=SoupStrainer('input')):
    if link.has_key('id'):
        print link['id']

Let us assume that it returns something like:

username
email
password
passwordagain
terms
button_register

I would like to write this into a Sqlite3 DB.

What I will be doing down the line in my application is... using these form fields' IDs to try a POST, maybe. The problem is, there are plenty of sites like this whose form field IDs I have scraped. So the relation is like this...

Domain1 - First list of Form Fields for this Domain1
Domain2 - Second list of Form Fields for this Domain2
.. and so on

What I am unsure about here is... how should I design my columns for this kind of purpose? Will it be OK if I just create a table with two columns, say:

COL 1 - Domain URL (as TEXT)
COL 2 - List of Form Field IDs (as TEXT)

One thing to remember is... down the line in my application I will need to do something like this...

Pseudocode:

If Domain is "http://somedomain.com":
    For every item in COL 2 (which is a list of form field ids):
        Assign some set of values to each of the form fields & then make a POST request
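
A rough sketch of what that loop might look like with urllib/urllib2 (the registration URL and the field values here are hypothetical placeholders; the real field names would come from COL 2 for the domain):

import urllib
import urllib2

# hypothetical inputs - in the real application these come from the DB
form_fields = ['username', 'email', 'password']
values = dict((field, 'some value') for field in form_fields)

# urlopen() issues a POST when a data argument is supplied
data = urllib.urlencode(values)
response = urllib2.urlopen('http://somedomain.com/register', data)
print response.getcode()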

Can anyone guide me, please?

EDITED on 22/07/2011 - Is my database design below correct?

I have decided on a solution like this. What do you guys think?

I will be having three tables, as below:

Table 1

Key Column (Auto Generated Integer) - Primary Key
Domain as TEXT

Sample data would be something like:

1   http://url1.com
2   http://url2.com
3   http://url3.com

Table 2

Domain (Here I will be using the Key Number from Table 1)
RegLink - This will have the registration link (as TEXT)
Form Fields (as Text)

Sample data would be something like:

1   http://url1.com/register    field1
1   http://url1.com/register    field2
1   http://url1.com/register    field3
2   http://url2.com/register    field1
2   http://url2.com/register    field2
2   http://url2.com/register    field3
3   http://url3.com/register    field1
3   http://url3.com/register    field2
3   http://url3.com/register    field3

Table 3

Domain (Here I will be using the Key Number from Table 1)
Status (as TEXT)
User (as TEXT)
Pass (as TEXT)

Sample data would be something like:

1   Pass    user1   pass1
2   Fail    user2   pass2
3   Pass    user3   pass3
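
To make the intent clearer, here is a rough sqlite3 sketch of this three-table layout (the table and column names are just my working names):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
create table domains (
    id     integer primary key autoincrement,
    domain text
);

create table form_fields (
    domain_id integer references domains(id),
    reglink   text,
    field_id  text
);

create table results (
    domain_id integer references domains(id),
    status    text,
    user      text,
    pass      text
);
""")
conn.commit()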

Do you think this table design is good? Or are there any improvements that can be made?

There is a normalization problem in your tables.

Using 2 tables with

TABLE domains
int id primary key
text name

TABLE field_ids
int id primary key
int domain_id foreign key ref domains
text value

is a better solution.
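
A concrete version of that schema, plus the kind of join you would use down the line to pull back all field ids for one domain, might look like this (a sketch, assuming sqlite3; names follow the outline above):

import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.executescript("""
create table domains (
    id   integer primary key autoincrement,
    name text
);

create table field_ids (
    id        integer primary key autoincrement,
    domain_id integer references domains(id),
    value     text
);
""")

c.execute("insert into domains (name) values (?)", ('http://url1.com',))
domain_id = c.lastrowid
c.executemany("insert into field_ids (domain_id, value) values (?, ?)",
              [(domain_id, f) for f in ('username', 'email', 'password')])
conn.commit()

# all field ids recorded for one domain
c.execute("""select f.value from field_ids f
             join domains d on f.domain_id = d.id
             where d.name = ?""", ('http://url1.com',))
print [row[0] for row in c]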

Proper database design would suggest you have a table of URLs and a table of fields, each referencing a URL record. But depending on what you want to do with them, you could pack lists into a single column. See the docs for how to go about that.
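
If you do pack the list into a single column, one way (a sketch along the lines of the sqlite3 adapter/converter mechanism; the "fieldlist" type name is made up) is to serialise the list as JSON and register an adapter and converter for it:

import json
import sqlite3

# list -> TEXT when inserting, TEXT -> list when selecting
sqlite3.register_adapter(list, json.dumps)
sqlite3.register_converter("fieldlist", json.loads)

conn = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
c = conn.cursor()
c.execute("create table sites (domain text, fields fieldlist)")
c.execute("insert into sites values (?, ?)",
          ('http://url1.com', ['username', 'email', 'password']))
print c.execute("select fields from sites").fetchone()[0]  # comes back as a list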

Is sqlite a requirement? It might not be the best way to store the data. E.g. if you need random-access lookups by URL, the shelve module might be a better bet. If you just need to record them and iterate over the sites, it might be simpler to store them as CSV.
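
For comparison, a tiny shelve-based version (the filename is arbitrary; shelve keys must be strings, so the URL works directly as the key):

import shelve

db = shelve.open('form_fields.shelf')  # creates the file if it does not exist
db['http://url1.com'] = ['username', 'email', 'password']
db['http://url2.com'] = ['user', 'pass']
print db['http://url1.com']            # random-access lookup by URL
db.close()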

Try this to get the ids:

ids = (link['id'] for link in
        BeautifulSoup(content, parseOnlyThese=SoupStrainer('input')) 
         if link.has_key('id'))

And this should show you how to save them, load them, and do something with each. This uses a single table and just inserts one row per field per domain. It's the simplest solution, and perfectly adequate for a relatively small number of rows of data.

from itertools import izip, repeat
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table domains
(domain text, linkid text)''')

domain_to_insert = 'domain_name'
ids = ['id1', 'id2']
c.executemany("""insert into domains
      values (?, ?)""", izip(repeat(domain_to_insert), ids))
conn.commit()

domain_to_select = 'domain_name'
c.execute("""select * from domains where domain=?""", (domain_to_select,))

# this is just an example
def some_function_of_row(row):
    return row[1] + ' value'

fields = dict((row[1], some_function_of_row(row)) for row in c)
print fields
c.close()

Convert the list to a string on saving, using str(). Then convert it back to a list on loading, using eval().

Try this to see for yourself:

x = [1, 2]
print(type(x))
y = str(x)
print(type(y))
z = eval(y)
print(type(z))
print(x)
print(y)
print(z)
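
Applied to the sqlite case, a sketch of that round trip could look like this (the table and column names are only illustrative):

import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute("create table sites (domain text, fields text)")

ids = ['username', 'email', 'password']
c.execute("insert into sites values (?, ?)", ('http://url1.com', str(ids)))

row = c.execute("select fields from sites where domain=?",
                ('http://url1.com',)).fetchone()
fields = eval(row[0])  # back to a list
print fields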
