How to write this snippet in Python?

I am learning Python (I have a C/C++ background).

I need to write something practical in Python while learning, though. I have the following pseudocode (my first attempt at writing a Python script, having only started reading about Python yesterday). Hopefully, the snippet details the logic of what I want to do. BTW, I am using Python 2.6 on Ubuntu Karmic.

Assume the script is invoked as: script_name.py directory_path

import csv, sys, os, glob

# Can I declare that the function accepts a dictionary as first arg?
def getItemValue(item, key, defval)
  return !item.haskey(key) ? defval : item[key]


dirname = sys.argv[1]

# declare some default values here
weight, is_male, default_city_id = 100, true, 1 

# fetch some data from a database table into a nested dictionary, indexed by a string
curr_dict = load_dict_from_db('foo')

#iterate through all the files matching *.csv in the specified folder
for infile in glob.glob( os.path.join(dirname, '*.csv') ):
  #get the file name (without the '.csv' extension)
  code = infile[0:-4]
  # open file, and iterate through the rows of the current file (a CSV file)
  f = open(infile, 'rt')
  try:
    reader = csv.reader(f)
    for row in reader:
      #lookup the id for the code in the dictionary
      id = curr_dict[code]['id']
      name = row['name']
      address1 = row['address1']
      address2 = row['address2']
      city_id = getItemValue(row, 'city_id', default_city_id)

      # insert row to database table

  finally:
    f.close()

I have the following questions:

  1. Is the code written in a Pythonic enough way (is there a better way of implementing it)?

  2. Given a table with a schema like the one shown below, how may I write a Python function that fetches data from the table and returns it in a dictionary indexed by string (name)?

  3. How can I insert the row data into the table? (Actually, I would like to use a transaction if possible, and commit just before the file is closed.)

Table schema:

create table demo (id int, name varchar(32), weight float, city_id int);

BTW, my backend database is PostgreSQL.

[Edit]

Wayne et al:

To clarify, what I want is a set of rows. Each row can be indexed by a key (so that means the rows container is a dictionary, right?). OK, now once we have retrieved a row by using the key, I also want to be able to access the 'columns' in the row, meaning that the row data itself is a dictionary. I don't know if Python supports multidimensional array syntax when dealing with dictionaries, but the following statement will help explain how I intend to conceptually use the data returned from the db. A statement like dataset['joe']['weight'] will first fetch the row data indexed by the key 'joe' (which is a dictionary) and then index that dictionary for the key 'weight'. I want to know how to build such a dictionary of dictionaries from the retrieved data in a Pythonic way like you did before.

A simplistic way would be to write something like:

import pyodbc

mydict = {}
cnxn = pyodbc.connect(params)
cursor = cnxn.cursor()
cursor.execute("select user_id, user_name from users"):

for row in cursor:
   mydict[row.id] = row

Is this correct, and can it be written in a more Pythonic way?

To get a value from a dictionary with a fallback, use the .get method of the dict:

>>> d = {1: 2}
>>> d.get(1, 3)
2
>>> d.get(5, 3)
3

This will remove the need for the getItemValue function. I won't comment on the existing syntax since it's clearly alien to Python. The correct syntax for the ternary in Python is:

true_val if true_false_check else false_val
>>> 'a' if False else 'b'
'b'

But as I'm saying below, you don't need it at all.

If you're using Python 2.6 or later, you should use the with statement instead of try-finally:

with open(infile) as f:
    reader = csv.reader(f)
    ... etc

Seeing that you want to have row as a dictionary, you should be using csv.DictReader and not a simple csv.reader. However, it is unnecessary in your case. Your SQL query could just be constructed to access the fields of the row dict. In this case you wouldn't need to create separate items city_id, name, etc. To add a default city_id to row if it doesn't exist, you could use the .setdefault method:

>>> d
{1: 2}
>>> d.setdefault(1, 3)
2
>>> d
{1: 2}
>>> d.setdefault(3, 3)
3
>>> d
{1: 2, 3: 3}

and for id, simply row['id'] = curr_dict[code]['id']

When slicing, you can omit the leading 0:
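Putting the DictReader and setdefault suggestions together, a minimal sketch might look like the following. It is written for Python 3; the sample CSV text, the helper name read_rows and the record_id value are made up for illustration, and a real script would pass the file opened from the glob loop instead of an in-memory string.

```python
import csv
import io

# Hypothetical sample data; note there is no city_id column.
SAMPLE_CSV = "name,address1,address2\nJoe,1 Main St,Apt 2\n"

DEFAULT_CITY_ID = 1

def read_rows(fileobj, record_id):
    """Return each CSV row as a dict, with 'id' and a default 'city_id' filled in."""
    rows = []
    for row in csv.DictReader(fileobj):
        row = dict(row)
        # setdefault only fills the key when the column is absent from the
        # header; an empty cell in an existing column stays an empty string.
        row.setdefault('city_id', DEFAULT_CITY_ID)
        row['id'] = record_id
        rows.append(row)
    return rows

rows = read_rows(io.StringIO(SAMPLE_CSV), record_id=7)
print(rows[0]['name'], rows[0]['city_id'], rows[0]['id'])
```

Each row is then a single dict that can be handed to the insert statement, so the separate name, address1 and address2 variables are not needed.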

>>> 'abc.txt'[:-4]
'abc'

Generally, Python's database libraries provide fetchone, fetchmany and fetchall methods on the cursor. These return Row objects that might support dict-like access, or simple tuples. It will depend on the particular module you're using.

It looks mostly Pythonic enough to me.

The ternary operation should look like this though (I think this will return the result you expect):

return defval if key not in item else item[key]

Yeah, you can pass a dictionary (or any other value) as an argument in basically any position. The only ordering constraint concerns *args and **kwargs (named by convention; technically you can use any names you want), which must appear in that order as the last one or two parameters.

For inserting into a DB you can use the odbc module:
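As a small illustration of that ordering rule, here is a sketch with a made-up function named describe; the argument values are arbitrary.

```python
def describe(item, *args, **kwargs):
    """Return the three groups of arguments so the ordering is visible."""
    return item, args, kwargs

# A dict can be passed in any positional slot, like any other value.
# Extra positional values land in args, named ones in kwargs.
item, extra, named = describe({'city_id': 3}, 'a', 'b', defval=1)
print(item, extra, named)
```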

import odbc
conn = odbc.odbc('servernamehere')
cursor = conn.cursor()
cursor.execute("INSERT INTO mytable VALUES (42, 'Spam on Eggs', 'Spam on Wheat')")
conn.commit()

You can read up on the odbc module or find plenty of examples for it. I'm sure there are other modules as well, but that one should work fine for you.

For retrieval you would use
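For the transaction part of the question, a hedged sketch of the usual DB-API pattern is shown below. It uses the standard-library sqlite3 module purely as a stand-in so that it runs without a server; the sample rows are invented. A PostgreSQL driver such as psycopg2 follows the same connect/cursor/commit shape but uses %s placeholders instead of ?.

```python
import sqlite3

# sqlite3 is only a stand-in here so the sketch runs without a server.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute(
    "create table demo (id int, name varchar(32), weight float, city_id int)")

rows = [(1, 'joe', 100.0, 1), (2, 'ann', 82.5, 3)]  # hypothetical data

try:
    # Placeholders let the driver quote the values; sqlite3 uses '?'.
    cursor.executemany("insert into demo values (?, ?, ?, ?)", rows)
    conn.commit()      # make all inserts permanent at once
except Exception:
    conn.rollback()    # undo the partial batch on any failure
    raise

cursor.execute("select count(*) from demo")
inserted = cursor.fetchone()[0]
print(inserted)
```

Committing once after the loop over a file's rows gives the "commit just before the file is closed" behaviour asked about, and passing values as parameters avoids building SQL strings by hand.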

cursor.execute("SELECT * FROM demo")
#Reads one record - returns a tuple
print cursor.fetchone()
#Reads the rest of the records - a list of tuples
print cursor.fetchall()

To make one of those records into a dictionary entry:

record = cursor.fetchone()
# Removes the 2nd element (at index 1) from the record
mydict[record[1]] = record[:1] + record[2:]

Though that practically screams for a generator expression if you want the whole shebang at once:

mydict = dict((record[1], record[:1] + record[2:]) for record in cursor.fetchall())

which should give you all of the records packed up neatly in a dictionary, using the name as a key.

HTH
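If each record should itself be a dictionary, as in dataset['joe']['weight'] from the question, one possible sketch pairs the column names from cursor.description with each tuple. Again sqlite3 is only a stand-in for the real backend, the sample row is invented, and the signature of load_dict_from_db (cursor, table name, key column) is an assumption rather than anything given in the question.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # stand-in for the real database
cursor = conn.cursor()
cursor.execute(
    "create table demo (id int, name varchar(32), weight float, city_id int)")
cursor.execute("insert into demo values (1, 'joe', 100.0, 1)")

def load_dict_from_db(cursor, table, key='name'):
    """Return {key_value: {column: value, ...}} for every row of the table."""
    # Table names cannot be bound as parameters, so only pass trusted names.
    cursor.execute("select * from %s" % table)
    # DB-API cursors expose column metadata; the name is the first field.
    columns = [col[0] for col in cursor.description]
    result = {}
    for record in cursor.fetchall():
        row = dict(zip(columns, record))
        result[row[key]] = row
    return result

dataset = load_dict_from_db(cursor, 'demo')
print(dataset['joe']['weight'])
```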

A colon is required after defs:

def getItemValue(item, key, defval):
    ...

boolean operators: In Python, ! -> not; && -> and; || -> or (see http://docs.python.org/release/2.5.2/lib/boolean.html for boolean operators). There's no ? : operator in Python; there is a return (x) if (x) else (x) expression, although I personally rarely use it, in favour of plain ifs.

booleans/None: True, False and None all start with a capital letter.

checking types of arguments: In Python, you generally don't declare types of function parameters. You could add e.g. assert isinstance(item, dict), "dicts must be passed as the first parameter!" in the function, although this kind of "strict checking" is often discouraged as it's not always necessary in Python.

python keywords: default isn't a reserved Python keyword and is acceptable as an argument or variable name (just for reference).

style guidelines: PEP 8 (the Python style guideline) states that module imports should generally be one per line, though there are some exceptions. (I have to admit I often don't put import sys and import os on separate lines, though I usually follow the guideline otherwise.)
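For illustration only, the optional check could sit inside a helper like the one below. The name get_item_value is a PEP 8-style renaming of getItemValue chosen for this sketch, and the body uses dict.get as suggested in the other answer. Note that assert statements are skipped when Python runs with the -O flag, so this is a debugging aid rather than real validation.

```python
def get_item_value(item, key, defval):
    """Return item[key], or defval when the key is missing."""
    # Optional strict check; many Python programmers would leave it out.
    assert isinstance(item, dict), "dicts must be passed as the first parameter!"
    return item.get(key, defval)

print(get_item_value({'city_id': 5}, 'city_id', 1))
print(get_item_value({}, 'city_id', 1))
```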

file open modes: rt isn't valid in Python 2.x - it will work, though the t will be ignored. See also http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files . It is valid in Python 3 though, so I don't think it'd hurt if you want to force text mode, raising exceptions on binary characters (use rb if you want to read non-ASCII characters).

working with dictionaries: Python used to use dict.has_key(key), but you should use key in dict now (which has largely replaced it, see http://docs.python.org/library/stdtypes.html#mapping-types-dict ).

split file extensions: code = infile[0:-4] could be replaced with code = os.path.splitext(infile)[0] (splitext returns e.g. ('root', '.ext'), with the dot included in the extension; see http://docs.python.org/library/os.path.html#os.path.splitext ).

EDIT: removed the remark about multiple variable declarations on a single line and added some formatting. Also corrected the claim that rt isn't a valid mode in Python; in Python 3 it is.
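A short sketch of the splitext suggestion follows. The path is made up to resemble what glob would return. Both the slice and splitext keep the directory part of the path, so if code is meant to be just the file name (an assumption about the intent of the question), os.path.basename can strip it.

```python
import os

infile = os.path.join('data', 'abc.csv')  # hypothetical path as returned by glob

# splitext separates the extension (including its dot) from the rest.
root, ext = os.path.splitext(infile)

# root still contains the directory; basename keeps only the file name.
code = os.path.basename(root)
print(root, ext, code)
```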
