
Reading data from text file to postgres with python

I can read the contents of a test file, test.plt, in the root directory into a postgres table, tempo, using the script below. The file looks like this:

$cat test.plt
Geolife trajectory
WGS 84
Altitude is in Feet
Reserved 3
0,2,255,My Track,0,0,2,8421376
0
39.9756783,116.3308383,0,131.2,39717.4473148148,2008-09-26,10:44:08
39.9756649,116.3308749,0,131.2,39717.4473842593,2008-09-26,10:44:14
39.97564,116.3308749,0,131.2,39717.4474189815,2008-09-26,10:44:17
39.9756533,116.3308583,0,131.2,39717.4474537037,2008-09-26,10:44:20
39.9756316,116.3308299,0,131.2,39717.4474884259,2008-09-26,10:44:23
39.9753166,116.3306299,0,131.2,39717.4480324074,2008-09-26,10:45:10
39.9753566,116.3305916,0,131.2,39717.4480671296,2008-09-26,10:45:13
39.9753516,116.3305249,0,131.2,39717.4481018518,2008-09-26,10:45:16

Python script:

import psycopg2
from config import config
import os
import glob

query = "INSERT INTO tempo (lat, lon, flag, alt, passeddate, gpsdate, gpstime) VALUES (%s, %s, %s, %s, %s, %s, %s)"

path = '~/Desktop/Data/'

conn = None
try:
    #read the connection parameters
    params = config()
    # connect to the PostgreSQL server
    conn = psycopg2.connect(**params)
    cur = conn.cursor()

    # INSERT data into the database (the first 6 lines of the file are headers)
    with open('test.plt') as file:
        file_content = file.readlines()[6:]
        values = [line.strip().split(',') for line in file_content]
        cur.executemany(query, values)

    cur.close()
    # commit the changes
    conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
    print(error)
finally:
    if conn is not None:
        conn.close()

Results:

postgres=> SELECT * FROM tempo;
 id |    lat     |     lon     | flag |  alt  |    passeddate    |  gpsdate   | gpstime  
----+------------+-------------+------+-------+------------------+------------+----------
    | 39.9756783 | 116.3308383 |    0 | 131.2 | 39717.4473148148 | 2008-09-26 | 10:44:08
    | 39.9756649 | 116.3308749 |    0 | 131.2 | 39717.4473842593 | 2008-09-26 | 10:44:14
    |   39.97564 | 116.3308749 |    0 | 131.2 | 39717.4474189815 | 2008-09-26 | 10:44:17
    | 39.9756533 | 116.3308583 |    0 | 131.2 | 39717.4474537037 | 2008-09-26 | 10:44:20
    | 39.9756316 | 116.3308299 |    0 | 131.2 | 39717.4474884259 | 2008-09-26 | 10:44:23
    | 39.9753166 | 116.3306299 |    0 | 131.2 | 39717.4480324074 | 2008-09-26 | 10:45:10
    | 39.9753566 | 116.3305916 |    0 | 131.2 | 39717.4480671296 | 2008-09-26 | 10:45:13
    | 39.9753516 | 116.3305249 |    0 | 131.2 | 39717.4481018518 | 2008-09-26 | 10:45:16
(8 rows)

I can also get the names of all files with the .plt extension in all sub-directories (stripping the .plt extension) by replacing the insert statement in the Python script with:

for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".plt"):
                print(file.strip('.plt'))

Output:

20081210001529
20081113121334
20081205143505
20081029234123
20081202145929
20081204142253
20081111234235
20081118003844
20081105110052
20081023055305
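
A side note on file.strip('.plt'): str.strip treats its argument as a set of characters to remove from both ends, not as a suffix. It happens to work here because the names consist only of digits, but a name that starts or ends with '.', 'p', 'l' or 't' would be mangled. A minimal sketch of a safer alternative using os.path.splitext (the helper name stem and the file name plot.plt are made up for illustration):

```python
import os

def stem(filename):
    # split "name.ext" into ("name", ".ext") and keep the name part
    return os.path.splitext(filename)[0]

print(stem("20081210001529.plt"))   # digits-only name from the listing above
print(stem("plot.plt"))             # hypothetical name, extension removed cleanly
print("plot.plt".strip(".plt"))     # strip() removes characters, not a suffix
```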

As you can see, the file names consist of digits only. The goal is to take each file name, insert it into the id field of the tempo table, and insert the file's contents into the remaining columns, repeating this for each file in all sub-directories.

  1. How do I modify my code so that the file name (e.g. 20081210001529) is added to the insert query (i.e. gets inserted into the table)?

Using the code below, with the intent of reading data from all files in the sub-directories (i.e. all the files listed by the code above), fails with an error that mentions only the first file in the first sub-directory:

query = "INSERT INTO tempo (lat, lon, flag, alt, passeddate, gpsdate, gpstime) VALUES (%s, %s, %s, %s, %s, %s, %s)"

path = '~/Desktop/Data/'
#Establish connection to postgres
conn = None
try:
    #read the connection parameters
    params = config()
    # connect to the PostgreSQL server
    conn = psycopg2.connect(**params)
    cur = conn.cursor()

    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".plt"):
                f = open(file, 'r')
                content = f.readlines()[6:]
                values = [lines.strip().split(',') for line in content]
                cur.executemany(query, values)

    cur.close()
    # commit the changes
    conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
    print(error)
finally:
    if conn is not None:
        conn.close()

Error:

[Errno 2] No such file or directory: '20081210001529.plt'

I would appreciate your help on this task.

file holds only the file name, not its directory, so open(file) looks in the current working directory and fails. You need to build the full path with os.path.join(root, file).
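
To make the point concrete, here is a small sketch that needs no database. It builds a throw-away directory tree with tempfile (the sub-directory name sub and the dummy file contents are made up for illustration) and shows that os.walk yields bare file names that have to be joined with the directory before they can be opened:

```python
import os
import tempfile

# build a throw-away tree: <tmp>/sub/20081210001529.plt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
with open(os.path.join(root, "sub", "20081210001529.plt"), "w") as f:
    f.write("dummy\n")

found = []
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        # name is just "20081210001529.plt"; open(name) would look in the
        # current working directory and raise [Errno 2]
        full = os.path.join(dirpath, name)
        found.append(full)
        with open(full) as f:   # works regardless of the working directory
            print(name, "->", f.read().strip())
```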

I just put your code into 2 functions (the first one finds all plt files, the second one inserts each file's contents), which makes error handling easier:

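import os
import psycopg2
from config import config

params = config()  # connection parameters, read as in the question

def findFiles(rootDir):
    # collect the full path of every .plt file below rootDir
    pltFiles = []
    for path, subdirs, files in os.walk(rootDir):
        for x in files:
            if x.endswith(".plt"):
                pltFiles.append(os.path.join(path, x))  # create full path!
    return pltFiles

def insert(pltFilePath):
    query = "INSERT INTO tempo (lat, lon, flag, alt, passeddate, gpsdate, gpstime) VALUES (%s, %s, %s, %s, %s, %s, %s)"
    # skip the 6 header lines and split each data line into its 7 fields
    with open(pltFilePath, 'r') as f:
        content = f.readlines()[6:]
    values = [line.strip().split(',') for line in content]
    status = False
    conn = None
    try:
        conn = psycopg2.connect(**params)
        cur = conn.cursor()
        cur.executemany(query, values)
        cur.close()
        conn.commit()  # without a commit the rows are discarded on close
        status = True
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
    return status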


path = '/tmp/data'
pltFiles = findFiles(path)

if not pltFiles:
    print("No plt files found!")

for pltFile in pltFiles:
    print("Processing: %s" % pltFile)
    res = insert(pltFile)
    # res holds True or False, so you could rename, move or delete the file:
    #if res:
    #   os.remove(pltFile)
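
The code above inserts the seven data columns but not the file name asked about in question 1. A minimal sketch of one way to add it, assuming the id column can store the file name (its type is not shown in the question, so this is an assumption). The names QUERY, rows_with_id and insert_with_id are hypothetical, and conn stands for an already open connection such as the one returned by psycopg2.connect(**params):

```python
import os

# same statement as in the question, with the id column added
# (assumption: id is wide enough for the name, e.g. bigint or text)
QUERY = ("INSERT INTO tempo (id, lat, lon, flag, alt, passeddate, gpsdate, gpstime) "
         "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")

def rows_with_id(pltFilePath):
    # "/some/dir/20081210001529.plt" -> "20081210001529"
    file_id = os.path.splitext(os.path.basename(pltFilePath))[0]
    with open(pltFilePath, 'r') as f:
        content = f.readlines()[6:]   # skip the 6 header lines
    # prepend the id to the 7 fields of every non-empty data line
    return [[file_id] + line.strip().split(',')
            for line in content if line.strip()]

def insert_with_id(conn, pltFilePath):
    # conn: an open database connection, e.g. from psycopg2.connect(**params)
    cur = conn.cursor()
    cur.executemany(QUERY, rows_with_id(pltFilePath))
    cur.close()
    conn.commit()
```

Inside the loop over pltFiles this could be called as insert_with_id(conn, pltFile) with a single connection that is opened once before the loop and closed after it. Note also that os.walk does not expand "~", so a path such as '~/Desktop/Data/' would need to go through os.path.expanduser first.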
