Reading data from text files into Postgres with Python
I can read the contents of a test file, test.plt, in the root directory into a Postgres table, tempo, using the Python script below. Here is the file:
$ cat test.plt
Geolife trajectory
WGS 84
Altitude is in Feet
Reserved 3
0,2,255,My Track,0,0,2,8421376
0
39.9756783,116.3308383,0,131.2,39717.4473148148,2008-09-26,10:44:08
39.9756649,116.3308749,0,131.2,39717.4473842593,2008-09-26,10:44:14
39.97564,116.3308749,0,131.2,39717.4474189815,2008-09-26,10:44:17
39.9756533,116.3308583,0,131.2,39717.4474537037,2008-09-26,10:44:20
39.9756316,116.3308299,0,131.2,39717.4474884259,2008-09-26,10:44:23
39.9753166,116.3306299,0,131.2,39717.4480324074,2008-09-26,10:45:10
39.9753566,116.3305916,0,131.2,39717.4480671296,2008-09-26,10:45:13
39.9753516,116.3305249,0,131.2,39717.4481018518,2008-09-26,10:45:16
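Each data line has seven comma-separated fields, in the same order as the INSERT column list. A minimal sketch of parsing one line into typed values follows. It assumes that field 5 (passeddate) counts days, with a fractional part, since 1899-12-30. This is consistent with the sample data, where 39717.4473148148 corresponds to 2008-09-26 10:44:08. The helper names parse_plt_line and passed_to_datetime are hypothetical.

```python
import datetime

# Assumption: passeddate = days (with fraction) since 1899-12-30.
# This matches the sample rows, e.g. 39717.4473148148 -> 2008-09-26 10:44:08.
PASSED_EPOCH = datetime.datetime(1899, 12, 30)


def passed_to_datetime(days):
    """Convert the fractional day count in field 5 to a datetime, rounded to the second."""
    return PASSED_EPOCH + datetime.timedelta(seconds=round(days * 86400))


def parse_plt_line(line):
    """Split one data line of a .plt file into typed values.

    Column order follows the INSERT statement:
    lat, lon, flag, alt, passeddate, gpsdate, gpstime.
    """
    lat, lon, flag, alt, passed, gpsdate, gpstime = line.strip().split(",")
    return (
        float(lat),
        float(lon),
        int(flag),
        float(alt),  # the file header says altitude is in feet
        float(passed),
        datetime.date.fromisoformat(gpsdate),
        datetime.time.fromisoformat(gpstime),
    )


row = parse_plt_line("39.9756783,116.3308383,0,131.2,39717.4473148148,2008-09-26,10:44:08")
print(row)
print(passed_to_datetime(row[4]))  # 2008-09-26 10:44:08
```

Rounding to whole seconds is deliberate: the stored fraction is not exact, so converting it without rounding lands a microsecond short of the recorded time.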
Python script:
import psycopg2
from config import config
import os
import glob
query = "INSERT INTO tempo (lat, lon, flag, alt, passeddate, gpsdate, gpstime) VALUES (%s, %s, %s, %s, %s, %s, %s)"
path = '~/Desktop/Data/'
conn = None
try:
    # read the connection parameters
    params = config()
    # connect to the PostgreSQL server
    conn = psycopg2.connect(**params)
    cur = conn.cursor()
    # INSERT data into the database (skip the 6 header lines)
    with open('test.plt') as file:
        file_content = file.readlines()[6:]
        values = [line.strip().split(',') for line in file_content]
        cur.executemany(query, values)
    cur.close()
    # commit the changes
    conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
    print(error)
finally:
    if conn is not None:
        conn.close()
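The readlines()[6:] slice loads the whole file and relies on every remaining line having exactly seven fields. Here is a sketch of an alternative that uses the standard csv and itertools modules. The names read_plt_rows and HEADER_LINES are hypothetical, and the six-line header comes from the sample file.

```python
import csv
import io
import itertools

HEADER_LINES = 6  # the sample .plt file starts with six header lines


def read_plt_rows(fileobj, header_lines=HEADER_LINES):
    """Return the data rows of an open .plt file as lists of strings.

    itertools.islice skips the header lazily, and empty rows (for example
    from a trailing blank line) are dropped so executemany never receives
    a row with the wrong number of values.
    """
    reader = csv.reader(itertools.islice(fileobj, header_lines, None))
    return [row for row in reader if row]


sample = io.StringIO(
    "Geolife trajectory\nWGS 84\nAltitude is in Feet\nReserved 3\n"
    "0,2,255,My Track,0,0,2,8421376\n0\n"
    "39.9756783,116.3308383,0,131.2,39717.4473148148,2008-09-26,10:44:08\n"
    "39.9756649,116.3308749,0,131.2,39717.4473842593,2008-09-26,10:44:14\n"
    "\n"
)
rows = read_plt_rows(sample)
print(len(rows))  # 2
```

With a real file this would be used as `with open('test.plt', newline='') as f: cur.executemany(query, read_plt_rows(f))`. Filtering empty rows matters because `''.split(',')` yields a one-element list, which makes executemany fail on the seven placeholders.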
Results:
postgres=> SELECT * FROM tempo;
id | lat | lon | flag | alt | passeddate | gpsdate | gpstime
----+------------+-------------+------+-------+------------------+------------+----------
| 39.9756783 | 116.3308383 | 0 | 131.2 | 39717.4473148148 | 2008-09-26 | 10:44:08
| 39.9756649 | 116.3308749 | 0 | 131.2 | 39717.4473842593 | 2008-09-26 | 10:44:14
| 39.97564 | 116.3308749 | 0 | 131.2 | 39717.4474189815 | 2008-09-26 | 10:44:17
| 39.9756533 | 116.3308583 | 0 | 131.2 | 39717.4474537037 | 2008-09-26 | 10:44:20
| 39.9756316 | 116.3308299 | 0 | 131.2 | 39717.4474884259 | 2008-09-26 | 10:44:23
| 39.9753166 | 116.3306299 | 0 | 131.2 | 39717.4480324074 | 2008-09-26 | 10:45:10
| 39.9753566 | 116.3305916 | 0 | 131.2 | 39717.4480671296 | 2008-09-26 | 10:45:13
| 39.9753516 | 116.3305249 | 0 | 131.2 | 39717.4481018518 | 2008-09-26 | 10:45:16
(8 rows)
I can also get the names of all files with the .plt extension in all sub-directories (with the .plt extension stripped) by replacing the insert statement in the Python script with:
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(".plt"):
            print(file.strip('.plt'))
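Stripping the extension with str.strip('.plt') happens to work for these purely numeric names. However, strip removes any of the characters '.', 'p', 'l' and 't' from both ends; it does not remove the suffix. The sketch below shows a safer way to get the id from a file name with os.path.splitext. file_id is a hypothetical helper name.

```python
import os


def file_id(filename):
    """Return the file name without its directory and without its extension."""
    return os.path.splitext(os.path.basename(filename))[0]


# str.strip treats ".plt" as a set of characters, not as a suffix:
print("plot.plt".strip(".plt"))  # o
print(file_id("plot.plt"))  # plot
print(file_id("some/dir/20081210001529.plt"))  # 20081210001529
```

On Python 3.9 and later, name.removesuffix('.plt') is another option that removes exactly that suffix.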
Output:
20081210001529
20081113121334
20081205143505
20081029234123
20081202145929
20081204142253
20081111234235
20081118003844
20081105110052
20081023055305
As you can see, the files are named with digits. The goal is to take each file name, insert it into the id field of the tempo table, and insert the file's contents into the remaining columns, repeating this for every file in all sub-directories.
That is, the file name (e.g. 20081210001529) is added to the insert query and inserted into the table. The code below is intended to read the data from all files in the sub-directories (the files listed by the code above). Instead, it fails with an error on the very first file in the first sub-directory:
query = "INSERT INTO tempo (lat, lon, flag, alt, passeddate, gpsdate, gpstime) VALUES (%s, %s, %s, %s, %s, %s, %s)"
path = '~/Desktop/Data/'
# Establish connection to postgres
conn = None
try:
    # read the connection parameters
    params = config()
    # connect to the PostgreSQL server
    conn = psycopg2.connect(**params)
    cur = conn.cursor()
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".plt"):
                f = open(file, 'r')
                content = f.readlines()[6:]
                values = [lines.strip().split(',') for line in content]
                cur.executemany(query, values)
    cur.close()
    # commit the changes
    conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
    print(error)
finally:
    if conn is not None:
        conn.close()
[Errno 2] No such file or directory: '20081210001529.plt'
I would appreciate your help with this task.
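To store the file name in id alongside each row, the INSERT needs an eighth placeholder, and every row of a file has to carry that file's name. The following is a minimal sketch. It assumes the id column can hold the file name and is not a unique or primary key, because all rows from one file share the same id. The empty id column in the results above suggests it is nullable and not auto-filled. QUERY_WITH_ID and rows_with_id are hypothetical names.

```python
# Hypothetical 8-column version of the INSERT: the file name goes into id.
QUERY_WITH_ID = (
    "INSERT INTO tempo (id, lat, lon, flag, alt, passeddate, gpsdate, gpstime) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
)


def rows_with_id(file_id, rows):
    """Prefix every data row of one file with that file's id."""
    return [(file_id, *row) for row in rows]


rows = [
    ["39.9756783", "116.3308383", "0", "131.2", "39717.4473148148", "2008-09-26", "10:44:08"],
    ["39.9756649", "116.3308749", "0", "131.2", "39717.4473842593", "2008-09-26", "10:44:14"],
]
values = rows_with_id("20081210001529", rows)
print(values[0])
```

The result is passed as before, e.g. cur.executemany(QUERY_WITH_ID, values).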
file holds only the file name; you need to build the full path with os.path.join.
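With a path like '~/Desktop/Data/', the ~ also has to be expanded: neither os.walk nor open() does this, and os.walk silently yields nothing for a directory that does not exist. Below is a sketch of collecting full paths with both fixes. find_plt_files is a hypothetical name.

```python
import os


def find_plt_files(root):
    """Return the full paths of all .plt files below root, sorted."""
    root = os.path.expanduser(root)  # turn '~/...' into the home directory path
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".plt"):
                # dirpath is the directory os.walk is currently visiting; joining
                # it with the bare name gives a path open() can use from any
                # working directory.
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

Each returned path can be opened directly, whatever the current working directory is.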
I just put your code into two functions (the first one finds all .plt files, the second one inserts each file's contents), which makes error handling easier:
import os
import psycopg2
from config import config
# read the connection parameters once
params = config()
def findFiles(rootDir):
    pltFiles = []
    for path, subdirs, files in os.walk(rootDir):
        for x in files:
            if x.endswith(".plt"):
                pltFiles.append(os.path.join(path, x))  # create full path!
    return pltFiles
def insert(pltFilePath):
    query = "INSERT INTO tempo (lat, lon, flag, alt, passeddate, gpsdate, gpstime) VALUES (%s, %s, %s, %s, %s, %s, %s)"
    # open the file by its full path and skip the 6 header lines
    with open(pltFilePath, 'r') as f:
        content = f.readlines()[6:]
    values = [line.strip().split(',') for line in content]
    status = False
    conn = None
    try:
        conn = psycopg2.connect(**params)
        cur = conn.cursor()
        cur.executemany(query, values)
        conn.commit()  # without a commit the rows are discarded when the connection closes
        cur.close()
        status = True
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
    return status
path = '/tmp/data'
pltFiles = findFiles(path)
if not pltFiles:
    print("No plt files found!")
for pltFile in pltFiles:
    print("Processing: %s" % pltFile)
    res = insert(pltFile)
    # res holds True or False, so you could rename, move or delete the file:
    #if res:
    #    os.remove(pltFile)
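The insert() function above opens a new connection for every file. As an alternative, one psycopg2 connection can be reused, with each file in its own transaction. In psycopg2, using the connection in a with block commits on success and rolls back on an exception, but it does not close the connection. A cursor used in a with block is closed at the end. The names insert_files and read_rows are hypothetical; read_rows(path) is assumed to return the parameter rows for one file.

```python
def insert_files(conn, paths, query, read_rows):
    """Insert each file's rows in its own transaction on one open connection.

    conn is assumed to be a psycopg2 connection. read_rows(path) is a
    hypothetical callback returning the parameter rows for one file.
    Returns a dict mapping each path to True (committed) or False (rolled back).
    """
    results = {}
    for path in paths:
        try:
            with conn:  # commit on success, roll back on exception
                with conn.cursor() as cur:  # cursor is closed when the block ends
                    cur.executemany(query, read_rows(path))
            results[path] = True
        except Exception as error:  # e.g. psycopg2.DatabaseError or an unreadable file
            print("Failed: %s (%s)" % (path, error))
            results[path] = False
    return results

# Possible use (not run here):
# conn = psycopg2.connect(**params)
# try:
#     insert_files(conn, findFiles(path), query, read_rows)
# finally:
#     conn.close()
```

Because each file is committed separately, one malformed file only rolls back its own rows, and the returned dict can drive the rename/move/delete step.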