How can I improve the speed of a Python script that compares two lists and a value between a range?
I have two large data files:
File1:
Gen1 1 1 10
Gen2 1 2 20
Gen3 2 30 40
File2:
A 1 4
B 1 15
C 2 2
Expected output:
Out:
Gen1 1 1 10 A 1 4
Gen2 1 2 20 B 1 15
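I have code that, for each line of file2, tries to find the lines of file1 where file2[1] matches file1[1] and file2[2] falls within the range given by file1[2] and file1[3].
My code is as follows: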
for i in file1:
    temp = i.split()
    for a in file2:
        temp2 = a.split()
        if temp[1] == temp2[1] and temp2[2] >= temp[2] and temp2[2] <= temp[3]:
            print(i + " " + a + "\n")
        else:
            continue
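The code works, but I think it takes much longer than it should. Is there a simpler or faster way to do this? I feel like I'm missing some clever use of a map or hashing.
Thanks!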
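The "map or hashing" idea mentioned in the question could look roughly like the sketch below. It groups the file2 lines by their second column in a dictionary, so each file1 line is only compared against file2 lines that share its key rather than against the whole file. The function name `match_rows` and the choice to skip malformed lines are assumptions made for illustration; the values are converted with `int()` so the range check is numeric.

```python
from collections import defaultdict


def match_rows(file1_lines, file2_lines):
    """Pair each file1 line with the file2 lines that share its key and fall in its range."""
    # Group file2 lines by their second column (the key shared with file1).
    by_key = defaultdict(list)
    for line in file2_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # assumption: silently skip blank or malformed lines
        by_key[parts[1]].append((int(parts[2]), line.strip()))

    matches = []
    for line in file1_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # assumption: silently skip blank or malformed lines
        low, high = int(parts[2]), int(parts[3])
        # Only file2 lines with the same key are examined here.
        for value, other in by_key.get(parts[1], []):
            if low <= value <= high:
                matches.append(line.strip() + " " + other)
    return matches


file1 = ["Gen1 1 1 10", "Gen2 1 2 20", "Gen3 2 30 40"]
file2 = ["A 1 4", "B 1 15", "C 2 2"]
for match in match_rows(file1, file2):
    print(match)
```

How much this helps depends on the data: if most lines share the same key, the inner loop is still long, and sorting each group so the range can be located by binary search may be worth trying.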
pandas might be a good option. See this example.
When the files are large, I prefer sqlite over pandas. A pandas DataFrame can be loaded from a sqlite DB.
import sqlite3
file1 = """Gen1 1 1 10
Gen2 1 2 20
Gen3 2 30 40"""
file2 = """A 1 4
B 1 15
C 2 2"""
# your code (fixed)
print("desired output")
for i in file1.splitlines():
    temp = i.split()
    for a in file2.splitlines():
        temp2 = a.split()
        if temp[1] == temp2[1] and int(temp2[2]) >= int(temp[2]) and int(temp2[2]) <= int(temp[3]):
            print(i + " " + a)
# Make an in-memory db
# Set a filename if your files are too big or if you want to reuse this db
con = sqlite3.connect(":memory:")
c = con.cursor()
c.execute("""CREATE TABLE file1
    (
        gene_name text,
        a integer,
        b1 integer,
        b2 integer
    )""")
for row in file1.splitlines():
    if row:
        c.execute("INSERT INTO file1 (gene_name, a, b1, b2) VALUES (?,?,?,?)", tuple(row.split()))
c.execute("""CREATE TABLE file2
    (
        name text,
        a integer,
        b integer
    )""")
for row in file2.splitlines():
    if row:
        c.execute("INSERT INTO file2 (name, a, b) VALUES (?,?,?)", tuple(row.split()))
# join the two tables
print("sqlite3 output")
for row in c.execute("""SELECT
        file1.gene_name,
        file1.a,
        file1.b1,
        file1.b2,
        file2.name,
        file2.a,
        file2.b
    FROM file1
    JOIN file2 ON file1.a = file2.a AND file2.b >= file1.b1 AND file2.b <= file1.b2
    """):
    print(row)
con.close()
Output:
desired output
Gen1 1 1 10 A 1 4
Gen2 1 2 20 A 1 4
Gen2 1 2 20 B 1 15
sqlite3 output
(u'Gen1', 1, 1, 10, u'A', 1, 4)
(u'Gen2', 1, 2, 20, u'A', 1, 4)
(u'Gen2', 1, 2, 20, u'B', 1, 15)
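For large tables, one thing that may be worth trying on top of the sqlite approach above is an index on the join columns of file2. The sketch below is only an illustration under assumptions: the function name `indexed_join`, the index name `idx_file2_a_b`, and the added `ORDER BY` (used just to make the row order predictable) are not part of the original answer, and whether the index actually speeds things up depends on the data and on the query plan SQLite chooses.

```python
import sqlite3


def indexed_join(file1_rows, file2_rows):
    """Load both row lists into an in-memory DB, index file2, and run the range join."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE file1 (gene_name TEXT, a INTEGER, b1 INTEGER, b2 INTEGER)")
        con.execute("CREATE TABLE file2 (name TEXT, a INTEGER, b INTEGER)")
        # executemany inserts all rows with one call per table.
        con.executemany("INSERT INTO file1 VALUES (?,?,?,?)", file1_rows)
        con.executemany("INSERT INTO file2 VALUES (?,?,?)", file2_rows)
        # Hypothetical index name; it covers the equality column and the range column.
        con.execute("CREATE INDEX idx_file2_a_b ON file2 (a, b)")
        query = """SELECT file1.gene_name, file1.a, file1.b1, file1.b2,
                          file2.name, file2.a, file2.b
                   FROM file1
                   JOIN file2 ON file1.a = file2.a
                             AND file2.b BETWEEN file1.b1 AND file1.b2
                   ORDER BY file1.gene_name, file2.name"""
        return con.execute(query).fetchall()
    finally:
        con.close()


rows1 = [("Gen1", 1, 1, 10), ("Gen2", 1, 2, 20), ("Gen3", 2, 30, 40)]
rows2 = [("A", 1, 4), ("B", 1, 15), ("C", 2, 2)]
for row in indexed_join(rows1, rows2):
    print(row)
```

Prefixing the query with `EXPLAIN QUERY PLAN` is one way to check whether SQLite uses the index for a given dataset.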