Is there a tool to treat files as tables in a database?

Question

I have CSV files and would like to treat them as tables of a database. Of course I could convert these files into tables, but it would be nice to be able to do it directly on the command line (the way grep, head, tail, sort and awk are used).

For example, I would like to select a particular column of a file (given by its name), select rows where certain columns have certain values, or order by one of the columns.

Answer 1

Since you tagged this with python and ipython, I assume you'd like to see what this would look like from an ipython prompt. So here's a trivial CSV file, people.csv:

first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30

Now, here's an ipython session using it:

In [1]: import csv
In [2]: from operator import *
In [3]: with open('people.csv', newline='') as f: people = list(csv.DictReader(f))
In [4]: [p['age'] for p in sorted(people, key=itemgetter('first')) if p['last'] == 'Smith']
Out[4]: ['19', '20']

It takes one line to read a CSV file into memory as a list of dicts.

Given that, you can run list comprehensions on it.
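One thing to watch with that list of dicts: csv.DictReader returns every field as a string, which is why the ages in the session come back as '19' and '20'. Sorting such a column compares text rather than numbers ('100' sorts before '20'), and summing it fails. Below is a minimal sketch of converting numeric columns right after loading; the load helper and the choice of age as the numeric column are assumptions for illustration, and io.StringIO stands in for people.csv so the snippet runs on its own.

```python
import csv
import io

# Stand-in for people.csv so the example is self-contained.
PEOPLE_CSV = """first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30
"""

def load(f, numeric=()):
    """Hypothetical helper: read CSV rows as dicts, converting the named columns to int."""
    rows = list(csv.DictReader(f))
    for row in rows:
        for column in numeric:
            row[column] = int(row[column])
    return rows

raw = load(io.StringIO(PEOPLE_CSV))                      # every value is a str
typed = load(io.StringIO(PEOPLE_CSV), numeric=("age",))  # age converted to int

print(raw[0]["age"], typed[0]["age"])   # 20 20 (a str and an int)
print(sum(p["age"] for p in typed))     # 69
```

With the real file, the call would be `with open('people.csv', newline='') as f: people = load(f, numeric=('age',))`; the csv module documentation recommends opening files with `newline=''`.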

So p['age'] selects a column by name, sorted(people, key=itemgetter('first')) orders by another column, and if p['last'] == 'Smith' is a where clause.

That second one is a bit clunky, but we can fix that:

In [5]: def orderby(table, column): return sorted(table, key=itemgetter(column))
In [6]: [p['age'] for p in orderby(people, 'first') if p['last'] == 'Smith']
Out[6]: ['19', '20']
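In the same spirit as orderby, the other two clauses can get small helpers as well. This is a sketch, not a library API: where and select are hypothetical names, where only handles equality tests passed as keyword arguments (so it only works for column names that are valid Python identifiers), and select returns plain values for one column (because that is what operator.itemgetter does with a single key) and tuples for several.

```python
import csv
import io
from operator import itemgetter

# Stand-in for people.csv so the example is self-contained.
PEOPLE_CSV = """first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30
"""

def orderby(table, column):
    """ORDER BY column (ascending)."""
    return sorted(table, key=itemgetter(column))

def where(table, **conditions):
    """WHERE col = value AND ...: keep rows matching every keyword condition."""
    return [row for row in table if all(row[c] == v for c, v in conditions.items())]

def select(table, *columns):
    """SELECT columns: one column yields bare values, several yield tuples."""
    get = itemgetter(*columns)
    return [get(row) for row in table]

people = list(csv.DictReader(io.StringIO(PEOPLE_CSV)))

# SELECT age FROM people WHERE last = 'Smith' ORDER BY first
print(select(where(orderby(people, "first"), last="Smith"), "age"))
# ['19', '20']

# SELECT first, age FROM people WHERE last = 'Smith' ORDER BY first
print(select(where(orderby(people, "first"), last="Smith"), "first", "age"))
# [('Jane', '19'), ('John', '20')]
```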

You can even do group by clauses with a little help from itertools, although here you'll definitely want to define helper functions both for the grouping and for the aggregates applied to each group, and I think it may still be pushing the limits a bit…

In [7]: from itertools import *
In [8]: def ilen(iterable): return sum(1 for _ in iterable)
In [9]: def group(table, column): return groupby(orderby(table, column), itemgetter(column))
In [10]: [(k, ilen(g)) for k, g in group(people, 'last')]
Out[10]: [('Jones', 1), ('Smith', 2)]
In [11]: def glen(kg): return kg[0], sum(1 for _ in kg[1])
In [12]: [glen(g) for g in group(people, 'last')]
Out[12]: [('Jones', 1), ('Smith', 2)]
In [13]: def gsum(kg, column): return kg[0], sum(int(x[column]) for x in kg[1])
In [14]: [gsum(g, 'age') for g in group(people, 'last')]
Out[14]: [('Jones', 30), ('Smith', 39)]
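A caveat with itertools.groupby: it only merges runs of consecutive rows with equal keys, so it behaves like SQL GROUP BY only when the rows are already sorted on the grouping column (in people.csv the two Smiths happen to be adjacent). The sketch below reorders the rows to show the difference; the collections.Counter line is an alternative for counting that needs no sorting at all.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# The people.csv rows, reordered so the two Smiths are no longer adjacent.
rows = [
    {"first": "John", "last": "Smith", "age": "20"},
    {"first": "Frank", "last": "Jones", "age": "30"},
    {"first": "Jane", "last": "Smith", "age": "19"},
]
key = itemgetter("last")

# Unsorted input: the Smith group is split in two.
unsorted_counts = [(k, sum(1 for _ in g)) for k, g in groupby(rows, key)]
print(unsorted_counts)  # [('Smith', 1), ('Jones', 1), ('Smith', 1)]

# Sorting on the grouping column first gives the GROUP BY result.
sorted_counts = [(k, sum(1 for _ in g)) for k, g in groupby(sorted(rows, key=key), key)]
print(sorted_counts)  # [('Jones', 1), ('Smith', 2)]

# Counting in one pass, independent of row order.
counts = Counter(row["last"] for row in rows)
```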

However, there are a few things to keep in mind:

It requires reading the whole file into memory.
There are no indexes. With a database, selecting the 20 Smiths out of 100,000 people takes only about log(100000) + 20 steps; with a list, it takes 100,000 steps.
You have to order the operations appropriately. When you want to sort, then filter rows, then filter columns (as in the example above), everything is easy; if you want a different order (especially if you want to sort or filter by columns you aren't selecting), you may need to write more complex comprehensions, whereas with a database there's no problem at all.
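The missing-index point can be partly worked around by hand: a dict keyed on one column acts as a crude hash index, so an equality lookup no longer scans every row. This is an illustrative sketch with a hypothetical build_index helper; unlike a database index it only helps equality filters (not ranges or ordering), costs a full pass to build, and has to be rebuilt whenever the data changes.

```python
import csv
import io
from collections import defaultdict

# Stand-in for people.csv so the example is self-contained.
PEOPLE_CSV = """first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30
"""

def build_index(table, column):
    """Hypothetical helper: map each value of `column` to the rows that have it."""
    index = defaultdict(list)
    for row in table:
        index[row[column]].append(row)
    return index

people = list(csv.DictReader(io.StringIO(PEOPLE_CSV)))
by_last = build_index(people, "last")

# Equality lookup without scanning the whole table.
smiths = by_last["Smith"]
print([p["first"] for p in smiths])  # ['John', 'Jane']

# Use .get for possibly missing keys; indexing a defaultdict would insert an empty entry.
print(by_last.get("Brown", []))  # []
```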

Keep in mind that it takes only about 5 lines of code to convert a CSV file into a sqlite table. So I think you'd be better off with a script that just runs that 5-line Python program and then drops you into a sqlite command line.
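A sketch of that conversion with the standard sqlite3 module, assuming the first CSV row holds the column names and that none of them contains a double quote; csv_to_sqlite is a hypothetical helper name. The columns are created without declared types, so values stay as the strings the csv module produced. An in-memory database keeps the example self-contained; passing a filename to sqlite3.connect instead writes a database file that can then be opened in the sqlite3 command-line shell.

```python
import csv
import io
import sqlite3

# Stand-in for people.csv so the example is self-contained.
PEOPLE_CSV = """first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30
"""

def csv_to_sqlite(conn, table, f):
    """Create `table` from a CSV file object, taking column names from the header row."""
    reader = csv.reader(f)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()

conn = sqlite3.connect(":memory:")
csv_to_sqlite(conn, "people", io.StringIO(PEOPLE_CSV))

# "first" and "last" are double-quoted because they are also SQL keywords.
rows = conn.execute(
    'SELECT age FROM people WHERE "last" = ? ORDER BY "first"', ("Smith",)
).fetchall()
print(rows)  # [('19',), ('20',)]
```

The sqlite3 shell can also load a CSV file on its own: `.mode csv` followed by `.import people.csv people` creates the table from the header row when it does not already exist.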

Answer 2

Since you tagged this with 'python', the pandas module provides a DataFrame object that offers the functionality you seem to want here. Use pandas.read_csv() to read in the CSV file. A quick primer on DataFrames is provided here: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe
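A minimal sketch of the question's three operations plus a group by in pandas, again using io.StringIO in place of people.csv. Unlike the csv module, read_csv infers column types, so age comes back as integers here.

```python
import io

import pandas as pd

# Stand-in for people.csv so the example is self-contained.
PEOPLE_CSV = """first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30
"""

df = pd.read_csv(io.StringIO(PEOPLE_CSV))

# SELECT age FROM people WHERE last = 'Smith' ORDER BY first
ages = df[df["last"] == "Smith"].sort_values("first")["age"].tolist()
print(ages)  # [19, 20]

# SELECT last, SUM(age) FROM people GROUP BY last
totals = df.groupby("last")["age"].sum()
print(totals)
```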

Answers: 2

Answer 1: 3 votes, accepted, 2013-07-05 12:44:36

Answer 2: 2 votes, 2013-07-05 17:43:30