Are there tools that can help treat files as tables in a database?
I have CSV files and would like to treat them as tables of a database. Of course, I can transform these files into tables. But it would be nice to be able to do it directly on the command line (in the way that grep, head, tail, sort and awk are used).
For example, I would like to select a particular column of a file (given by its name), or select rows where certain columns have certain values, or order by one of the columns.
Since you tagged this with python and ipython, I assume you'd like to see what it would be like to do this from an ipython prompt. So, here's a trivial CSV file, people.csv:
first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30
Now, here's an ipython session using it:
In [1]: import csv
In [2]: from operator import itemgetter
In [3]: with open('people.csv') as f: people = list(csv.DictReader(f))
In [4]: [p['age'] for p in sorted(people, key=itemgetter('first')) if p['last'] == 'Smith']
Out[4]: ['19', '20']
It takes one line to read a CSV file into memory as a list of dicts. Given that, you can run list comprehensions on it.
So, the p['age'] selects a column by name; the sorted(people, key=itemgetter('first')) orders by another column; and the if p['last'] == 'Smith' is a where clause.
That second one is a bit clunky, but we can fix that:
In [5]: def orderby(table, column): return sorted(table, key=itemgetter(column))
In [6]: [p['age'] for p in orderby(people, 'first') if p['last'] == 'Smith']
Out[6]: ['19', '20']
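The session above can also be sketched as a standalone script. The io.StringIO object stands in for people.csv so that the snippet is self-contained; that substitution is an assumption made for illustration, not part of the original session.

```python
import csv
import io
from operator import itemgetter

# Stand-in for people.csv (assumption: the same three rows as the file above).
PEOPLE_CSV = """first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30
"""

def orderby(table, column):
    """Return the rows sorted by the named column."""
    return sorted(table, key=itemgetter(column))

# Read every row into a list of dicts keyed by the header names.
people = list(csv.DictReader(io.StringIO(PEOPLE_CSV)))

# Roughly: SELECT age FROM people WHERE last = 'Smith' ORDER BY first
ages = [p['age'] for p in orderby(people, 'first') if p['last'] == 'Smith']
print(ages)
```

Note that csv.DictReader yields every value as a string, so a numeric comparison or sort on age would need an explicit int() conversion.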
You can even do group by clauses with a little help from itertools, although here you'll definitely want to define helper functions both for groupby and for the aggregates to apply to groups, and I think it still might be pushing the limits a bit. Note that itertools.groupby only groups adjacent rows with equal keys; that works below because the two Smith rows are next to each other, but in general the table should be sorted by the grouping column first.
In [7]: from itertools import groupby
In [8]: def ilen(iterable): return sum(1 for _ in iterable)
In [9]: def group(table, column): return groupby(table, itemgetter(column))
In [10]: [(k, ilen(g)) for k, g in group(people, 'last')]
Out[10]: [('Smith', 2), ('Jones', 1)]
In [11]: def glen(kg): return kg[0], sum(1 for _ in kg[1])
In [12]: [glen(g) for g in group(people, 'last')]
Out[12]: [('Smith', 2), ('Jones', 1)]
In [13]: def gsum(kg, column): return kg[0], sum(int(x[column]) for x in kg[1])
In [14]: [gsum(g, 'age') for g in group(people, 'last')]
Out[14]: [('Smith', 39), ('Jones', 30)]
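One caveat with the helpers above is that itertools.groupby only merges adjacent rows, so unsorted input can yield the same key more than once. A minimal sketch of a variant that sorts first is shown here; the name group_sorted and the reordered sample rows are hypothetical, chosen only to demonstrate the point.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows, deliberately NOT grouped by 'last' (hypothetical ordering).
people = [
    {'first': 'John', 'last': 'Smith', 'age': '20'},
    {'first': 'Frank', 'last': 'Jones', 'age': '30'},
    {'first': 'Jane', 'last': 'Smith', 'age': '19'},
]

def group_sorted(table, column):
    """Sort by the column, then group, so equal keys are always adjacent."""
    key = itemgetter(column)
    return groupby(sorted(table, key=key), key)

# Roughly: SELECT last, COUNT(*), SUM(age) FROM people GROUP BY last
summary = []
for last, rows in group_sorted(people, 'last'):
    rows = list(rows)  # materialize: each group iterator can be consumed only once
    summary.append((last, len(rows), sum(int(r['age']) for r in rows)))
print(summary)
```

Because of the sort, the groups come out in key order rather than in file order.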
However, keep in mind that it takes only about 5 lines of code to convert a CSV file to a sqlite table. So, I think you'd be better off with a script that just runs your 5-line Python program and dumps you into a sqlite command line.
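A minimal sketch of such a conversion, using only the standard csv and sqlite3 modules, might look like the following. The in-memory database and the io.StringIO stand-in for people.csv are assumptions made to keep the example self-contained; a real script would presumably open the file and connect to a database file on disk, which could then be opened with the sqlite3 command-line shell.

```python
import csv
import io
import sqlite3

# Stand-in for people.csv (assumption: the same three rows as the file above).
PEOPLE_CSV = """first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30
"""

reader = csv.reader(io.StringIO(PEOPLE_CSV))
header = next(reader)  # first row holds the column names

conn = sqlite3.connect(':memory:')  # assumption: use a file path to keep the data
columns = ', '.join('"%s"' % name for name in header)
placeholders = ', '.join('?' for _ in header)
conn.execute('CREATE TABLE people (%s)' % columns)
conn.executemany('INSERT INTO people VALUES (%s)' % placeholders, reader)

# Column names are quoted because first and last are also SQL keywords.
rows = conn.execute(
    'SELECT "age" FROM people WHERE "last" = \'Smith\' ORDER BY "first"'
).fetchall()
print(rows)
```

The columns are created without declared types, so the values are stored as text, exactly as they appear in the CSV file.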
Since you tagged this with 'python': Python's pandas module provides a DataFrame object with the functionality you seem to want here. Use pandas.read_csv() to read in the CSV file. A quick primer on DataFrames is provided here: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe
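As a rough illustration of the pandas approach, the sketch below runs the same kind of select, where, order by and group by queries on the sample data. It assumes pandas is installed, and it reads from an io.StringIO stand-in rather than a real people.csv so that it is self-contained.

```python
import io

import pandas as pd

# Stand-in for people.csv (assumption: the same three rows as the file above).
PEOPLE_CSV = """first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30
"""

df = pd.read_csv(io.StringIO(PEOPLE_CSV))

# Roughly: SELECT age FROM people WHERE last = 'Smith' ORDER BY first
smith_ages = df[df['last'] == 'Smith'].sort_values('first')['age'].tolist()
print(smith_ages)

# Roughly: SELECT last, SUM(age) FROM people GROUP BY last
age_by_last = df.groupby('last')['age'].sum().to_dict()
print(age_by_last)
```

Unlike the csv module, read_csv infers column types, so age is numeric here and can be summed without an explicit conversion.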