Parsing a CSV using Python's toolz package

Question

I recently came across the toolz repository and decided to give it a spin.

Unfortunately, I'm having some trouble using it properly, or at least understanding it.

My first simple task was to parse a tab-separated (TSV) file and get the second column entry from each row.

For example, given the file foo.tsv:

a    b    c
d    e    f

I'd like to return a list of ['b', 'e']. I successfully achieved that with the following piece of logic:

from toolz.curried import *

with open("foo.tsv", 'r') as f:
    data = pipe(f, map(str.rstrip),
                           map(str.split),
                           map(get(1)),
                           tuple)
    print(data)

However, if I change the foo.tsv file to use commas instead of tabs as the column delimiters, I cannot figure out the best way to adjust the above code to handle that. It's not clear to me how best to pass a "," argument to the str.split function while using map with either the pipe or thread_first functions.

Is there existing documentation that describes this?

Answer 1

lambdas

Don't be afraid of using lambdas.

map(lambda s: s.split(','))

It's maybe a bit less pretty than map(str.split), but it gets the point across.
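As a minimal sketch of how the lambda slots into the pipeline from the question, the snippet below reads comma-separated data from an in-memory io.StringIO object, which stands in for the file and is an assumption for illustration only. It also shows operator.methodcaller from the standard library as one possible lambda-free way to supply the "," argument; that alternative is not part of the original answer.

```python
import io
from operator import methodcaller

from toolz.curried import get, map, pipe

# In-memory stand-in for a comma-separated foo.tsv (hypothetical sample data).
f = io.StringIO("a,b,c\nd,e,f\n")

# Variant 1: a lambda supplies the "," argument to split.
with_lambda = pipe(f, map(str.rstrip),
                      map(lambda s: s.split(",")),
                      map(get(1)),
                      tuple)

f.seek(0)  # rewind so the same data can be read a second time

# Variant 2: methodcaller("split", ",") builds an equivalent one-argument callable.
with_methodcaller = pipe(f, map(str.rstrip),
                            map(methodcaller("split", ",")),
                            map(get(1)),
                            tuple)

print(with_lambda)
print(with_methodcaller)
```

Both variants should give the same tuple; which one reads better is largely a matter of taste.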

Use pluck

Consider using pluck(...) rather than map(get(...)):

map(get(1)) -> pluck(1)
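A small sketch of that equivalence, using a hypothetical list of already-split rows rather than a file. It also shows that pluck accepts a list of indices, in which case each result is a tuple.

```python
from toolz.curried import get, map, pluck

# Hypothetical rows, as they would look after splitting each line.
rows = [["a", "b", "c"], ["d", "e", "f"]]

# Both forms pick element 1 from every row; each returns a lazy iterator.
via_get = list(map(get(1))(rows))
via_pluck = list(pluck(1)(rows))

# With a list of indices, pluck yields one tuple per row.
first_and_last = list(pluck([0, 2])(rows))

print(via_get, via_pluck, first_and_last)
```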

Use Pandas

If you have a CSV file, you might consider just using Pandas, which is very fast and highly optimized for this kind of work.
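For comparison, here is a rough sketch of the same task with pandas.read_csv. The in-memory sample data and the variable names are assumptions for illustration; with a real file you would pass its path instead of the StringIO object.

```python
import io

import pandas as pd

# In-memory stand-in for a comma-separated file (hypothetical sample data).
csv_text = "a,b,c\nd,e,f\n"

# header=None: the sample has no header row, so columns are labelled 0, 1, 2.
# For a tab-separated file, pass sep="\t" instead of relying on the default comma.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Select the second column by its integer label and convert it to a plain list.
second_column = df[1].tolist()
print(second_column)
```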

Answer 2

Based on MRocklin's answer above, my CSV parsing code using toolz should look more like this:

from toolz.curried import map, pipe, pluck

with open("foo.tsv", 'r') as f:
    # Strip only the trailing newline, then split explicitly on tabs.
    data = pipe(f, map(lambda s: s.rstrip("\n")),
                   map(lambda s: s.split("\t")),
                   pluck(1),
                   tuple)
    print(data)

Answer 3

Your version for the TSV file can be shortened to:

pipe(f, map(str.split), pluck(1), tuple)
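The shortened form can drop str.rstrip because str.split with no arguments splits on runs of whitespace and ignores leading and trailing whitespace, including the newline. A quick sketch of that behaviour, and of one caveat worth keeping in mind (empty fields vanish when no separator is given), using made-up sample lines:

```python
# A normal tab-separated line: the trailing newline is discarded by split().
line = "a\tb\tc\n"
no_arg = line.split()

# A line with an empty middle field (hypothetical sample).
line_with_gap = "a\t\tc\n"
gap_no_arg = line_with_gap.split()                       # empty field is lost
gap_explicit = line_with_gap.rstrip("\n").split("\t")    # empty field is kept

print(no_arg, gap_no_arg, gap_explicit)
```

If the data can contain empty fields or fields with embedded spaces, splitting on an explicit "\t" is likely the safer choice.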

To read a comma-separated file, use something like this:

pipe(f, map(lambda s: s.split(',')), pluck(1), map(str.strip), tuple)

3 answers

Answer 1: score 2, accepted, 2015-11-13 00:01:17
Answer 2: score 0, 2015-11-13 00:15:25
Answer 3: score 0, 2015-12-25 11:41:49
