Dynamically add a computed column to a pandas DataFrame while iterating over the rows of a CSV file?

Question

I have a large space-separated input file, input.csv, which I can't hold in memory:

## Header
# More header here
A   B
1   2
3   4

If I pass the iterator=True argument to pandas.read_csv, it returns a TextFileReader / TextParser object. This lets me filter the file on the fly and select only the rows for which column A is greater than 2.
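A minimal sketch of filtering through such a reader, assuming an in-memory StringIO stand-in for input.csv and an illustrative chunksize of 1 (neither is from the question). Passing chunksize also yields a reader object and bounds how many rows each iteration loads:

```python
from io import StringIO

import pandas as pd

# Stand-in for input.csv (assumption: whitespace-separated, '#' comment lines).
sample = StringIO("## Header\n# More header here\nA B\n1 2\n3 4\n")

# chunksize=1 is an illustrative value; each iteration yields a DataFrame
# holding at most that many rows.
reader = pd.read_csv(sample, sep=r"\s+", comment="#", chunksize=1)

# Filter every chunk as it is read, keeping only rows where A > 2.
filtered = pd.concat([chunk[chunk["A"] > 2] for chunk in reader])
print(filtered)
```

For a real file the chunk size would be chosen much larger, limited by available memory.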

But how do I add a third column to the dataframe on the fly, without having to loop over all of the data once more?

Specifically, I want column C to equal column A multiplied by the value in a dictionary d that has the value of column B as its key, i.e. C = A*d[B].
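To make the formula concrete, here is a small sketch on an in-memory frame, using Series.map for the dictionary lookup. The two sample rows mirror the file shown above; whether this vectorized form suits the full data set is an assumption:

```python
import pandas as pd

d = {2: 2, 4: 3}
df = pd.DataFrame({"A": [1, 3], "B": [2, 4]})

# Look up d[B] for every row, then multiply element-wise by A.
df["C"] = df["A"] * df["B"].map(d)
print(df)
```

Note that map returns NaN for values of B that are missing from d, whereas d[B] would raise a KeyError.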

Currently I have this code:

import pandas
d = {2: 2, 4: 3}
TextParser = pandas.read_csv('input.csv', sep=' ', iterator=True, comment='#')
df = pandas.concat([chunk[chunk['A'] > 2] for chunk in TextParser])
print(df)

This prints the following output:

   A  B
1  3  4

How do I get it to print this output (C = A*d[B]):

   A  B  C
1  3  4  9

Answer 1

You can use a generator to work on the chunks one at a time:

Code:

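import pandas as pd

def on_the_fly(the_csv):
    d = {2: 2, 4: 3}
    # Raw string so that \s is not treated as a string escape.
    chunked_csv = pd.read_csv(
        the_csv, sep=r'\s+', iterator=True, comment='#')

    for chunk in chunked_csv:
        # Boolean mask of the rows to keep.
        rows_idx = chunk['A'] > 2
        # Compute C = A * d[B] for the kept rows only.
        chunk.loc[rows_idx, 'C'] = chunk[rows_idx].apply(
            lambda x: x.A * d[x.B], axis=1)
        yield chunk[rows_idx]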

Test Code:

from io import StringIO
data = StringIO(u"""#
    A   B
    1   2
    3   4
    4   4
""")

import pandas as pd
df = pd.concat([c for c in on_the_fly(data)])
print(df)

Results:

   A  B     C
1  3  4   9.0
2  4  4  12.0
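As a possible variation on the generator above (not the answer's own code), the sketch below filters each chunk first and then computes C with a vectorized Series.map instead of a row-wise apply. The function name filtered_chunks, the lookup parameter, and chunksize=2 are hypothetical choices made for illustration:

```python
from io import StringIO

import pandas as pd


def filtered_chunks(source, lookup, chunksize=2):
    """Yield filtered chunks with C = A * lookup[B] added."""
    reader = pd.read_csv(source, sep=r"\s+", comment="#", chunksize=chunksize)
    for chunk in reader:
        # Filter first and copy, so the new column is added to an
        # independent frame rather than to a slice of the chunk.
        kept = chunk[chunk["A"] > 2].copy()
        kept["C"] = kept["A"] * kept["B"].map(lookup)
        yield kept


data = StringIO("#\nA B\n1 2\n3 4\n4 4\n")
result = pd.concat(list(filtered_chunks(data, {2: 2, 4: 3})))
print(result)
```

Because the rows are filtered before C is computed, no NaN placeholders are created for the dropped rows, so with this sample data C stays an integer column.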

Answer 1
2 2017-03-23 04:53:15
