How to copy every row of an Excel (.csv) file that contains specific words into another csv file using Python?
I have to copy all the rows which contain a specific word into another csv file.
My file is in .csv format, and I want to copy all rows which contain the word "Canada" in one of the cells. I have tried the various methods given on the internet, but I am unable to copy my rows. My data contains more than 15,000 lines.
An example of my dataset:
tweets    date     area
dbcjhbc   12:4:19  us
cbhjc     3:3:18   germany
cwecewc   5:6:19   canada
cwec      23:4:19  us
wncwjwk   9:8:18   canada
The code is:
import csv

with open('twitter-1.csv', "r", encoding="utf8") as f:
    reader = csv.DictReader(f, delimiter=',')
    with open('output.csv', "w") as f_out:
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames, delimiter=",")
        writer.writeheader()
        for row in reader:
            if row == 'Canada':
                writer.writerow(row)
But this code is not working, and I am getting the error:

Error: field larger than field limit (131072)
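For reference, the csv-module approach from the question can be made to work with two fixes: raise the field size limit (the default is 131072 characters), and compare a single column of the row rather than the whole row (`row` is a dict, so `row == 'Canada'` is never true). A minimal sketch, assuming the word lives in the `area` column; the helper name is illustrative:

```python
import csv

# allow oversized fields (the default limit of 131072 triggers the error above)
csv.field_size_limit(10 ** 7)

def copy_matching_rows(in_path, out_path, column, word):
    """Copy every row whose `column` value equals `word` into another csv file."""
    with open(in_path, 'r', encoding='utf8', newline='') as f:
        reader = csv.DictReader(f, delimiter=',')
        with open(out_path, 'w', encoding='utf8', newline='') as f_out:
            writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames, delimiter=',')
            writer.writeheader()
            for row in reader:
                # row is a dict, so compare one column's value,
                # not the whole row, against the search word
                if row[column].strip().lower() == word.lower():
                    writer.writerow(row)

# e.g. copy_matching_rows('twitter-1.csv', 'output.csv', 'area', 'Canada')
```

Because the reader and writer are iterated row by row, this also only keeps one row in memory at a time.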
I know the question asks for a solution in Python, but I believe this task can be solved more easily with command-line tools.

One-liner using Bash:
grep 'canada' myFile.csv > outputfile.csv
You can do this even without the csv module.
# read file and split by newlines (get list of rows)
with open('input.csv', 'r') as f:
    rows = f.read().split('\n')

# loop over rows and keep those that contain 'canada'
rows_containing_keyword = [row for row in rows if 'canada' in row]

# create and write lines to output file
with open('output.csv', 'w+') as f:
    f.write('\n'.join(rows_containing_keyword))
Assuming your .csv data (twitter-1.csv) looks like this:
tweets,date,area
dbcjhbc,12:4:19,us
cbhjc,3:3:18,germany
cwecewc,5:6:19,canada
cwec,23:4:19,us
wncwjwk,9:8:18,canada
Using numpy:
import numpy as np

# import .csv data (skipping the header row)
data = np.genfromtxt('twitter-1.csv', delimiter=',', dtype=str, skip_header=1)
# select only rows where the 'area' column is 'canada'
data_canada = data[np.where(data[:, 2] == 'canada')]
# export the resulting data
np.savetxt("foo.csv", data_canada, delimiter=',', fmt='%s')
foo.csv will contain:
cwecewc,5:6:19,canada
wncwjwk,9:8:18,canada
If you want to search every entry (every column) for canada, you can use a list comprehension. Assume twitter-1.csv contains an occurrence of canada in the tweets column:
tweets,date,area
dbcjhbc,12:4:19,us
cbhjc,3:3:18,germany
cwecewc,5:6:19,canada
canada,23:4:19,us
wncwjwk,9:8:18,canada
This will return all rows with any occurrence of canada:
out = [i for i, v in enumerate(data) if 'canada' in v]
data_canada = data[out]
np.savetxt("foo.csv", data_canada, delimiter=',', fmt='%s')
Now, foo.csv will contain:
cwecewc,5:6:19,canada
canada,23:4:19,us
wncwjwk,9:8:18,canada
All solutions except the grep one (which is probably the fastest, if grep is available) load the entire .csv file into memory. Don't do that! You can stream the file and keep only one line in memory at a time.
with open('input.csv', 'r') as f_in, open('output.csv', 'w') as f_out:
    for line in f_in:
        if 'canada' in line:
            f_out.write(line)
NOTE: I don't actually have Python 3 on this computer, so there might be a typo in this code. But I'm confident it's more efficient on sufficiently large files than loading the entire file into memory before manipulating it. It would be interesting to see benchmarks.
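A rough benchmark sketch with the standard `timeit` module, comparing the streaming approach against the read-everything approach (file names are placeholders; real numbers will depend on file size and disk speed):

```python
import timeit

def stream_filter(in_path, out_path, word):
    # keeps only one line in memory at a time
    with open(in_path, 'r', encoding='utf8') as f_in, \
         open(out_path, 'w', encoding='utf8') as f_out:
        for line in f_in:
            if word in line:
                f_out.write(line)

def slurp_filter(in_path, out_path, word):
    # loads the whole file into memory first
    with open(in_path, 'r', encoding='utf8') as f_in:
        rows = f_in.read().split('\n')
    with open(out_path, 'w', encoding='utf8') as f_out:
        f_out.write('\n'.join(r for r in rows if word in r))

# e.g.:
# print(timeit.timeit(lambda: stream_filter('twitter-1.csv', 'out.csv', 'canada'), number=10))
# print(timeit.timeit(lambda: slurp_filter('twitter-1.csv', 'out.csv', 'canada'), number=10))
```

Note that the two variants differ slightly at the end of the file: the streaming version preserves each line's trailing newline, while the join-based version emits no final newline.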