
How to copy entire rows of an Excel (.csv) file that contain a specific word into another csv file using Python?

I have to copy all the rows that contain a specific word into another csv file.

My file is a .csv, and I want to copy all rows that contain the word "Canada" in one of the cells. I have tried the various methods given on the internet, but I am unable to copy my rows. My data contains more than 15,000 lines.

An example of my dataset:

tweets         date           area  
dbcjhbc    12:4:19         us 
cbhjc      3:3:18          germany
cwecewc    5:6:19          canada
cwec       23:4:19          us
wncwjwk     9:8:18         canada

My code is:

import csv

with open('twitter-1.csv', "r" ,encoding="utf8") as f:
    reader = csv.DictReader(f, delimiter=',')
    with open('output.csv', "w") as f_out:
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames, delimiter=",")
        writer.writeheader()
        for row in reader:
            if row == 'Canada':
                writer.writerow(row)

But this code is not working, and I am getting the error:

Error: field larger than field limit (131072)

I know the question asks for a solution in Python, but I believe this task can be solved more easily with command-line tools.

One-liner using Bash:

grep 'canada' myFile.csv > outputfile.csv

You can do this even without the csv module.

# read file and split by newlines (get list of rows)
with open('input.csv', 'r') as f:
    rows = f.read().split('\n')

# loop over rows and append to list if they contain 'canada'
rows_containing_keyword = [row for row in rows if 'canada' in row]

# create and write lines to output file
with open('output.csv', 'w+') as f:
    f.write('\n'.join(rows_containing_keyword))

Assuming your .csv data (twitter-1.csv) looks like this:

tweets,date,area
dbcjhbc,12:4:19,us 
cbhjc,3:3:18,germany
cwecewc,5:6:19,canada
cwec,23:4:19,us
wncwjwk,9:8:18,canada

Using numpy:

import numpy as np

# import .csv data (skipping header); dtype=str because 'string' is not a valid dtype name on Python 3
data = np.genfromtxt('twitter-1.csv', delimiter=',', dtype=str, skip_header=1)

# select only rows where the 'area' column is 'canada'
data_canada = data[np.where(data[:,2]=='canada')]

# export the resulting data
np.savetxt("foo.csv", data_canada, delimiter=',', fmt='%s')

foo.csv will contain:

cwecewc,5:6:19,canada
wncwjwk,9:8:18,canada

If you want to search every entry (every column) for canada, then you could use a list comprehension. Assume twitter-1.csv contains an occurrence of canada in the tweets column:

tweets,date,area
dbcjhbc,12:4:19,us 
cbhjc,3:3:18,germany
cwecewc,5:6:19,canada
canada,23:4:19,us
wncwjwk,9:8:18,canada

This will return all rows in which any cell is exactly canada:

out = [i for i, v in enumerate(data) if 'canada' in v]
data_canada = data[out]
np.savetxt("foo.csv", data_canada, delimiter=',', fmt='%s')

Now, foo.csv will contain:

cwecewc,5:6:19,canada
canada,23:4:19,us
wncwjwk,9:8:18,canada
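Note that the comprehension above matches a cell only when it equals canada exactly. If the word may appear inside a longer tweet, or with different capitalisation (the question asks for "Canada"), a substring test per cell is one option. The following is a minimal sketch using np.char.lower and np.char.find; the inline array stands in for the data loaded from twitter-1.csv, and the "trip to canada!" and capitalised "Canada" cells are invented purely for illustration.

```python
import numpy as np

# Hypothetical stand-in for the rows loaded from twitter-1.csv (header skipped).
# The "Canada" and "trip to canada!" cells are made up to show the two cases.
data = np.array([
    ["dbcjhbc", "12:4:19", "us"],
    ["cbhjc", "3:3:18", "germany"],
    ["cwecewc", "5:6:19", "Canada"],
    ["trip to canada!", "23:4:19", "us"],
    ["wncwjwk", "9:8:18", "canada"],
])

# Lower-case every cell, then test each cell for the substring.
# np.char.find returns -1 where the substring is absent.
contains = np.char.find(np.char.lower(data), "canada") >= 0

# Keep a row if any of its cells matched.
data_canada = data[contains.any(axis=1)]
```

The result can be written out with np.savetxt exactly as before.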

All solutions except the grep one (which is probably the fastest if grep is available) load the entire .csv file into memory. Don't do that! You can stream the file and keep only one line in memory at a time.

# 'if' is a reserved word in Python, so the file handles need other names
with open('input.csv', 'r') as infile, open('output.csv', 'w') as outfile:
    for line in infile:
        if 'canada' in line:
            outfile.write(line)

NOTE: I don't actually have python3 on this computer, so there might be a typo in this code. But I'm confident it's more efficient on sufficiently large files than loading the entire file into memory before manipulating it. It would be interesting to see benchmarks.
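If you would rather keep the csv module (so that quoted commas inside tweets are parsed properly), the same streaming idea applies. The sketch below also addresses the two problems in the question's code as I read them: row is a dict, so row == 'Canada' is never true, and the "field larger than field limit" error can be avoided by raising the limit with csv.field_size_limit. The function name copy_matching_rows and the case-insensitive substring match are my own assumptions, not something from the original post.

```python
import csv
import sys


def copy_matching_rows(src_path, dst_path, keyword):
    """Stream src_path and copy rows where any cell contains keyword.

    Hypothetical helper: matching is a case-insensitive substring test.
    Returns the number of data rows copied (the header is not counted).
    """
    # Raise the csv module's per-field limit (131072 characters by default).
    # sys.maxsize can overflow a C long on some platforms, so back off
    # until the value is accepted.
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            break
        except OverflowError:
            limit //= 10

    needle = keyword.lower()
    copied = 0
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, "r", encoding="utf8", newline="") as f_in, \
         open(dst_path, "w", encoding="utf8", newline="") as f_out:
        reader = csv.reader(f_in)
        writer = csv.writer(f_out)

        # Copy the header row unchanged, if the file has one.
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)

        # Only one row is held in memory at a time.
        for row in reader:
            if any(needle in cell.lower() for cell in row):
                writer.writerow(row)
                copied += 1
    return copied
```

Usage would be something like copy_matching_rows('twitter-1.csv', 'output.csv', 'Canada'). One caveat: a field that exceeds the default limit may be a sign of an unbalanced quote character in the data, in which case raising the limit hides the symptom rather than fixing the input.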
