
Search and replace in a CSV file using Python

My last question was considered a duplicate, but I haven't found a question remotely similar to what I am asking, so I will rephrase:

I have a CSV file with four columns and about 26,000 rows.

Every row has the following layout:

Firstname,, Lastname,, ID,, Address 

In the last column, the address column, the addresses are formatted as follows:

1234 Streetname Dr.
Timbuktu, AK 32456
United States

My goal is only to remove the country name from every row that contains it (not all rows do), preserving the rest of the address, and to write the result back to the file. I want all the other data to remain as it was. Basically: find any instance of, say, the substring "United States" and replace it with a blank space.

The code I presently have is as follows:

import csv


with open('file.csv', 'rt') as rf:
    reader = csv.reader(rf, delimiter=',')
    for row in reader:
#print(row[3] + "\n")    # this works
        usa = "United States"
        row1 = row[0]
        row2 = row[1]
        row3 = row[2]

        if usa in row[3]:
            newrow = row[3].replace(usa, " ")
            #print(newrow + "\n")
with open('file.csv', 'w') as wf:
    writer = csv.writer(wf)    
    writer.writerows(row1 + row2 + row3 + newrow)

It currently wipes the CSV file almost clean. A few strange single characters are left over in some rows, only in the first column.

Can someone help point me to the snag? Thanks.

Try this. You will need to obtain a list of possible country names.

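import pandas as pd

df = pd.read_csv('data.csv')
# Lower-cased country names to strip; extend this set as needed.
country_names = {'united states'}
# Split each address into lines, then drop the last line if it is a country name.
df['address'] = df['address'].apply(lambda x: x.split('\n'))
df['address'] = df['address'].apply(lambda x: "\n".join(x[:-1]) if x[-1].strip().lower() in country_names else "\n".join(x))
df.to_csv('data.csv', index=False)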
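
For anyone who would rather avoid the pandas dependency, the same "drop the last line if it is a country" idea can be sketched with the standard csv module. This is only an illustration under assumptions: the helper names strip_country and strip_country_column are made up here, the address is assumed to sit in the fourth column (index 3) as in the question, and the country list is whatever you supply.

```python
import csv


def strip_country(address, countries):
    """Drop the last line of a multi-line address if it is a known country.

    `countries` is expected to hold lower-cased names.
    """
    lines = address.splitlines()
    if lines and lines[-1].strip().lower() in countries:
        lines = lines[:-1]
    return "\n".join(lines)


def strip_country_column(src_path, dst_path, countries, column=3):
    """Copy src_path to dst_path, cleaning the address in the given column."""
    wanted = {name.lower() for name in countries}
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(src_path, newline="") as rf, open(dst_path, "w", newline="") as wf:
        writer = csv.writer(wf)
        for row in csv.reader(rf):
            if len(row) > column:
                row[column] = strip_country(row[column], wanted)
            writer.writerow(row)
```

A call such as strip_country_column('data.csv', 'cleaned.csv', ['United States']) would write the cleaned rows to a second file and leave the input untouched. Because only the last line of the address is compared, a street name that happens to contain a country name is not affected.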

The snag is that the first loop keeps nothing but the final row: row1, row2, and row3 are overwritten on every iteration, and only after the loop has finished is their last value written to the file. On top of that, writerows() is handed one concatenated string, so it treats each character as a separate row, which likely explains the stray single characters. You need to bring the writing operation into the loop.

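import csv

usa = 'United States'

# newline='' lets the csv module handle line breaks inside quoted fields
with open('a.csv', 'rt', newline='') as rf, open('b.csv', 'w', newline='') as wf:
    reader = csv.reader(rf, delimiter=',')
    writer = csv.writer(wf)
    for row in reader:
        # replace the country name in the address column only
        if usa in row[3]:
            row[3] = row[3].replace(usa, ' ')
        # write every row, changed or not
        writer.writerow(row)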

Edit: cleaned up slightly
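
The question asks for the result to end up in the original file, whereas the code above reads a.csv and writes b.csv. One possible way to get an in-place update without opening the same file for reading and writing at once is to write to a temporary file and swap it in afterwards. The sketch below assumes this approach; the function name replace_in_column and its parameters are hypothetical, and trimming the whitespace left behind by the removed country is a choice made here, not something the answer above does.

```python
import csv
import os
import tempfile


def replace_in_column(path, old, new="", column=3):
    """Rewrite the CSV at `path`, replacing `old` with `new` in one column."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final swap
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="") as wf, open(path, newline="") as rf:
            writer = csv.writer(wf)
            for row in csv.reader(rf):
                if len(row) > column and old in row[column]:
                    # Assumption: also trim the line break left after the country.
                    row[column] = row[column].replace(old, new).rstrip()
                writer.writerow(row)
        # Swap the new file in only after every row has been written.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Calling replace_in_column('file.csv', 'United States') would then update file.csv itself. If anything fails midway, the original file is left as it was and the temporary file is deleted.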

Python is not the best tool for this job. You can do it more easily with shell commands:

Windows (PowerShell): (cat myFile.csv) -replace "United States" > output.csv
Linux: sed 's/United States//' myFile.csv > output.csv

---------------------------------------------------

Edit: If you have a (long) list of countries that you want to delete:

Windows (PowerShell):

$countries="United States","France","Italy";
cp myFile.csv output.csv; foreach($country in $countries){(cat output.csv) -replace $country > tmp; cp tmp output.csv; rm tmp}

Linux:

declare -a countries=("United States" "France" "Italy");
cp myFile.csv output.csv; for country in "${countries[@]}"; do sed -i "s/$country//" output.csv; done
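
If you do stay in Python, the loop over countries can be collapsed into a single pass with one compiled regular expression. The following is a rough sketch, not a drop-in equivalent of the shell commands: build_country_pattern and remove_countries are names invented for this example, and the pattern also swallows the line break in front of the country, which the sed version does not do.

```python
import re


def build_country_pattern(countries):
    """Compile one pattern matching any country name plus the line break before it."""
    if not countries:
        raise ValueError("at least one country name is required")
    # Longest names first, so "United States of America" wins over "United States".
    names = sorted(countries, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    return re.compile(r"\r?\n?(?:" + alternation + r")\b")


def remove_countries(text, pattern):
    """Remove every match of the compiled country pattern from text."""
    return pattern.sub("", text)
```

The pattern can be applied to the address field of each row, for example row[3] = remove_countries(row[3], pattern). Like the shell versions, this is a plain substring match, so a street named after a country would be altered as well.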


 