
Find and replace strings in Excel (.xlsx) using Python

I am trying to replace a bunch of strings in an .xlsx sheet (~70k rows, 38 columns). I have a list of the strings to be searched and replaced in a file, formatted as below:

bird produk - bird product
pig - pork
ayam - chicken
...
kuda - horse

The word to search for is on the left and its replacement is on the right (find 'bird produk', replace with 'bird product'). My .xlsx sheet looks something like this:

name     type of animal     ID
ali      pig                3483
abu      kuda               3940
ahmad    bird produk        0399
...
ahchong  pig                2311

I am looking for the fastest solution, since the list has around 200 words to search for and the .xlsx file is quite large. I need to use Python for this, but I am open to any other faster solutions.
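As a starting point, the word list can be loaded into a dictionary. This is only a minimal sketch: it assumes each line uses " - " (space, hyphen, space) as the separator, and the name `parse_replacements` is made up for illustration.

```python
def parse_replacements(lines, sep=" - "):
    """Build a {search: replacement} dict from lines like 'ayam - chicken'."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or sep not in line:
            continue  # skip blank or malformed lines
        # Split on the first separator only; the rest is the replacement.
        find, _, replacement = line.partition(sep)
        mapping[find.strip()] = replacement.strip()
    return mapping


# Sample lines taken from the list above; a real run would pass an open file.
sample = ["bird produk - bird product", "pig - pork", "ayam - chicken", "kuda - horse"]
reps = parse_replacements(sample)
print(reps["bird produk"])  # bird product
```

Passing an open file object instead of the sample list should work the same way, since the function only iterates over lines.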

Edit: added sheet example

Edit2: I tried some Python code to read the cells, but it took quite a long time to read. Any pointers?

from xlrd import open_workbook
wb = open_workbook('test.xlsx')

for s in wb.sheets():
    print('Sheet:', s.name)
    for row in range(s.nrows):
        for col in range(s.ncols):
            print(s.cell(row, col).value)

Thank you!

Edit3: I finally figured it out. Both the VBA module and the Python code work. I resorted to .csv instead to make things easier. Thank you! Here is my version of the Python code:

import csv

###### our dictionary with our key:values. ######
reps = {
    'JUALAN (PRODUK SHJ)' : 'SALE( PRODUCT)',
    'PAMERAN' : 'EXHIBITION',
    'PEMBIAKAN' : 'BREEDING',
    'UNGGAS' : 'POULTRY'}


def replace_all(text, dic):
    # Apply every find/replace pair from dic to the text, in order.
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

with open('test.csv','r') as f:
    text=f.read()
    text=replace_all(text,reps)

with open('file2.csv','w') as w:
    w.write(text)
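The code above replaces substrings anywhere in the file, including inside other words and across cell boundaries. If only whole cells should change, a variant based on the csv module might look like the sketch below. It is a different technique from the one above (exact cell match instead of substring replace), and `replace_cells` and the sample rows are made up for illustration.

```python
import csv
import io


def replace_cells(src, dst, mapping):
    """Copy CSV rows from src to dst, swapping cells that exactly match a key."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        # dict.get returns the cell unchanged when it is not a key.
        writer.writerow([mapping.get(cell, cell) for cell in row])


reps = {"PAMERAN": "EXHIBITION", "UNGGAS": "POULTRY"}

# In-memory stand-ins for test.csv and file2.csv.
src = io.StringIO("name,type\nali,PAMERAN\nabu,UNGGAS LAIN\n")
dst = io.StringIO()
replace_cells(src, dst, reps)
print(dst.getvalue())
```

Here "UNGGAS LAIN" is left alone because the cell is not an exact match. With real files, open both with `newline=''` as the csv module documentation recommends.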

I would copy the contents of your text file into a new worksheet in the Excel file and name that sheet "Lookup". Then use Text to Columns to get the data into the first two columns of this new sheet, starting in the first row.

Paste the following code into a module in Excel and run it:

Sub Replacer()
    Dim w1 As Worksheet
    Dim w2 As Worksheet
    Dim i As Long

    'The sheet with the words from the text file:
    Set w1 = ThisWorkbook.Sheets("Lookup")
    'The sheet with all of the data:
    Set w2 = ThisWorkbook.Sheets("Data")

    For i = 1 To w1.Range("A1").CurrentRegion.Rows.Count
        w2.Cells.Replace What:=w1.Cells(i, 1), Replacement:=w1.Cells(i, 2), LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
    Next i

End Sub

Make 2 arrays:

A[bird produk, pig, ayam, kuda] // words to be changed
B[bird product, pork, chicken, horse] // result after changing the word

Now check each row of your Excel sheet and compare it with every element of A. If it matches, replace it with the corresponding element of B.

For example (not actual code, something like pseudocode):

for (i = 1 to no. of rows)
{
    for (j = 1 to 200)
    {
        if (contents of row[i] == A[j])
        {
            contents of row[i] = B[j];
            break;
        }
    }
}

To make it fast, stop the inner loop as soon as the word is replaced and move on to the next row.
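The pseudocode could be written in Python roughly as follows. This is a sketch that works on a plain list of cell values rather than a real worksheet, and `replace_in_column` is a made-up name.

```python
A = ["bird produk", "pig", "ayam", "kuda"]         # words to be changed
B = ["bird product", "pork", "chicken", "horse"]   # result after changing


def replace_in_column(cells, find, repl):
    """Return a new list where each cell equal to find[j] becomes repl[j]."""
    result = []
    for cell in cells:
        for old, new in zip(find, repl):
            if cell == old:
                cell = new
                break  # stop scanning the word list once a match is found
        result.append(cell)
    return result


print(replace_in_column(["pig", "kuda", "bird produk", "cat"], A, B))
# ['pork', 'horse', 'bird product', 'cat']
```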

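Similar to @coder_A's idea, but use a dictionary to do the "translation" for you, where the keys are the original words and the value for each key is what it gets translated to.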

For reading and writing xls with Python, use xlrd and xlwt; see http://www.python-excel.org/

A simple xlrd example:

from xlrd import open_workbook
wb = open_workbook('simple.xls')

for s in wb.sheets():
    print('Sheet:', s.name)
    for row in range(s.nrows):
        values = []
        for col in range(s.ncols):
            print(s.cell(row,col).value)

And for replacing target text, use a dict:

replace = {
    'bird produk': 'bird product',
    'pig': 'pork',
    'ayam': 'chicken',
    # ...
    'kuda': 'horse'
}

A dict gives you O(1) time complexity (most of the time, if keys don't collide) when checking membership with 'text' in replace. There's no way to get better performance than that.
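Note that 'text' in replace only matches when the whole cell value is a key. A rough sketch of both cases follows: an exact-cell lookup, and, as an assumption about what the question might need, a single-pass substring replacement using one compiled regular expression. The helper names are made up.

```python
import re

replace = {
    "bird produk": "bird product",
    "pig": "pork",
    "ayam": "chicken",
    "kuda": "horse",
}


def translate_cell(value, mapping):
    """Exact match: one dict lookup per cell."""
    return mapping.get(value, value)


# One alternation of all keys, longest first so longer keys win on overlap.
pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(replace, key=len, reverse=True))
)


def translate_text(text, mapping, pattern):
    """Substring match: replace every key found inside the text in one pass."""
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(translate_cell("kuda", replace))                   # horse
print(translate_text("ayam and pig", replace, pattern))  # chicken and pork
```

The substring version also rewrites keys that appear inside longer words, so it is only appropriate if that is the intended behaviour.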

Since I don't know what your bunch of strings looks like, this answer may be inaccurate or incomplete.

