
How to find a specific string by a substring in Python

I have a similar problem to the one in this question: find position of a substring in a string

The difference is that I don't know what my "mystr" is. I know my substring, but the string in the input file could be any number of words in any order; I only know that one of those words contains the substring cola.

For example, a csv file: fanta,coca_cola,sprite in any order.

If my substring is "cola", then how can I write code that says

mystr.find('cola')

or

match = re.search(r"[^a-zA-Z](cola)[^a-zA-Z]", mystr)

or

if "cola" in mystr

when I don't know what my "mystr" is?

This is my code:

import csv

with open('first.csv', 'rb') as fp_in, open('second.csv', 'wb') as fp_out:
        reader = csv.DictReader(fp_in)
        rows = [row for row in reader]
        writer = csv.writer(fp_out, delimiter = ',')

        writer.writerow(["new_cola"])

        def headers1(name):
            if "cola" in name:
                    return row.get("cola")


        for row in rows:
                writer.writerow([headers1("cola")])

and the first.csv:

fanta,cocacola,banana
0,1,0
1,2,1

so it prints out

new_cola
""
""

when it should print out

new_cola
1
2

Here is a working example:

import csv

with open("first.csv", "rb") as fp_in, open("second.csv", "wb") as fp_out:
        reader = csv.DictReader(fp_in)
        writer = csv.writer(fp_out, delimiter = ",")

        writer.writerow(["new_cola"])

        def filter_cola(row):
            for k,v in row.iteritems():
                if "cola" in k:
                    yield v

        for row in reader:
            writer.writerow(list(filter_cola(row)))
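The example above is written for Python 2 (`iteritems`, binary file modes). A minimal sketch of the same idea for Python 3 follows, assuming the same first.csv layout. The names `filter_columns` and `copy_matching_columns` and their parameters are illustrative and not part of the original answer, and the in-memory `io.StringIO` objects only stand in for the real files.

```python
import csv
import io


def filter_columns(row, substring):
    """Yield the values of every column whose header contains `substring`."""
    for key, value in row.items():
        if substring in key:
            yield value


def copy_matching_columns(fp_in, fp_out, substring="cola", header="new_cola"):
    """Copy the columns whose header contains `substring` from fp_in to fp_out."""
    reader = csv.DictReader(fp_in)
    writer = csv.writer(fp_out, delimiter=",")
    writer.writerow([header])
    # Iterate over the reader directly instead of building a list of rows first.
    for row in reader:
        writer.writerow(list(filter_columns(row, substring)))


# Stand-in for first.csv and second.csv (hypothetical in-memory data).
sample = io.StringIO("fanta,cocacola,banana\n0,1,0\n1,2,1\n")
result = io.StringIO()
copy_matching_columns(sample, result)
print(result.getvalue())
```

With real files in Python 3, open both in text mode with `newline=""`, for example `open("first.csv", newline="")` and `open("second.csv", "w", newline="")`, as the csv module documentation recommends.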

Notes:

  • rows = [row for row in reader] is unnecessary and inefficient (it materializes the whole reader, an iterator, into a list, which can consume a lot of memory for large files)
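  • instead of return row.get("cola") you probably meant return row.get(name); either way, the lookup only succeeds when the key is the full header (cocacola), which is why the working example matches keys by substring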
  • in the statement return row.get("cola") you access row, a variable defined outside the function's own scope
  • you can also use the unix tool cut. For example:

     cut -d "," -f 2 < first.csv > second.csv 
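The cut command above assumes the cola column is always the second field. Since the question says the columns may appear in any order, here is a hedged sketch that resolves the position from the header line first and then selects by index, much like `cut -f`. The helper names `matching_indices` and `cut_by_header` are hypothetical, and the sketch assumes a non-empty file in which every row has as many fields as the header.

```python
import csv
import io


def matching_indices(header, substring):
    """Return the positions of the header fields that contain `substring`."""
    return [i for i, name in enumerate(header) if substring in name]


def cut_by_header(fp_in, fp_out, substring):
    """Write only the columns whose header contains `substring`, header included."""
    reader = csv.reader(fp_in)
    writer = csv.writer(fp_out)
    header = next(reader)  # the first line holds the column names
    indices = matching_indices(header, substring)
    writer.writerow([header[i] for i in indices])
    for row in reader:
        writer.writerow([row[i] for i in indices])


# Hypothetical input where the cola column is last rather than second.
source = io.StringIO("sprite,fanta,cocacola\n5,0,1\n6,1,2\n")
target = io.StringIO()
cut_by_header(source, target, "cola")
print(target.getvalue())
```

Unlike the cut example, this keeps the original column name in the output instead of writing a new header.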
