Apply a function to a specific expression for every line in a file

I am currently copying lines from one file into a new file, keeping only the lines that meet specific criteria. See the code below:

# Read every line of the source file (readlines() keeps each trailing '\n')
with open('Redshift_twb_1.txt', 'r') as fpath:
    lines = fpath.readlines()

# Keep lines that contain '<column caption' but not 'param-domain-type'
temp_out_lines = [line for line in lines if '<column caption' in line]
out_lines = [line for line in temp_out_lines if 'param-domain-type' not in line]

# Lowercase every kept line (note: this lowercases the whole line, not just the name= value)
lower_lines = [line.lower() for line in out_lines]

# The lines already end with '\n', so join them without adding another separator
output = ''.join(lower_lines)

# Write the result to the new file
with open('Redshift_1_new.txt', 'w') as fpath_write:
    fpath_write.write(output)

My goal is to implement functionality that takes a line and lowercases a specific parameter before that line is written to the new file.

Currently, the process takes a line, checks whether it contains <column caption, then checks that it does not contain param-domain-type. If both checks pass, the line is added to the new txt file.

An example line is below:

<column caption='Section' datatype='string' name='[SECTION]' role='dimension' type='nominal'>

The goal is to check every line before it is added to the new txt file and, for every instance of name='[****]', make the value within the [] lowercase. Currently, these values are uppercase.

Note: only the value within the [] of the name= parameter may be lowercased. Other parameters in the line must keep their capitalization.
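A minimal sketch of the transformation described above, assuming the value is always written as name='[...]' with single quotes (as in the example line) and contains no ']' character. The helper name lowercase_name_value is hypothetical; the capture group holds only the text between the brackets, so nothing else in the line changes.

```python
import re

# Captures only the text between the brackets in name='[...]'.
NAME_VALUE = re.compile(r"name='\[([^\]]*)\]'")


def lowercase_name_value(line):
    """Lowercase the bracketed value of every name='[...]' in line; leave everything else as-is."""
    return NAME_VALUE.sub(lambda m: "name='[" + m.group(1).lower() + "]'", line)


example = "<column caption='Section' datatype='string' name='[SECTION]' role='dimension' type='nominal'>"
print(lowercase_name_value(example))
```

Because the replacement rebuilds name='[...]' around the lowercased group, caption='Section' and the other attributes keep their original case.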

Thanks!

Edit: Another option would be a makeshift find-and-replace that finds every instance of name='[ABC]' and replaces it with name='[abc]'. But I still do not know how to go about this on my own.
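The makeshift find-and-replace from this edit can be sketched with plain str.replace, but only under the assumption that the column names are known in advance. KNOWN_NAMES and replace_known_names are hypothetical; any name not in the list is left uppercase, which is why a regex is the more general tool.

```python
# Hypothetical list of column names known in advance; not taken from the real file.
KNOWN_NAMES = ["SECTION", "MANAGEMENT_LOCATION"]


def replace_known_names(line, names):
    """Literal find-and-replace: name='[ABC]' -> name='[abc]' for each listed name."""
    for name in names:
        line = line.replace(f"name='[{name}]'", f"name='[{name.lower()}]'")
    return line


line = "<column caption='Section' datatype='string' name='[SECTION]' role='dimension' type='nominal'>"
print(replace_known_names(line, KNOWN_NAMES))
```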

Edit2: After switching to a regex, I also used a for loop to go through every line in out_lines; see the code below.

for x in range(len(out_lines)):
    print(out_lines[x])
    test = str(out_lines[x])
    out_lines[x] = re.sub(r"(name='([.*?])')", lambda m: m.group(1).lower(), test)
    print(out_lines[x])

However, when I do so I still get the same output (before and after the substitution):

<column caption='Location' datatype='string' name='[MANAGEMENT_LOCATION]' role='dimension' type='nominal' />

<column caption='Location' datatype='string' name='[MANAGEMENT_LOCATION]' role='dimension' type='nominal' />
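The output is unchanged because the square brackets in the Edit2 pattern are not escaped: [.*?] is a character class that matches a single '.', '*' or '?' character, so the pattern never matches name='[MANAGEMENT_LOCATION]' and re.sub returns the line untouched. A sketch of that check and of the same loop with escaped brackets (the one-element out_lines list is sample data taken from the printed output):

```python
import re

# Sample data: one line from the question's output
out_lines = [
    "<column caption='Location' datatype='string' name='[MANAGEMENT_LOCATION]' role='dimension' type='nominal' />\n",
]

# Pattern from Edit2: inside [...], the characters . * ? are literal,
# so [.*?] matches exactly one '.', '*' or '?' character.
BROKEN = re.compile(r"(name='([.*?])')")
assert BROKEN.search(out_lines[0]) is None  # no match, so re.sub changes nothing

# Escaped brackets: a literal '[', then as few characters as possible, then ']'.
FIXED = re.compile(r"(name='(\[.*?\])')")

# Same loop as in Edit2, using enumerate and the escaped pattern
for i, line in enumerate(out_lines):
    out_lines[i] = FIXED.sub(lambda m: m.group(1).lower(), line)

print(out_lines[0], end="")
```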

Answer: You can use Python's re module to replace the necessary substring.

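import re
your_text = "<column caption='Section' datatype='string' name='[SECTION]' role='dimension' type='nominal'>"  # replace with your own line or file contents
# Escaped brackets match a literal [...]; only the matched name='[...]' text is lowercased
result = re.sub(r"(name='(\[.*?\])')", lambda m: m.group(1).lower(), your_text)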
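Putting the question's filter and the answer's regex together, a minimal end-to-end sketch could look like the following. The function names process_lines and convert_file are hypothetical; the filter conditions come from the question and the pattern from the answer.

```python
import re

# Pattern from the answer: captures name='[...]' (brackets escaped, non-greedy).
NAME_PATTERN = re.compile(r"(name='(\[.*?\])')")


def process_lines(lines):
    """Keep '<column caption' lines without 'param-domain-type' and lowercase their name='[...]'."""
    result = []
    for line in lines:
        if '<column caption' in line and 'param-domain-type' not in line:
            # Only the matched name='[...]' text is lowercased; other attributes keep their case.
            result.append(NAME_PATTERN.sub(lambda m: m.group(1).lower(), line))
    return result


def convert_file(src_path, dst_path):
    """Read src_path, filter and transform its lines, and write them to dst_path."""
    with open(src_path, 'r') as src:
        lines = src.readlines()
    with open(dst_path, 'w') as dst:
        # readlines() keeps each trailing '\n', so writelines adds no extra separators.
        dst.writelines(process_lines(lines))
```

With the question's files, the call would be convert_file('Redshift_twb_1.txt', 'Redshift_1_new.txt'). Keeping the line logic in process_lines makes it easy to test without touching the file system.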

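Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.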

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM