
Extract data from file with Python and write new file

I'm trying to extract data from a file with this structure:

  //Side Menu
  market: 'Market',
  store: 'Store',
  stores: 'Stores',
  myNotes: 'My Notes',
  logout: 'Logout',
  //Toast
  activeUserHasChanged: 'Resetting app - the active user has changed.',
  loginHasExpired: 'Your login has expired.',
  appIsReseting: 'The app is resetting.',

What I want is to extract all the text between single quotation marks and put it in a new file. I think Python could be a good option, but I'm new to programming and to Python. I tried something but had no luck, and from what I've read, it shouldn't be a small script.

My expected output is:

  Market, Store, Stores, My Notes, Logout, Resetting app - the active user has changed, Your login has expired, The app is resetting, 

So any help on this will be appreciated.

Regards.

A simple solution is something like this:

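in_string = False  # True while we are between an opening and a closing quote
with open('infile.txt', 'r') as fr, open('outfile.txt', 'w') as fw:
    for char in fr.read():
        if char == "'":
            in_string = not in_string  # each quote toggles the state
        elif in_string:
            fw.write(char)  # copy only characters that sit inside quotes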

The intuition is that we read the file character by character and keep track of every ' we see along the way. When we encounter the first one, we write the following characters to the output file until we encounter the second, and so on.

It does not handle invalid input (for example, an apostrophe inside one of the strings would flip the state), and it reads the whole file into memory instead of buffering. But if you only have small, well-formed files, this should do it. It also doesn't separate the extracted strings with commas or line breaks, but that shouldn't be too hard to add from here.
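One way to get the comma-separated output is to collect each quoted string in a list instead of writing characters straight to the file, then join the list. This is a minimal sketch of the same toggle idea, assuming the input fits in memory; `extract_quoted` and `write_quoted` are hypothetical helper names, and `infile.txt`/`outfile.txt` are the placeholder paths used earlier. Unlike the expected output in the question, trailing periods inside the strings are kept.

```python
def extract_quoted(text):
    """Return the substrings between pairs of single quotes, in order.

    Each ' toggles in_string, as in the loop above. Text after an
    unmatched opening quote is dropped.
    """
    results = []
    current = []        # characters of the string currently being read
    in_string = False   # True while between an opening and a closing quote
    for char in text:
        if char == "'":
            if in_string:  # closing quote: finish this string
                results.append(''.join(current))
                current = []
            in_string = not in_string
        elif in_string:
            current.append(char)
    return results


def write_quoted(in_path='infile.txt', out_path='outfile.txt'):
    """Hypothetical helper: write all quoted strings as one comma-separated line."""
    with open(in_path, 'r') as fr:
        strings = extract_quoted(fr.read())
    with open(out_path, 'w') as fw:
        fw.write(', '.join(strings) + '\n')


sample = "market: 'Market', store: 'Store', myNotes: 'My Notes',"
print(', '.join(extract_quoted(sample)))  # → Market, Store, My Notes
```

Building the list first keeps the parsing separate from the output format, so switching to one string per line only means changing the `join` separator.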

Assuming your input is a text file:

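import re

# Text mode ('r'/'w') so each line is a str; a str pattern cannot be used on bytes in Python 3
with open('your input file', 'r') as fid, open('output file', 'w') as output:
    for line in fid:
        # findall returns every quoted substring on the line;
        # re.match would only try to match at the very start of the line
        for text in re.findall(r"['\"](.*?)['\"]", line):
            output.write(text + '\n')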

The regex r"['\"](.*?)['\"]" finds the text between a pair of quotation marks (single or double); the non-greedy (.*?) stops at the first closing quote. If nothing is found on a line, that line is skipped. Hope this is helpful.
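If only single quotes matter, as in the question, a narrower pattern avoids pairing a ' with a ". This is a sketch assuming the whole file fits in memory; `quoted_strings` and `extract_to_file` are hypothetical names. The pattern '([^']*)' captures everything up to the next single quote, and `re.findall` returns every match across the whole text (not just per line), so joining the results with `', '` gives the comma-separated layout from the question, with trailing periods inside the strings kept.

```python
import re

# A single quote, then a run of non-quote characters (captured), then a single quote.
# [^']* cannot cross a quote, so adjacent strings are matched separately.
SINGLE_QUOTED = re.compile(r"'([^']*)'")


def quoted_strings(text):
    """Return every substring enclosed in single quotes, in order."""
    return SINGLE_QUOTED.findall(text)  # one capture group -> list of captured strings


def extract_to_file(in_path, out_path):
    """Hypothetical helper: read in_path and write the strings as one comma-separated line."""
    with open(in_path, 'r') as fid:
        text = fid.read()
    with open(out_path, 'w') as output:
        output.write(', '.join(quoted_strings(text)) + '\n')


sample = ("//Side Menu market: 'Market', logout: 'Logout', "
          "//Toast loginHasExpired: 'Your login has expired.',")
print(', '.join(quoted_strings(sample)))
# → Market, Logout, Your login has expired.
```

Reading the whole file at once also handles input that was flattened onto a single line, since `findall` does not care where the line breaks fall.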

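Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.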

© 2020-2024 STACKOOM.COM