
Read a file line by line and get the lines that start with some word?

I am using Python and trying to read a file line by line and add those lines to JSON. I need to check whether a line starts with a certain word and, if it does, put the text after that word in the JSON until another line starts with one of the specific words.

I have an array of these specific names:

names_array = ['Filan Fisteku', 'Fisteku Filan']

So, for example, the txt file looks like this:

  1. Filan Fisteku: Said something about this , blla blla blla then
  2. the Filan Fisteku speech goes on on the next line, plus some other text.
  3. Fisteku Filan: This is another text from another guy which i am trying to put in a json.

So the JSON I want to build from this txt is:

{
"Filan Fisteku":["Said something about this , blla blla blla",
                  "then the Filan Fisteku speech goes on on the next line,",
                  "plus some other text."],
"Fisteku Filan":["This is another text from another guy which",
                 "i am trying to put in a json"]
}

I need to know whether I can do this with recursion, or how else I can do it.

You can do this easily:

res = {}
with open('file.txt', 'r') as f:
    for line in f.readlines():
        for name in names_array:
            if line.startswith(name):
                if name not in res:
                    res[name] = [line]
                else:
                    res[name].append(line)

Perhaps you will also need to remove extra characters at the beginning of the line (spaces etc.), but that may not be required.
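A minimal sketch of how the loop above could be extended, assuming each speaker line has the form "Name: text" and that a line without a known name belongs to the most recent speaker. The function name group_by_speaker and the in-memory sample list are illustrative, not part of the original answer; in practice you would pass the open file object instead of the list.

```python
def group_by_speaker(lines, names):
    """Group lines under the most recent speaker whose name starts a line."""
    result = {}
    current = None  # speaker whose text is currently being collected
    for raw in lines:
        line = raw.strip()  # drop leading/trailing whitespace and the newline
        if not line:
            continue  # skip blank lines
        for name in names:
            prefix = name + ':'
            if line.startswith(prefix):
                current = name
                # Slice off the "Name:" prefix and keep the rest of the line
                line = line[len(prefix):].strip()
                break
        # Lines seen before any speaker appears are ignored
        if current is not None and line:
            result.setdefault(current, []).append(line)
    return result


# Sample lines mirroring the three-line file from the question
sample = [
    "Filan Fisteku: Said something about this , blla blla blla then\n",
    "the Filan Fisteku speech goes on on the next line, plus some other text.\n",
    "Fisteku Filan: This is another text from another guy which i am trying to put in a json.\n",
]
names_array = ['Filan Fisteku', 'Fisteku Filan']
res = group_by_speaker(sample, names_array)
```

No recursion is needed here: a single pass that remembers the current speaker is enough.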

You can build a dict using the following:

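names = {}
name = None  # current speaker; stays None until the first "Name: text" line
with open('yourfile') as fin:
    # Split each line at the first ': ' into (before, separator, after)
    lines = (line.strip().partition(': ') for line in fin)
    for fst, sep, snd in lines:
        if sep:
            name = fst  # a separator means the line starts a new speaker
        if name is not None:
            names.setdefault(name, []).append(snd or fst)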

Which gives:

{'Filan Fisteku': ['Said something about this , blla blla blla then',
                   'the Filan Fisteku speech goes on on the next line,  plus some other text.'],
 'Fisteku Filan': ['This is another text from another guy which i am trying to put in a json.']}

Then pass names to json.dumps to get the JSON string.
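As a rough illustration of that last step, here is a sketch that serializes a dict like the one above and reads it back. The shortened sample data is made up for the example; indent and ensure_ascii are optional arguments chosen here for readability.

```python
import json

# Shortened, illustrative stand-in for the dict built from the file
names = {
    'Filan Fisteku': ['Said something about this , blla blla blla then'],
    'Fisteku Filan': ['This is another text from another guy.'],
}

# indent makes the output human-readable; ensure_ascii=False keeps
# non-ASCII characters as-is instead of escaping them
text = json.dumps(names, indent=2, ensure_ascii=False)
print(text)

# Round-trip check: parsing the string gives back the same structure
restored = json.loads(text)
```

To write straight to a file instead, json.dump(names, fout) with an open file object does the same job.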

You can use a flag to track the current speaker. Update the flag whenever a new speaker appears at the start of a line; if a line does not start with a speaker, it goes to the current speaker's list. I've created a demo, check whether it works for you:

speaker = ''  # flag: name of the speaker currently talking
Filan_Fisteku = []
Fisteku_Filan = []
with open('yourfile.txt', 'r') as f:
    for line in f:
        if line.startswith('Filan Fisteku:'):
            # Slice off the prefix (str.lstrip would strip a set of characters)
            line = line[len('Filan Fisteku:'):]
            Filan_Fisteku.append(line.strip())
            speaker = 'Filan Fisteku'
        elif line.startswith('Fisteku Filan:'):
            line = line[len('Fisteku Filan:'):]
            Fisteku_Filan.append(line.strip())
            speaker = 'Fisteku Filan'
        elif speaker == 'Filan Fisteku':
            # Continuation line: belongs to the current speaker
            Filan_Fisteku.append(line.strip())
        elif speaker == 'Fisteku Filan':
            Fisteku_Filan.append(line.strip())
mydict = {'Filan Fisteku': Filan_Fisteku, 'Fisteku Filan': Fisteku_Filan}

From the data, mydict will look like this:

{'Filan Fisteku': ['Said something about this , blla blla blla then',
                   'the Filan Fisteku speech goes on on the next line, plus some other text.'],
 'Fisteku Filan': ['This is another text from another guy which i am trying to put in a json.']}
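One thing to watch when removing a speaker prefix by hand: str.lstrip treats its argument as a set of characters, not as a literal prefix, so it can eat into the text that follows. The small sketch below shows the difference on a made-up line (the sample text is an assumption, chosen so that the message begins with letters that also occur in the name).

```python
prefix = 'Filan Fisteku:'
line = 'Filan Fisteku: like this idea'  # hypothetical input line

# lstrip removes every leading character found in the prefix string,
# so it keeps going past the colon into the message itself
stripped_chars = line.lstrip(prefix).strip()

# Slicing by the prefix length removes exactly the prefix and nothing more
sliced = line[len(prefix):].strip()

print(repr(stripped_chars))  # 'his idea'
print(repr(sliced))          # 'like this idea'
```

On Python 3.9 or newer, line.removeprefix(prefix) is a built-in alternative to the slice.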


 