
Write all lines for each set of a range to a new file each time the range changes (Python 3.6)

I'm trying to find a way to make this process work, Pythonically or at all. Basically, I have a really long text file that is split into lines. Every so many lines there is one that is mostly uppercase, which should roughly be the title of that particular section. Ideally, I'd want the title and everything after it to go into a text file that uses the title as the file name. This would have to happen 3039 times in this case, as that is how many titles there are. My process so far is this: I created a function that reads a piece of text and tells me whether it's mostly uppercase.

import numpy as np

def mostly_uppercase(text):
    """Return True if at least 70% of the characters in text are uppercase."""
    threshold = 0.7
    if not text:
        # np.mean of an empty list is nan, so handle empty text explicitly
        return False
    isupper_ints = [int(character.isupper()) for character in text]
    upper_percentage = np.mean(isupper_ints)
    return upper_percentage >= threshold

Afterwards, I used a counter to record the index of each headline line, and then combined the indices to slice out the articles:

counter = 0

headline_indices = []

for line in page_text:
    if mostly_uppercase(line):
        print(line)
        headline_indices.append(counter)
    counter+=1

headlines_with_articles = []
# The slice end is exclusive, so use len(page_text) to keep the last line
headline_indices_expanded = [0] + headline_indices + [len(page_text)]

for first, second in list(zip(headline_indices_expanded, headline_indices_expanded[1:])):
    article_text = (page_text[first:second])
    headlines_with_articles.append(article_text)

All of that seems to be working fine as far as I can tell. But when I try to write the pieces I want to files, all I manage to do is write the entire text into all of the txt files.

for i in range(100):
    out_pathname = '/sharedfolder/temp_directory/' + 'new_file_' + str(i) + '.txt'
    with open(out_pathname, 'w') as fo:
        fo.write(articles_filtered[2])

Edit: This got me halfway there. Now I just need a way of naming each file after its first line.

for i, text in enumerate(articles_filtered):
    out_pathname = '/sharedfolder/temp_directory/' + str(i + 1) + '.txt'
    with open(out_pathname, 'w') as fo:
        fo.write(str(text))

One conventional way of processing a single input file is to use a Python with statement and a for loop, as follows. I have also adapted a good answer from someone else for counting uppercase characters, to get the fraction you need.

def mostly_upper(text):
    threshold = 0.7
    if not text:
        # a blank line has no characters; avoid dividing by zero
        return False
    ## adapted from https://stackoverflow.com/a/18129868/131187
    upper_count = sum(1 for c in text if c.isupper())
    return upper_count/len(text) >= threshold

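first = True
out_file = None
with open('some_uppers.txt') as some_uppers:
    for line in some_uppers:
        line = line.rstrip()
        # Start a new output file on the very first line and on every title line
        if first or mostly_upper(line):
            first = False
            if out_file:
                out_file.close()
            out_file = open(line + '.txt', 'w')
        print(line, file=out_file)
# out_file is still None if the input file was empty
if out_file:
    out_file.close()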

In the loop, we read each line and ask whether it's mostly uppercase. If it is, we close the file that was being used for the previous collection of lines and open a new file for the next collection, using the contents of the current line as the file name.
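The same grouping logic can be tried out in memory before any files are written. The following is only a sketch: group_sections and the sample lines are made up for illustration, and mostly_upper is repeated here with a guard for empty lines so that the snippet runs on its own.

```python
def mostly_upper(text, threshold=0.7):
    """Return True when at least `threshold` of the characters are uppercase."""
    if not text:
        return False  # blank lines are never titles
    upper_count = sum(1 for c in text if c.isupper())
    return upper_count / len(text) >= threshold


def group_sections(lines):
    """Hypothetical helper: start a new section at each mostly-uppercase line."""
    sections = []
    for line in lines:
        line = line.rstrip()
        # The very first line always opens a section, whether or not it is a title
        if not sections or mostly_upper(line):
            sections.append([])
        sections[-1].append(line)
    return sections


# Made-up sample input
sample = ["FIRST TITLE", "some text", "more text", "SECOND TITLE", "last text"]
sections = group_sections(sample)
```

Each inner list begins with the line that would become the file name, so printing the first element of every section is a quick way to check that the threshold picks out the right titles.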

I allow for the possibility that the first line might not be a title. In this case the code creates a file with the contents of the first line as its name, and proceeds to write everything it finds to that file until it does find a title line.
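One thing to watch for when a line of text is used directly as a file name: a title containing a slash, a colon or similar characters may not be a valid name on every system. A minimal sketch of one way to handle that, assuming a hypothetical helper called safe_filename (not part of the code above) and a deliberately conservative set of allowed characters:

```python
import re


def safe_filename(title, fallback="untitled"):
    """Hypothetical helper: turn a title line into a conservative file name."""
    # Replace everything except word characters, hyphens, dots and spaces
    cleaned = re.sub(r"[^\w\-. ]", "_", title)
    # Drop leading and trailing spaces and dots
    cleaned = cleaned.strip(" .")
    # Fall back to a placeholder name when nothing usable is left
    return (cleaned or fallback) + ".txt"


example_name = safe_filename("NEWS/WEATHER: TODAY")
```

It could be used as open(safe_filename(line), 'w') in place of open(line + '.txt', 'w'). Note that two sections with the same title would still write to the same file, so a counter in the name may be needed if titles repeat.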


