
How could I add a \n Newline character after every 1000 words in a text file?

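Alright, so here is the issue. I have a few text files that contain 14,000+ words each, but the words are all on one line, which makes the files impossible to read in an editor that has no word wrap. So I would like to add a return/newline to the file after a minimum of 1000 words AND the next occurrence of a ".". My first thought was to count the lines, add them up, and insert a \n character once the total hit 1000, but everything is on one line. That makes things harder, and I have not been able to find a way to do what I want. Otherwise I would have to go through the text files myself and add the newlines by hand, which defeats the purpose of running a Python script to do it for me. Is this possible, or am I crazy for even thinking of it? Thanks in advance for any help you can provide! I have included my various attempts below.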

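In this attempt the code runs, but it does not print "Word Count is over 1000." about 14 times as I expected (the word count of this text file is around 14,000). It prints only once, because there is only one line to read.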

text_file = "textfile.txt"
numLines = 0
numWords = 0
numChars = 0

with open(text_file, 'r') as file:
    for line in file:
        wordsList = line.split()
        numLines += 1
        numWords += len(wordsList)
        numChars += len(line)
        if numWords > 1000:
            print("Word Count is over 1000.")

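In the next attempt I did something similar, but still got the same result as above. Instead of \n\n\n\n being written to the text file about 14 times, it is written only once, at the end of the file.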

def oldWordCounter(input_file):
    word_count = 0

    with open(input_file, 'r') as f:
        for line in f:
            word_count = len(line.split(' '))
            print("Word count = %s \n" % word_count)

    if word_count > 1000: 
        with open(input_file, 'a') as f:
            f.write("\n\n\n\n")

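I'm sure I'm just missing something simple, but I'm still very new to Python. It kills me to even ask a question on here. I have tried everything I can think of and can't seem to find a way. So again, thank you for any help you can provide on this!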

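Also below is how I planned to add the newline after the next period occurs. Not sure if it helps, but it may give you a better idea of what I am trying to accomplish.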

def splitOnPeriod(input_file):
    with open(input_file, "r") as f:
        for line in f:
            searchPhrase = "."
            if searchPhrase in line:
                file = open(input_file, "a")
                file.write("\n\n\n\n")
                print("found it\n")

Here is a small portion of the text I'm working with...

World headquarters, only business Google without bada bing bada boom, guess who's back inside your room. It is the Thrive time show on your radio. My name is Clay Clark, the former and recovering disc jockey. I am joined today Inside the Box rocks with with a guy. He sees he's on telling you what he's he's back in Tulsa for at least the foreseeable future, maybe maybe for several days several minutes. It'S dr. Robert zoellner, sir welcome back. I am so fired up today. I am in such a great mood and right now I could see Marshall and I could see his reaction as I get to announce why I'm so happy all really. Yes, I glorious thing happen this weekend. You'Re discovering more hair is growing and I like you're, going with that by the way this happened to do with a little support. We Americans love so much call football Hurricane football. Absolutely I mean the world. I have waited a year to get the world right again and in my Oklahoma, Sooners go up to Columbus and whoop. I mean now. Let'S talk about the facts here, cuz there's a lot of people listening. This is a business, show its business school without the BS to keep it relevant to make sure that understand this Oklahoma. If I'm correct was right, number 5 correct and I believe that Ohio state was ranked number 2. Yes, why you leave in the box of rocks? Do is In-N-Out Marshall to the drivers who don't know Marshall, for business coaches in Ohio from Ohio and he's not so he really cares about Ohio. Yes, fifth-ranked Boomer Sooners went up there and beat him was a close. Now. It wasn't even close, really really good, and so then I'm so that was Saturday and then Sunday this last weekend and I've been waiting to have Marshall in the Box, because I can't make this announcement without you really here to sit on that till Wednesday. Clear the clear that kind of thing I didn't seem last couple things on Sunday, the Dallas Cowboys won the double bonus. 
Can I will quick on this and I've loved the Patriots and Jonathan are off as he hates the Patriots, and so whenever his Giants lose, I almost feel better about their loss. I almost feel better about their loss, then actual win for the Patriots and when I saw the Cowboys just turn it on I'm like this is great. I don't care what team it is as long as they're playing the Giants. I am I'm almost. I wouldn't make a prayer chain, but I will be on the verge of making your prayer chain for your team excited to see, but I don't care who it is they beat. The Giants is a great thing for American I'm a Little Lamb lunch Wagers. I am going to whenever he pays off on The Chew very slowly and enjoy every moment of tizers have reserved, but I'll have to I'll. Have I don't normally do it, but since you're paying for it Marshall, I think I will now on Today Show we're breaking down to six books that every entrepreneur should read the six books at every entrepreneur should read, and a book number one was thinking, Grow. Rich book number to you can actually get that book for free. It is start here the book The we put together the documents, our business cyst shamelessly. So if you want to learn how to grow successful company to start here to 550 page book, it's absolutely free to download it Thrive time show. And we just hit the amazon.com best sellers list on that. So if you go to Amazon now and you type in like business Consulting into the search bar, that book actually comes up in the top five books now, and so that's a book that you can get there for free to ebook, it's absolutely free for you. We move on now to book number 3, which is Titan now. Titan is the book that documents, the Life, The Life and Times of John D Rockefeller, who actually grew up like everybody else, use Easy. You start somewhere. 
He grew up poor and at the age of 16 he began working to support his mother because his father was an absent father and actually decided to leave his family and get married to another woman without telling his current wife it's breaking down some notable quotables from That book and I'm going to go ahead and give you the first notable quotable. This is John D. Rockefeller Miss. Is it from the book tighten the author writes he had a great generals, ability to focus on his goals and a brush aside obstacles as Petty distractions. He wants said you can abuse me.

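This code inserts a line break after every 1000 words, and also after any word that contains a period, resetting the counter each time a break is written.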

words = s.split()
new_text = ""
word_count = 0
for word in words:
    new_text += word + " "
    word_count += 1
    if word_count == 1000 or "." in word:
        new_text += "\n"
        word_count = 0

Here s is the string read from the file. Afterwards, just write new_text to a file.
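The question asks for a break only once at least 1000 words have passed and the next "." is reached, whereas the snippet above breaks on either condition. A minimal sketch of the combined condition follows. The function name break_after_period and its min_words parameter are assumptions made for illustration, and a "sentence end" is taken to mean a word that ends with a period.

```python
def break_after_period(text, min_words=1000):
    """Break the text at the first sentence end after min_words words."""
    lines = []
    current = []
    for word in text.split():
        current.append(word)
        # Break only when both conditions hold: enough words AND a sentence end.
        if len(current) >= min_words and word.endswith("."):
            lines.append(" ".join(current))
            current = []
    if current:  # keep trailing words that never reached a break
        lines.append(" ".join(current))
    return "\n".join(lines)


sample = "One two three. Four five. Six seven eight nine. Ten."
print(break_after_period(sample, min_words=4))
```

With min_words=4 the sample breaks after "five." and after "nine.", since those are the first sentence ends reached once four or more words have accumulated. For the real files, the text would be read with a single f.read() call and the result written to a new file.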

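Read all the words into a list, then write them to a file, appending "\n" after every thousandth word or after any word that contains a period.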

AllWords = []
for line in open("data_words.txt"):
    row = line.split(' ')
    AllWords+=list(row)

line_breaker=1000
i=1
with open("/home/kiran/km/km_hadoop/data/data_wordcount_op.txt", 'a') as file:
    for word in AllWords:
        if("." in word or i==line_breaker):
            file.write(word.strip('\n')+"\n")
            i=0
        else:
            file.write(word.strip('\n')+" ")

        i+=1
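The strip('\n') calls above are needed because split(' ') leaves the trailing newline attached to the last word on each line. A short illustration of how split(' ') differs from split() with no argument (the sample string is made up for the example):

```python
line = "first  second third.\n"

# split(' ') splits on every single space, so a double space yields an
# empty string and the trailing newline stays attached to the last word.
by_space = line.split(' ')

# split() with no argument splits on any run of whitespace and drops
# leading and trailing whitespace, including the newline.
by_whitespace = line.split()

print(by_space)       # ['first', '', 'second', 'third.\n']
print(by_whitespace)  # ['first', 'second', 'third.']
```

Using split() would likely make the strip('\n') calls unnecessary and would avoid counting empty strings as words.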

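To answer your first question, I defined a linewrapper function that takes a file and the desired wrap length. Using the modulo operator, we take the loop index modulo wrap_length minus one, because the index starts at 0. The modulo operator lets us determine whether the index is evenly divisible by that value. For example, if wrap_length is 97 and i is 96, the remainder is 0 and a newline is written; if i is 95, the remainder is not 0 and no newline is written. We also need to check that i is not 0, because 0 modulo any number is 0 and would otherwise add a newline after the very first word. You can read more about how the operator works here: https://docs.python.org/3.3/reference/expressions.html#binary-arithmetic-operations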

def linewrapper(input_file, wrap_length):
    with open(input_file, 'r') as input_file, open('output.txt', 'w') as output_file:
        for line in input_file:
            words = line.split()
            for i in range(0, len(words)):
                output_file.write('%s ' % words[i])
                if i != 0 and i % (wrap_length - 1) == 0:
                    output_file.write("\n")

linewrapper('input.txt', 100)
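One thing to watch with i % (wrap_length - 1): with a wrap length of 100 the first break lands after index 99 (100 words), but later breaks land at indexes 198, 297 and so on, so the following lines hold 99 words each. A possible variant, sketched here under the assumption that every line should hold exactly wrap_length words, counts from 1 instead. The name wrap_words is made up for this example, and it works on a list of words rather than on a file.

```python
def wrap_words(words, wrap_length):
    """Join words into lines of wrap_length words (the last may be shorter)."""
    pieces = []
    # Counting from 1 means position % wrap_length == 0 on every
    # wrap_length-th word, with no special case for the first word.
    for position, word in enumerate(words, start=1):
        pieces.append(word)
        pieces.append("\n" if position % wrap_length == 0 else " ")
    return "".join(pieces).rstrip()


words = [str(n) for n in range(1, 8)]
print(wrap_words(words, 3))
```

For the seven sample words and a wrap length of 3, this prints "1 2 3", "4 5 6" and "7" on separate lines.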

