Python: remove all whitespace from entries in a list
When I call readlines() on an .srt file, I get a list of strings with a lot of leading and trailing whitespace characters, as shown below:
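# The enclosing function header was missing from the snippet; the name read_srt is a placeholder.
def read_srt(infile):
    with open(infile) as f:
        r = f.readlines()
    return r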
I get this list:
['\xef\xbb\xbf1\r\n', '00:00:00,000 --> 00:00:03,000\r\n', "[D. Evans] Now that you've written your first Python program,\r\n",'\r\n', '2\r\n', '00:00:03,000 --> 00:00:06,000\r\n', 'you might be wondering why we need to invent new languages like Python\r\n', '\r\n']
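For brevity, I have included only a few elements. How can I clean this list so that all the whitespace characters are removed and only the relevant elements remain?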
['1','00:00:00,000 --> 00:00:03,000',"[D. Evans] Now that you've written your first Python program"...]
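You can strip each line. If you are working with a large file, running it as a generator also saves some memory.
Also, judging by the first few characters ('\xef\xbb\xbf'), you are dealing with a UTF-8 file with a BOM (which is a bit silly, or at least unnecessary), so you need to open it differently.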
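import codecs

def strip_it_good(file):
    # "utf-8-sig" decodes UTF-8 and drops the leading BOM if there is one
    with codecs.open(file, "r", "utf-8-sig") as f:
        for line in f:
            # strip() removes leading and trailing whitespace, including '\r\n'
            yield line.strip()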
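Note that the generator above still yields empty strings for the blank separator lines ('\r\n'), while the desired output in the question omits them. Here is a minimal sketch of one way to drop them as well, assuming Python 3, where the built-in open() accepts an encoding argument. The helper names clean_lines and read_clean are made up for illustration and are not part of the original answer.

```python
def clean_lines(lines):
    """Strip surrounding whitespace from each line and drop lines that end up empty."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line]


def read_clean(path):
    # Assumption: the file is UTF-8, possibly with a BOM.
    # "utf-8-sig" discards a leading BOM when one is present.
    with open(path, encoding="utf-8-sig") as f:
        return clean_lines(f)


# Works on an already-loaded list too, e.g. the result of readlines():
raw = ['1\r\n', '00:00:00,000 --> 00:00:03,000\r\n', '\r\n', '2\r\n']
print(clean_lines(raw))  # ['1', '00:00:00,000 --> 00:00:03,000', '2']
```

This version returns a list, so it gives up the memory savings of the generator. If the file is large, the same filter can be applied lazily with `(s for s in strip_it_good(path) if s)`.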