How can I get specific columns from a txt file and save them to a new file using Python?
I have this txt file, sentences.txt, which contains the following text:
a01-000u-s00-00 0 ok 154 19 408 746 1661 89 A|MOVE|to|stop|Mr.|Gaitskell|from
a01-000u-s00-01 0 ok 156 19 395 932 1850 105 nominating|any|more|Labour|life|Peers
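It contains 10 columns. Using a pandas DataFrame, I want to extract only the file name (column 0) and the corresponding text (the 10th column, index 9), without the (|) characters. I wrote this code: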
import pandas as pd

def load() -> pd.DataFrame:
    df = pd.read_csv('sentences.txt', sep=' ', header=None)
    data = []
    with open('sentences.txt') as infile:
        for line in infile:
            file_name, _, _, _, _, _, _, _, _, text = line.strip().split(' ')
            data.append((file_name, cl_txt(text)))
    df = pd.DataFrame(data, columns=['file_name', 'text'])
    df.rename(columns={0: 'file_name', 9: 'text'}, inplace=True)
    df['file_name'] = df['file_name'].apply(lambda x: x + '.jpg')
    df = df[['file_name', 'text']]
    return df

def cl_txt(input_text: str) -> str:
    text = input_text.replace('+', '-')
    text = text.replace('|', ' ')
    return text

load()
The error I get:
ParserError: Error tokenizing data. C error: Expected 10 fields in line 4, saw 11
The result I expect in the process.txt file should look like this, without \n:
a01-000u-s00-00 A MOVE to stop Mr. Gaitskell from
a01-000u-s00-01 nominating any more Labour life Peers
IIUC, you only need pandas.read_csv to read your .txt file and then select the two columns.
Try this:
import pandas as pd

df = (
    pd.read_csv("test.txt", header=None, sep=r"(\d+)\s(?=\D)", engine="python",
                usecols=[0, 4], names=["filename", "text"])
      .assign(filename=lambda x: x["filename"].str.strip().add(".jpg"),
              text=lambda x: x["text"].str.replace(r'[\|"]', " ", regex=True)
                                      .str.replace(r"\s+", " ", regex=True))
)
print(df)
              filename                                         text
0  a01-000u-s00-00.jpg            A MOVE to stop Mr. Gaitskell from
1  a01-000u-s00-01.jpg        nominating any more Labour life Peers
2   a01-003-s00-01.jpg  large majority of Labour M Ps are likely to
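As an alternative to the regex separator, here is a minimal sketch that splits each line on the first nine spaces only, so any extra spaces stay inside the text field. It assumes the ParserError comes from lines whose text field itself contains a space, which the question does not confirm. The helper names parse_sentences, clean_text and write_pairs are hypothetical, and the third sample line uses made-up numeric fields modeled on the third row of the output above.

```python
import io

import pandas as pd


def clean_text(raw: str) -> str:
    """Replace '|' separators with spaces and collapse repeated whitespace."""
    return " ".join(raw.replace("|", " ").split())


def parse_sentences(lines) -> pd.DataFrame:
    """Keep the first field (file name) and everything after the ninth space (text)."""
    rows = []
    for line in lines:
        # maxsplit=9 yields at most 10 parts; spaces inside the text are preserved.
        parts = line.strip().split(" ", 9)
        if len(parts) < 10:
            continue  # skip blank or malformed lines
        rows.append((parts[0], clean_text(parts[9])))
    return pd.DataFrame(rows, columns=["file_name", "text"])


def write_pairs(df: pd.DataFrame, out) -> None:
    """Write one 'file_name text' pair per line to an open text handle."""
    for file_name, text in df.itertuples(index=False):
        out.write(f"{file_name} {text}\n")


# The first two lines come from the question; the third is a hypothetical
# line whose text field contains a space ("M Ps").
sample = io.StringIO(
    "a01-000u-s00-00 0 ok 154 19 408 746 1661 89 A|MOVE|to|stop|Mr.|Gaitskell|from\n"
    "a01-000u-s00-01 0 ok 156 19 395 932 1850 105 nominating|any|more|Labour|life|Peers\n"
    "a01-003-s00-01 0 ok 156 21 323 1196 1878 152 large|majority|of|Labour|M Ps|are|likely|to\n"
)

df = parse_sentences(sample)

buf = io.StringIO()
write_pairs(df, buf)
print(buf.getvalue(), end="")
```

To produce the file from the question, the same functions could be called with real handles, for example parse_sentences(open("sentences.txt")) and write_pairs(df, open("process.txt", "w")). The rows are written by hand rather than with DataFrame.to_csv because a space-separated to_csv would quote any text that contains spaces.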