Read text file and parse in Python
I have a text file (.txt) that looks like this:
Date, Day, Sect, 1, 2, 3
1, Sun, 1-1, 123, 345, 678
2, Mon, 2-2, 234, 585, 282
3, Tue, 2-2, 231, 232, 686
With this data I want to do the following:
1) Read the text file line by line, with each line as a separate element in a list
Split the elements by comma
Delete unnecessary elements ('\n') in the list
For these, I did the following.
file = open('abc.txt', mode = 'r', encoding = 'utf-8-sig')
lines = file.readlines()
file.close()
my_dict = {}
my_list = []
for line in lines:
    line = line.split(',')
    line = [i.strip() for i in line]
2) Set the first row (Date, Day, Sect, 1, 2, 3) as keys and the other rows as values in the dictionary.
    my_dict['Date'] = line[0]
    my_dict['Day'] = line[1]
    my_dict['Sect'] = line[2]
    my_dict['1'] = line[3]
    my_dict['2'] = line[4]
    my_dict['3'] = line[5]
The above code has two issues: 1) It turns the first row (the header) into a dictionary, too. 2) If I add the dictionary to the list as shown below, the list ends up with the last row repeated for every element.
3) Create a list with the dictionaries as elements.
my_list.append(my_dict)
4) Subset the elements that I want.
I couldn't write any code from here. What I want to do is select the elements that meet a condition, for example the dictionaries where Sect is 2-2. The wanted result would then be as follows:
>> [{'Date': '2', 'Day': 'Mon', 'Sect': '2-2', '1': '234', '2': '585', '3': '282'}, {'Date': '3', 'Day': 'Tue', 'Sect': '2-2', '1': '231', '2': '232', '3': '686'}]
Thanks,
@supremed14, you can also try the code below to prepare the list of dictionaries after reading the file.
Since the text file has white space around the values, the strip() method defined on strings will solve this problem.
Date, Day, Sect, 1, 2, 3
1, Sun, 1-1, 123, 345, 678
2, Mon, 2-2, 234, 585, 282
3, Tue, 2-2, 231, 232, 686
Here you do not need to worry about closing the file. The with statement closes it automatically when the block ends.
import json

my_list = []

with open('data.txt') as f:
    lines = f.readlines()  # list containing lines of file
    columns = []  # To store column names

    i = 1
    for line in lines:
        line = line.strip()  # remove leading/trailing white spaces
        if line:
            if i == 1:
                columns = [item.strip() for item in line.split(',')]
                i = i + 1
            else:
                d = {}  # dictionary to store file data (each line)
                data = [item.strip() for item in line.split(',')]
                for index, elem in enumerate(data):
                    d[columns[index]] = data[index]

                my_list.append(d)  # append dictionary to list

# pretty printing list of dictionaries
print(json.dumps(my_list, indent=4))
[
    {
        "Date": "1",
        "Day": "Sun",
        "Sect": "1-1",
        "1": "123",
        "2": "345",
        "3": "678"
    },
    {
        "Date": "2",
        "Day": "Mon",
        "Sect": "2-2",
        "1": "234",
        "2": "585",
        "3": "282"
    },
    {
        "Date": "3",
        "Day": "Tue",
        "Sect": "2-2",
        "1": "231",
        "2": "232",
        "3": "686"
    }
]
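The second issue in the question (every list element showing the last row) is most likely caused by appending the same dictionary object on each iteration; creating a new dictionary per line, as above, avoids it. A minimal sketch of the difference, which also covers the filtering step, might look like the following. The names `rows`, `shared`, `aliased`, `records` and `selected` are made up for illustration, and `rows` is a hard-coded stand-in for the already split and stripped lines of the file.

```python
# Sample rows as they would look after splitting and stripping each line
# (hypothetical stand-in for the parsed file contents).
columns = ['Date', 'Day', 'Sect', '1', '2', '3']
rows = [
    ['1', 'Sun', '1-1', '123', '345', '678'],
    ['2', 'Mon', '2-2', '234', '585', '282'],
    ['3', 'Tue', '2-2', '231', '232', '686'],
]

# Reusing one dict: every list element refers to the same object,
# so all of them end up showing the last row.
shared = {}
aliased = []
for row in rows:
    for name, value in zip(columns, row):
        shared[name] = value
    aliased.append(shared)

# Building a new dict per row keeps the rows distinct.
records = [dict(zip(columns, row)) for row in rows]

# Step 4 from the question: keep only the rows where Sect is '2-2'.
selected = [rec for rec in records if rec['Sect'] == '2-2']

print(aliased[0] is aliased[-1])
print(selected)
```

The list comprehension at the end is only one way to express the condition; any other test on the dictionary values could be used in its place.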
Using pandas this is pretty easy:
Input:
$ cat test.txt
Date, Day, Sect, 1, 2, 3
1, Sun, 1-1, 123, 345, 678
2, Mon, 2-2, 234, 585, 282
3, Tue, 2-2, 231, 232, 686
Operations:
import pandas as pd
df = pd.read_csv('test.txt', skipinitialspace=True)
df.loc[df['Sect'] == '2-2'].to_dict(orient='records')
Output:
[{'1': 234, '2': 585, '3': 282, 'Date': 2, 'Day': 'Mon', 'Sect': '2-2'},
{'1': 231, '2': 232, '3': 686, 'Date': 3, 'Day': 'Tue', 'Sect': '2-2'}]
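Note that the numeric columns come back as integers here rather than strings, because pandas infers the column types. If the same shape is wanted from plain string splitting, a rough sketch might look like this; `to_int_if_numeric` is a hypothetical helper, and the two sample lines are copied from the input above.

```python
def to_int_if_numeric(text):
    """Return int(text) when text consists only of digits, else the text unchanged."""
    return int(text) if text.isdigit() else text


header = "Date, Day, Sect, 1, 2, 3"
line = "2, Mon, 2-2, 234, 585, 282"

# Split on commas and strip the surrounding blanks, as in the question.
keys = [k.strip() for k in header.split(',')]
values = [to_int_if_numeric(v.strip()) for v in line.split(',')]

record = dict(zip(keys, values))
print(record)
```

Values such as '2-2' are left as strings because they contain a non-digit character.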
If your .txt file is in the CSV format:
Date, Day, Sect, 1, 2, 3
1, Sun, 1-1, 123, 345, 678
2, Mon, 2-2, 234, 585, 282
3, Tue, 2-2, 231, 232, 686
You can use the csv library:
from csv import reader
from pprint import pprint

result = []

with open('file.txt') as in_file:

    # create a csv reader object
    csv_reader = reader(in_file)

    # extract headers
    headers = [x.strip() for x in next(csv_reader)]

    # go over each line
    for line in csv_reader:

        # if line is not empty
        if line:

            # create dict for line
            d = dict(zip(headers, map(str.strip, line)))

            # append dict if it matches your condition
            if d['Sect'] == '2-2':
                result.append(d)

pprint(result)
Which gives the following list:
[{'1': '234', '2': '585', '3': '282', 'Date': '2', 'Day': 'Mon', 'Sect': '2-2'},
{'1': '231', '2': '232', '3': '686', 'Date': '3', 'Day': 'Tue', 'Sect': '2-2'}]
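As a possible variation, `csv.DictReader` can build the dictionaries directly, and the `skipinitialspace=True` option tells the reader to ignore the blank after each comma. The sketch below assumes the same sample data; it reads from an in-memory `io.StringIO` (the name `sample` is made up) so that it runs without a file on disk. Trailing blanks would not be removed by this option, so it only fits data formatted like the sample.

```python
import csv
import io

# In-memory stand-in for the file, so the sketch runs on its own.
sample = io.StringIO(
    "Date, Day, Sect, 1, 2, 3\n"
    "1, Sun, 1-1, 123, 345, 678\n"
    "2, Mon, 2-2, 234, 585, 282\n"
    "3, Tue, 2-2, 231, 232, 686\n"
)

# skipinitialspace=True drops the blank that follows each comma,
# which also cleans the header names taken from the first row.
dict_reader = csv.DictReader(sample, skipinitialspace=True)

# Keep only the rows matching the condition.
matches = [dict(row) for row in dict_reader if row['Sect'] == '2-2']
print(matches)
```

With a real file, `sample` would be replaced by the object returned from `open('file.txt', newline='')`.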
I recommend you make the file a .csv (comma-separated values) file. A parser for that file would look something like this:
import csv


def parseCsvFile(dataFile):
    result = {}  # avoid shadowing the built-in dict
    with open(dataFile) as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            key = None
            for k in row:
                stripK = k.strip()
                stripV = row[k].strip()
                # the value of the first column becomes the outer key
                if key is None:
                    key = stripV
                    result[key] = {}
                result[key][stripK] = stripV
    return result
This returns a dictionary of dictionaries, keyed by the value in the first column of each row.
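To get from that structure to the list asked for in the question, the inner dictionaries can be filtered. A small sketch, assuming the result has the shape shown here for the sample data (the variable `parsed` is a hand-written stand-in, not actual output from a run):

```python
# Hand-written stand-in for the dictionary of dictionaries,
# keyed by the value of the first column (Date).
parsed = {
    '1': {'Date': '1', 'Day': 'Sun', 'Sect': '1-1', '1': '123', '2': '345', '3': '678'},
    '2': {'Date': '2', 'Day': 'Mon', 'Sect': '2-2', '1': '234', '2': '585', '3': '282'},
    '3': {'Date': '3', 'Day': 'Tue', 'Sect': '2-2', '1': '231', '2': '232', '3': '686'},
}

# Collect the inner dictionaries whose Sect value is '2-2'.
selected = [row for row in parsed.values() if row['Sect'] == '2-2']
print(selected)
```

Keying by the first column means that rows sharing the same Date value would overwrite each other, so this layout fits best when that column is unique.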
If you are allowed to use pandas, you can simply achieve your task by:
import pandas as pd
df = pd.read_csv('abc.txt', skipinitialspace=True)  # reads your csv file into a DataFrame
d = df.loc[df['Sect'] == '2-2'].to_dict('records')  # filters the records whose `Sect` value is '2-2', and returns a list of dictionaries
To install pandas, run:
python3 -m pip install pandas
Assuming the contents of abc.txt are what you have provided, d will be:
[{'Date': 2, 'Day': 'Mon', 'Sect': '2-2', '1': 234, '2': 585, '3': 282},
{'Date': 3, 'Day': 'Tue', 'Sect': '2-2', '1': 231, '2': 232, '3': 686}]