Python pandas: read a file, write to Excel

Question

I have a file like this:

SOME_INFO_BEGIN
....
....
SOME_INFO_END
ACTUAL_DETAIL_BEGIN
TEST|1|23|abcd|
TEST|2|5|efgs|
TEST|3|124|zyz|       
ACTUAL_DETAIL_END

I only want to read the lines between ACTUAL_DETAIL_BEGIN and ACTUAL_DETAIL_END; they will always start with TEST. Of those, I only want the line that has 5 in the 3rd column.

The code below works for me, except that it returns all 3 lines:

import pandas as pd

# dir and filename are set elsewhere
with open(dir + filename, 'r') as filehandle:
    filecontent = filehandle.readlines()
# keeps every line containing TEST, so all 3 detail lines come through
ifa = [k for k in filecontent if 'TEST' in k]
df = pd.DataFrame([sub.split("|") for sub in ifa])
df.columns = ['Type', 'Amt', 'Desc', 'Value1', 'Value2']
df1 = df[['Type', 'Desc']]
print(df1)
df1.to_excel(dir + "test.xlsx", index=False)

Q1. Is there a better way to code this? For example, how is the file handle closed for the Excel write?

Q2. How do I pick up only the 2nd row?

Answer 1

I have nothing to test with, but you can iterate over the file and load the lines lazily. Perhaps this is more efficient:

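import pandas as pd

rebuilt = []

# iterate over the file lazily instead of calling readlines()
with open(dir + filename, 'r') as infile:
    for row in infile:
        # keep only lines that start with TEST
        if row[:4] == 'TEST':
            rebuilt.append(row.split('|'))

df = pd.DataFrame(rebuilt, columns=['Type', 'Amt', 'Desc', 'Value1', 'Value2'])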

readlines() loads the whole file into memory regardless, so you can filter the lines as you read them instead. Checking a string slice (row[:4]) before splitting means only the matching lines get split, and it matches only lines that start with TEST, whereas 'TEST' in k matches TEST anywhere in the line.
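The loop above still keeps all three TEST lines, so Q2 needs one more condition. Here is a minimal sketch of one way to do it, assuming the marker names exactly as they appear in the sample file and that the 3rd field is compared as a string. The function name read_detail_rows, the wanted parameter and the in-memory sample are hypothetical, introduced only for illustration:

```python
import io

import pandas as pd

# Marker names as they appear in the sample file in the question.
BEGIN_MARKER = "ACTUAL_DETAIL_BEGIN"
END_MARKER = "ACTUAL_DETAIL_END"
COLUMNS = ["Type", "Amt", "Desc", "Value1", "Value2"]


def read_detail_rows(lines, wanted="5"):
    """Collect TEST rows between the markers whose 3rd field equals `wanted`."""
    rows = []
    in_detail = False
    for line in lines:
        line = line.strip()  # drop the newline and any trailing spaces
        if line == BEGIN_MARKER:
            in_detail = True
        elif line == END_MARKER:
            break  # nothing after the end marker is needed
        elif in_detail and line.startswith("TEST"):
            # "TEST|2|5|efgs|" -> ["TEST", "2", "5", "efgs", ""]
            fields = line.split("|")
            if len(fields) > 2 and fields[2] == wanted:
                rows.append(fields)
    return rows


# Stand-in for the real file; an open file handle can be passed the same way.
sample = io.StringIO(
    "SOME_INFO_BEGIN\n"
    "....\n"
    "SOME_INFO_END\n"
    "ACTUAL_DETAIL_BEGIN\n"
    "TEST|1|23|abcd|\n"
    "TEST|2|5|efgs|\n"
    "TEST|3|124|zyz|       \n"
    "ACTUAL_DETAIL_END\n"
)

df = pd.DataFrame(read_detail_rows(sample), columns=COLUMNS)
print(df[["Type", "Desc"]])
```

If the rows are already in a DataFrame, df[df['Desc'] == '5'] makes the same selection; the values are strings because they come from split(). On Q1: when to_excel is given a path, pandas opens and closes the workbook itself, so there is no file handle left for you to close.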

Solution 1
1 2019-06-22 13:03:41
