How can I print selected columns from the lines of a text file that contain a specific string?

Question

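I'm new to Python and trying to work through a problem. I have two text files. The first contains a single word (I'm starting with one word to keep things simple); I read that word, assign it to a string variable, and then search for it across the tens of thousands of lines in my second text file. That part already works. Now to my question.

The second text file contains 4 columns. To keep things simple, here is an example: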

Alpha 100 200 thewordiamlookingforisapple
Beta 200 300 thewordiamnotlookingforispear
Gamma 300 400 onceagainapple
Theta 400 500 onceagainapple
Omega 500 600 andonceagainpear

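Suppose I'm looking for the string "apple", and lines 1, 3 and 4 contain it. Now I want to print the first, second and third columns of those lines.

This is my code so far: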

def word_match(File, String):
    wordnumber = 0
    listOfAssociatedWords = []
    with open(File, 'r') as read_obj:
        for line in read_obj:
            wordnumber += 1
            if String in line:
                listOfAssociatedWords.append((wordnumber, line.rstrip()))

    return listOfAssociatedWords
#------------------------------------------------------------------------------
firstfile = open("/Directory/firstfilename", "r")
String = firstfile.read()

firstfile.close()
#------------------------------------------------------------------------------
matched_words = word_match("/Directory/secondfilename", String)
print('Total Matched Words : ', len(matched_words))
for elem in matched_words:
    print('Word Number = ', elem[0], ' :: Line = ', elem[1])

Current Output:
('Total Matched Words : ', 3)
('Word Number = ', 1, ' :: Line = ', 'Alpha 100 200 thewordiamlookingforisapple')
('Word Number = ', 3, ' :: Line = ', 'Gamma 300 400 onceagainapple')
('Word Number = ', 4, ' :: Line = ', 'Theta 400 500 onceagainapple')


Desired Output:
Alpha 100 200
Gamma 300 400
Theta 400 500

Answer 1

I think this is what you want:

def word_match(File, String):
    wordnumber = 0
    listOfAssociatedWords = []
    with open(File, 'r') as read_obj:
        for line in read_obj:
            wordnumber += 1
            if String in line:
                listOfAssociatedWords.append(line.split()[:3])

    return listOfAssociatedWords
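
With this change, word_match returns a list of three-item lists rather than (line number, line) tuples, so the printing loop from the question needs adjusting too. A minimal sketch of the full round trip, assuming the sample data from the question is written to a temporary file (the names sample, sample_path and output_lines are only for illustration):

```python
import os
import tempfile


def word_match(path, needle):
    """Return the first three columns of every line that contains needle."""
    matches = []
    with open(path, "r") as read_obj:
        for line in read_obj:
            if needle in line:
                # split() breaks the line on whitespace; [:3] keeps columns 1-3
                matches.append(line.split()[:3])
    return matches


# Sample data from the question, written to a temporary file for the demo.
sample = (
    "Alpha 100 200 thewordiamlookingforisapple\n"
    "Beta 200 300 thewordiamnotlookingforispear\n"
    "Gamma 300 400 onceagainapple\n"
    "Theta 400 500 onceagainapple\n"
    "Omega 500 600 andonceagainpear\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

# Join each list of columns back into one space-separated string.
output_lines = [" ".join(row) for row in word_match(sample_path, "apple")]
for text in output_lines:
    print(text)

os.remove(sample_path)
```

Note that the check `needle in line` looks at the whole line, so a match in any of the first three columns would also select the line. If only the fourth column should be searched, test `needle in line.split()[3]` instead (assuming every line has at least four columns).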

Answer 2

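Another simple approach is to use pandas. You can read the file into a pandas DataFrame, which makes it fairly easy to add more complex logic later. With pandas, the desired result can be obtained like this: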

import pandas as pd

initial_word = 'apple'

sample_dict = {'col1': ['Alpha', 'Beta', 'Gamma', 'Theta', 'Omega'], 'col2': [100, 200, 300, 400, 500],
                'col3': [200, 300, 400, 500, 600],
                'col4': ['thewordiamlookingforisapple', 'thewordiamnotlookingforispear', 'onceagainapple', 'onceagainapple', 'andonceagainpear']}
df = pd.DataFrame(data=sample_dict)

print(df)
new_df = df[df['col4'].str.contains(initial_word)]
new_df = new_df.drop(columns='col4')
print(new_df)

The output looks like this (for df):

    col1  col2  col3                           col4
0  Alpha   100   200    thewordiamlookingforisapple
1   Beta   200   300  thewordiamnotlookingforispear
2  Gamma   300   400                 onceagainapple
3  Theta   400   500                 onceagainapple
4  Omega   500   600               andonceagainpear

And for new_df:

    col1  col2  col3
0  Alpha   100   200
2  Gamma   300   400
3  Theta   400   500

You can first read the txt file and convert it into a pandas DataFrame.
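
Reading the file could look roughly like the sketch below. It assumes the columns are separated by whitespace and that the file has no header row; the column names col1 to col4 are just labels carried over from the example above, and io.StringIO stands in for the real file path so the snippet is self-contained.

```python
import io

import pandas as pd

# Stand-in for the second text file; pass the real path to read_csv instead.
sample_text = """Alpha 100 200 thewordiamlookingforisapple
Beta 200 300 thewordiamnotlookingforispear
Gamma 300 400 onceagainapple
Theta 400 500 onceagainapple
Omega 500 600 andonceagainpear
"""

df = pd.read_csv(
    io.StringIO(sample_text),
    sep=r"\s+",      # split on any run of whitespace
    header=None,     # the file has no header row
    names=["col1", "col2", "col3", "col4"],
)

# Keep rows whose fourth column contains the word, then drop that column.
# regex=False treats the search word as plain text, not a pattern.
matches = df[df["col4"].str.contains("apple", regex=False)]
result = matches.drop(columns="col4")

# Print without the index and header to approximate the desired output.
print(result.to_string(index=False, header=False))
```

to_string pads the columns for alignment, so the spacing may differ slightly from a plain space-joined line.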
