Python count occurrences of a string and print the lines that contain them, as well as print number of occurrences of string, with multiple clauses
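I have been trying to create a Python script that takes two inputs, a file name and a string. Once these are entered, it should print the number of occurrences of the input string, as well as every line that contains the input string.
I have also been asked not to use lists, the split method, Python string methods, or the keyword "in", and I may only use indexing to access the first character of a string and slicing to get the tail of a string.
What I have done so far: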
def main():
    search_for = raw_input("What term would you like to search for?")
    text_file_name = raw_input("Which file would you like to search in?")
    count_text_file(search_for, text_file_name)

def count_text_file(search_for, text_file_name):
    usersFile = open(text_file_name, 'r')
    usersTermLength = len(search_for)
    usersFileLength = len(text_file_name)
    occurenceOfString = 0
    while i<usersFileLength:
        firstChar = usersFile[i]
        if firstChar==searchFor[0]:
            indexUsersTermLength = usersTermLength + i #end slice
            possibleMatch = usersFile[i:indexUsersTermLength]
            if possibleMatch == searchFor:
                print #the entire line
                occurenceOfString+=1
                i+=1
            else:
                i+=1
        else:
            i+=1

There are a few problems in your code.
usersFileLength = len(text_file_name)
This is just the length of the file name, not the size of the file's contents.
firstChar = usersFile[i]
This is not how you read from a file. You need to use a function like read().
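
To make the two points above concrete, here is a minimal sketch. The sample file and the names `sample_path`, `name_length` and `content_length` are made up for illustration; only `open()`, `read()` and `len()` come from the discussion above.

```python
import os
import tempfile

# Hypothetical sample file, created only for this illustration
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("spam and eggs\nmore spam\n")
    sample_path = tmp.name

with open(sample_path, "r") as users_file:
    content = users_file.read()   # the file's text as one string

name_length = len(sample_path)    # length of the path string, unrelated to the file's size
content_length = len(content)     # number of characters actually read from the file
first_char = content[0]           # indexing works on the string, not on the file object
tail = content[1:]                # slicing off the head gives the rest of the text

os.remove(sample_path)
```

Once the text is in `content`, the "first character plus tail" constraint from the question can be applied to that string.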
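Also, you are breaking some of the (silly) constraints. Here is my solution. It reads the whole file and then walks through it character by character. It builds up the current word and compares it with the search term whenever it reaches a non-letter.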
def count_text_file(search_for, text_file_name):
    with open(text_file_name, 'r') as users_file:
        # Read entire file
        content = users_file.read()
        line_number = 1
        # Build the words of the file char-by-char
        current_word = ""
        while len(content) > 0:
            # "only use indexing to access the first character of a string"
            c = content[0]
            # If it's a letter add to string
            # Can't use c.isalpha() as it is a "python string method"
            if (c >= 'A' and c <= 'Z') or (c >= 'a' and c <= 'z'):
                current_word += c
            # Else (not a letter), check the word
            else:
                if current_word == search_for:
                    print(f"found at line {line_number}")
                if c == '\n':
                    line_number += 1
                # Reset for next word
                current_word = ""
            # "only use ... slicing to get the tail of the string"
            content = content[1:]

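
The question also asks for the total number of occurrences and for each matching line to be printed. As a hedged sketch of one way to do that under the same constraints, the variant below reads the file with `readline()` (a file method, not a string method) and counts substring matches rather than whole words. The helpers `starts_with` and `count_in_line` are hypothetical names introduced here. Overlapping matches are counted, and an empty search term is treated as zero matches; both are assumptions, not part of the original answer.

```python
def starts_with(text, prefix):
    """Return True if text begins with prefix, comparing head characters only."""
    while prefix != "":
        if text == "" or text[0] != prefix[0]:
            return False
        # Drop the matched head from both strings
        text = text[1:]
        prefix = prefix[1:]
    return True


def count_in_line(line, search_for):
    """Count occurrences of search_for in line (overlapping matches included)."""
    count = 0
    while line != "":
        if starts_with(line, search_for):
            count += 1
        line = line[1:]
    return count


def count_text_file(search_for, text_file_name):
    """Print every line containing search_for and return the total count."""
    if search_for == "":
        return 0  # assumption: an empty search term matches nothing
    total = 0
    with open(text_file_name, "r") as users_file:
        line = users_file.readline()
        while line != "":  # readline() returns "" at end of file
            found = count_in_line(line, search_for)
            if found > 0:
                print(line, end="")  # the line already ends with its own newline
                total += found
            line = users_file.readline()
    print(f"{total} occurrence(s) found")
    return total
```

If the last line of the file has no trailing newline, the final count is printed on the same line as that text; handling that would need one more check.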
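There are some improvements you could make to my solution. For example, words containing punctuation will not be found (e.g. "can't" or "non-existent"). Also, it only considers "letters" to be "[A-Za-z]", so Unicode characters will not be recognized. And it is case-sensitive. But since this is an assignment, who knows whether your teacher cares about any of that.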
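
As a rough sketch of how some of those improvements might look while still avoiding string methods, the version below treats apostrophes and hyphens as word characters, compares case-insensitively for ASCII letters via `ord()` and `chr()`, and also checks the last word when the text does not end in a non-letter. It works on a string rather than a file to stay short. The names `to_lower`, `is_word_char`, `lower_text` and `count_word` are hypothetical, and Unicode letters are still not handled.

```python
def to_lower(c):
    """Lower-case one ASCII letter without calling str.lower()."""
    if 'A' <= c <= 'Z':
        return chr(ord(c) + 32)
    return c


def is_word_char(c):
    """Assumption: ASCII letters, apostrophes and hyphens belong to a word."""
    return ('a' <= c <= 'z') or ('A' <= c <= 'Z') or c == "'" or c == "-"


def lower_text(text):
    """Lower-case a string using only head indexing and tail slicing."""
    result = ""
    while text != "":
        result += to_lower(text[0])
        text = text[1:]
    return result


def count_word(search_for, content):
    """Count whole-word, case-insensitive matches and report their line numbers."""
    target = lower_text(search_for)
    count = 0
    line_number = 1
    current_word = ""
    # Append a newline so the final word is compared as well
    content = content + "\n"
    while content != "":
        c = content[0]
        if is_word_char(c):
            current_word += to_lower(c)
        else:
            if current_word != "" and current_word == target:
                print(f"found at line {line_number}")
                count += 1
            if c == "\n":
                line_number += 1
            current_word = ""
        content = content[1:]
    return count
```

One trade-off of counting apostrophes as word characters is that a word wrapped in single quotes, such as 'hello', would no longer match a search for hello.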
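Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.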