I am new to Python scripting. How can I filter lines and export them to a new text file?
How can I write Python code that extracts only the phone numbers from a text file and then saves them to another text file?
Sample text file:
"Name": Farouk, "Age": 23, "Address": No. 582, Chile crescent, Kenya, "Phone number": 231765987
"Name": Ben, "Age": 23, "Address": No. 582, Chile crescent, Kenya, "Phone number": 21690860
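Assuming there is actually a newline character "\n" between the end of one record (, "Phone number": 231765987) and the start of the next ("Name": Ben,)
Like this: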
"Name": Farouk, "Age": 23, "Address": No. 582, Chile crescent, Kenya, "Phone number": 231765987
"Name": Ben, "Age": 23, "Address": No. 582, Chile crescent, Kenya, "Phone number": 21690860
This should do the trick:
with open("./data.txt") as read_file:  # The file being read
    with open("./Phone Numbers.txt", 'w') as write_file:  # New file being created
        for data in read_file:
            for d in data.strip("\n").split(','):
                if "Phone number" in d:
                    # d[16:] skips the '"Phone number":' label; strip() drops the leftover spaces
                    write_file.write(d[16:].strip(" ") + "\n")
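As a preliminary suggestion, you should write some code before asking for help. If you share some code, others can get straight to solving your problem.
To solve this, you first need to read the file line by line. If every line contains text like what you pasted in the question, you can use a regular expression to search for the phone number. You could also convert the string to JSON, but because the address is not quoted, the format is not valid JSON. So a regular expression is the better choice here.
Here is sample code that solves this: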
import re

file_name = 'sample'
with open(file_name) as f:
    content = f.readlines()

for line in content:
    # Raw string so that \w reaches the regex engine unchanged
    m = re.search(r'"Phone number": (\w+)', line)
    if m is None:
        print("There is no success for search.")
    else:
        print(m.group(1))
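The snippet above only prints the numbers. A minimal sketch that also saves them to a second file, as the question asks, might look like the following. The function name `extract_phone_numbers` and the file names in the usage note are made up for illustration, and the pattern assumes each phone number is a plain run of digits.

```python
import re

# Assumed format: each record contains `"Phone number": <digits>`.
PHONE_RE = re.compile(r'"Phone number":\s*(\d+)')


def extract_phone_numbers(in_path, out_path):
    """Write every phone number found in in_path to out_path, one per line."""
    numbers = []
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # findall returns the captured digits for each match on the line
            for number in PHONE_RE.findall(line):
                dst.write(number + "\n")
                numbers.append(number)
    return numbers
```

Calling `extract_phone_numbers("data.txt", "phone_numbers.txt")` would read the sample file and create the output file next to it. Lines without a phone number are skipped silently.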
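import glob
import errno
import csv

i = 0  # number of rows printed so far
path = 'C:/Users/Mallam Farouk Sanusi/Desktop/k/*.txt'
files = glob.glob(path)
csv.field_size_limit(1310720)  # allow very long fields
for name in files:
    try:
        with open(name) as f:
            s = csv.reader(f)
            for line in s:
                print(line[9])  # tenth comma-separated field
                i = i + 1
    except IOError as exc:
        # Assumed handler (the original snippet is cut off after the try body):
        # skip directories, re-raise anything else
        if exc.errno != errno.EISDIR:
            raise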