If I have a text file containing:
Proto Local Address Foreign Address State PID
TCP 0.0.0.0:11 0.0.0.0:0 LISTENING 12 dns.exe
TCP 0.0.0.0:95 0.0.0.0:0 LISTENING 589 lsass.exe
TCP 0.0.0.0:111 0.0.0.0:0 LISTENING 888 svchost.exe
TCP 0.0.0.0:123 0.0.0.0:0 LISTENING 123 lsass.exe
TCP 0.0.0.0:449 0.0.0.0:0 LISTENING 2 System
Is there a way to extract ONLY the process names, such as dns.exe, lsass.exe, etc.?
I tried using split() so I could get the info right after the string LISTENING. Then I took what's left (12 dns.exe, 589 lsass.exe, etc.) and checked the length of each string. So if the len() of 12 dns.exe was between 17 and 20, for example, I would take a substring of that string at fixed positions. I only took into account the length of the PID numbers (which can be anywhere from 1 to 4 digits) but then forgot that the length of each process name varies (there are hundreds). Is there a simpler way to do this, or am I out of luck?
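You can use pandas DataFrames to do this without getting into the hassle of split:
import pandas
parsed_file = pandas.read_csv("filename", sep=r"\s+", skiprows=1, names=["Proto", "Local Address", "Foreign Address", "State", "PID", "Process"])
will read this into a DataFrame for you. Because the file is whitespace-separated and its header names contain spaces, the call skips the original header row and defines its own column names. You can then filter for the rows containing dns.exe, etc.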
Here is a more general replacement for read_csv if you want more control. I've assumed your columns are all tab-separated, but feel free to change the splitting character however you like:
import pandas as pd

with open('filename', 'r') as logs:
    logs.readline()  # skip the header so you can define your own
    columns = ["Proto", "Local Address", "Foreign Address", "State", "PID", "Process"]
    # strip the trailing newline before splitting so it does not end up in 'Process'
    formatted_logs = pd.DataFrame([dict(zip(columns, line.rstrip('\n').split('\t'))) for line in logs])
Then you can just filter the rows by
formatted_logs = formatted_logs[formatted_logs['Process'].isin(['dns.exe','lsass.exe', ...])]
If you want just the process names, it is even simpler. Just do
processes = formatted_logs['Process'] # returns a Series object that can be iterated through
split should work just fine so long as you ignore the header in your file:
processes = []
with open("file.txt", "r") as f:
    lines = f.readlines()
    # Loop through all lines, ignoring the header.
    # Add the last element to the list (i.e. the process name).
    for l in lines[1:]:
        processes.append(l.split()[-1])
print(processes)
Result:
['dns.exe', 'lsass.exe', 'svchost.exe', 'lsass.exe', 'System']
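For anyone who wants to try the split approach without creating a file first, here is a minimal sketch that runs it on the sample from the question held in a string. The names SAMPLE and process_names are made up for this illustration, and the blank-line check is an extra guard that the answer above does not have.

```python
# Sample data copied from the question; SAMPLE is a hypothetical name.
SAMPLE = """\
Proto Local Address Foreign Address State PID
TCP 0.0.0.0:11 0.0.0.0:0 LISTENING 12 dns.exe
TCP 0.0.0.0:95 0.0.0.0:0 LISTENING 589 lsass.exe
TCP 0.0.0.0:111 0.0.0.0:0 LISTENING 888 svchost.exe
TCP 0.0.0.0:123 0.0.0.0:0 LISTENING 123 lsass.exe
TCP 0.0.0.0:449 0.0.0.0:0 LISTENING 2 System
"""


def process_names(text):
    """Return the last whitespace-separated field of every non-header line."""
    names = []
    for line in text.splitlines()[1:]:  # [1:] skips the header row
        fields = line.split()           # no argument: runs of whitespace count as one separator
        if fields:                      # ignore blank lines (assumed guard, not in the answer)
            names.append(fields[-1])
    return names


print(process_names(SAMPLE))
# ['dns.exe', 'lsass.exe', 'svchost.exe', 'lsass.exe', 'System']
```

Because split() with no argument treats any run of spaces or tabs as a single separator, the varying lengths of the PID and the process name do not matter.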
You could simply use re.split:
import re
rx = re.compile(" +")
l = rx.split(" 12 dns.exe") # => ['', '12', 'dns.exe']
pid = l[1]
name = l[2]
It splits the string on an arbitrary number of spaces; the second element is the PID and the third is the process name.
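The regular expression can also be applied to the whole line instead of the part after LISTENING. The following is only a sketch: the pattern and the helper name pid_and_name are assumptions for illustration, and it assumes the process name contains no spaces.

```python
import re

# Anchor on the State column, then capture the PID (digits) and the
# process name (a run of non-whitespace characters).
ROW = re.compile(r"LISTENING\s+(\d+)\s+(\S+)")


def pid_and_name(line):
    """Return (pid, name) for a LISTENING row, or None if the line does not match."""
    match = ROW.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(pid_and_name("TCP 0.0.0.0:11 0.0.0.0:0 LISTENING 12 dns.exe"))
# (12, 'dns.exe')
print(pid_and_name("Proto Local Address Foreign Address State PID"))
# None
```

A side effect of matching on LISTENING is that the header row is skipped automatically, but so is any row in a different state.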
You could also simply use split and process the file line by line, like this:
def getAllExecutables(textFile):
    execFiles = []
    with open(textFile) as f:
        fln = f.readline()
        while fln:
            pidname = str.strip(list(filter(None, fln.split(' ')))[-1])  # splitting the line, removing empty entries, taking the last element, stripping unnecessary chars
            if (pidname[-3:] == 'exe'):  # check if the pidname ends with exe
                execFiles.append(pidname)  # if it does, add it
            fln = f.readline()  # read the next line
    return execFiles
exeFiles = getAllExecutables('file.txt')
print(exeFiles)
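Some remarks on the code above:
It removes the empty entries produced by the split with filter.
It strips unnecessary characters (like \n) with str.strip.
It takes the last element with l[-1].
It checks whether the name ends with exe. If it does, adds it to the resulting list.
Results: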
['dns.exe', 'lsass.exe', 'svchost.exe', 'lsass.exe']
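Note that System is missing from this result because it does not end in exe. If that is not what you want, a sketch along these lines makes the suffix check optional. The names LINES, last_fields and only_exe are hypothetical, and the header is skipped by position rather than by the suffix test.

```python
# A shortened copy of the sample from the question; LINES is a hypothetical name.
LINES = [
    "Proto Local Address Foreign Address State PID",
    "TCP 0.0.0.0:11 0.0.0.0:0 LISTENING 12 dns.exe",
    "TCP 0.0.0.0:449 0.0.0.0:0 LISTENING 2 System",
]


def last_fields(lines, only_exe=False):
    """Collect the last field of each row after the header, optionally keeping only *.exe names."""
    names = []
    for line in lines[1:]:       # skip the header row explicitly
        fields = line.split()
        if not fields:           # ignore blank lines
            continue
        name = fields[-1]
        if only_exe and not name.endswith(".exe"):
            continue             # drop names such as 'System'
        names.append(name)
    return names


print(last_fields(LINES))                 # ['dns.exe', 'System']
print(last_fields(LINES, only_exe=True))  # ['dns.exe']
```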
with open(txtfile) as txt:
    lines = [line for line in txt]
process_names = [line.split()[-1] for line in lines[1:]]
This opens your input file and reads all the lines into a list. Next, the list is iterated over starting at the second element (because the first is the header row) and each line is split(). The last item in the resulting list is then added to process_names.