Find author name in multiple files and folders in Python

import os

folderpath = 'D:/Workspace'
typeOfFile = [".c", ".C", ".cpp", ".CPP"]

for dirname, dirnames, filenames in os.walk(folderpath):
        for filename in filenames:
            if filename.endswith(tuple(typeOfFile)):
                for line in open(os.path.join(dirname, filename), "r").readlines():
                    left,author,right = line.partition('author')
                    if author:
                        name =(right[:100])
                        combine = name.replace(" ", "")
                        remove = combine.strip(':')
                        print remove

Please help me use this else clause. When I add it, the script keeps printing unknown over and over:

                else: 
                    print 'unknown'

Right now, if a file does not contain the string author, the script just skips it and moves on to the next file. I want it to print unknown for that file instead. Sorry for my bad English. Thanks.

We can use a regular expression to extract the name by building a pattern.

Usually the author name appears after an author keyword in a comment. Some people prefer __author__ and write the name after :, ::, = or ==. It depends on what you usually see in your files; it is worth looking on GitHub to see how people mark the author in their comments.

The name comes right after that. Some people use nicknames, so it is not always purely alphabetic.

pattern = r'author_*[ \t]*[:=]+[ \t]*([^\n]+)'

In a regular expression, * means zero or more of the preceding item, + means one or more, and ^ at the start of a character class means "anything except these characters". So [^\n]+ matches one or more characters up to the end of the line (spaces are allowed, since some people write a first and last name). The parentheses capture that text as a group. The group must come after the literal text author, any trailing underscores (as in __author__), optional spaces or tabs, and at least one : or = separator. Because the pattern does not start with .*, it has to be applied with re.search, which scans the whole text, rather than re.match, which only matches at the start.
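To see what the capture group returns, here is a small sketch that tries a pattern of this shape on a few single lines. The pattern, the name AUTHOR_RE and the sample lines are illustrative assumptions, not taken from real source files.

```python
import re

# Assumed pattern: "author", optional underscores, optional spaces/tabs,
# at least one ':' or '=', optional spaces/tabs, then the rest of the line.
AUTHOR_RE = re.compile(r'author_*[ \t]*[:=]+[ \t]*([^\n]+)')

samples = [
    "// author: Jane Doe",            # typical C/C++ comment
    "__author__ == 'jdoe42'",         # doubled separator
    "int authority = 5;",             # contains "author" but is not a tag
    "int main(void) { return 0; }",   # no author at all
]

results = []
for line in samples:
    m = AUTHOR_RE.search(line)
    # group(1) is the captured text; None marks "no author found"
    results.append(m.group(1).strip() if m else None)

print(results)
```

The third sample shows why the separator is required: without it, a word such as authority would be reported as an author line.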

This comes on top of what you already do to walk the folders and check the file extensions. Consider this quick example to illustrate how a regular expression can extract the author name.

import re

# simple case
data1 = """
author='helloWorld'

def hello()
    print "hello world"
    """
# case with two colons
data2 = """
__author__::'someone'

def hello()
    print "hello world"
    """
# case where the name mixes letters and digits
data3 = """
__author__='someone201119'

def hello()
    print "hello world"
    """
# case where there is no author in the code
data4 = """
def hello()
    print "hello world"
    """

# "author", optional underscores, a ':' or '=' separator, then the rest of the line
pattern = re.compile(r'author_*[ \t]*[:=]+[ \t]*([^\n]+)')

for data in [data1, data2, data3, data4]:
    # search() scans the whole text; match() would only try the very start
    m = pattern.search(data)
    if m:
        author = m.group(1).strip()
    else:
        author = 'unknown'
    print("author in this case is: " + author)

Output:

author in this case is: 'helloWorld'
author in this case is: 'someone'
author in this case is: 'someone201119'
author in this case is: unknown

UPDATE

Your overall code would look like:

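import os
import re

folderpath = 'D:/Workspace'
typeOfFile = (".c", ".C", ".cpp", ".CPP")
# "author", optional underscores, a ':' or '=' separator, then the rest of the line
pattern = re.compile(r'author_*[ \t]*[:=]+[ \t]*([^\n]+)')

for dirname, dirnames, filenames in os.walk(folderpath):
    for filename in filenames:
        if filename.endswith(typeOfFile):
            # read() returns the whole file as one string;
            # readlines() would return a list, which re cannot search
            with open(os.path.join(dirname, filename), "r") as source:
                data = source.read()
            m = pattern.search(data)
            if m:
                author = m.group(1).strip()
            else:
                author = 'unknown'
            print("author in this case is: " + author + " in file " + filename)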
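If you would rather keep the line-by-line loop from the question, the else can be attached to the for loop itself: Python runs a for...else branch only when the loop finishes without hitting break. Below is a minimal sketch, assuming the first line containing author is the one you want. The helper name find_author and the two throwaway demo files are hypothetical.

```python
import os
import tempfile

def find_author(path):
    """Return the text after the first 'author' in the file, or 'unknown'."""
    with open(path, "r") as handle:
        for line in handle:
            left, keyword, right = line.partition("author")
            if keyword:
                # Same clean-up as in the question: drop spaces and colons.
                name = right.strip().replace(" ", "").strip(":")
                break  # stop at the first hit, so the else branch is skipped
        else:
            # Reached only when the loop ended without break.
            name = "unknown"
    return name

# Demo on two throwaway files (hypothetical content).
demo_dir = tempfile.mkdtemp()
with_tag = os.path.join(demo_dir, "a.c")
without_tag = os.path.join(demo_dir, "b.c")
with open(with_tag, "w") as f:
    f.write("// author: Jane Doe\nint main(void) { return 0; }\n")
with open(without_tag, "w") as f:
    f.write("int main(void) { return 0; }\n")

print(find_author(with_tag))     # JaneDoe
print(find_author(without_tag))  # unknown
```

In the question's version the else always ran because the loop never used break, so unknown was printed for every file. With break, it runs once per file and only when no author line was found.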
