
Extracting specific text from a txt file in Python

I've recently picked up Python to do some text extraction. I have a data set that looks like this:

@article{noauthor_collective_nodate,
    title = {Collective teacher efficacy},
    abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},
}

@article{noauthor_collective_nodate,
    title = {Collective teacher efficacy},
    abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},
}

@article{noauthor_initial_nodate,
    title = {Initial teacher education programs},
    abstract = {Overview Influence: Initial teacher education programs Domain: Teacher Sub-Domain: Teacher Education Potential to Accelerate Student Achievement: Likely to have small positive impact Influence Definition: Initial teacher education or {ITEs} (sometimes at the undergraduate level and sometimes at the post-graduate level) is the entry-level qualification for teaching in numerous countries, including the United States. More recently, there are school-based {ITEs}, non-accredited {ITEs}, and many online {ITE} programs. Evidence Number of meta-analyses: 5 Number of studies: 117 Number of students: 106,016 Number of effects: 509 Effect size: 0.10},
}

@article{noauthor_professional_nodate,
    title = {Professional development programs},
    abstract = {Overview Influence: Professional development programs Domain: Teacher Sub-Domain: Teacher Education Potential to Accelerate Student Achievement: Likely to have positive impact Influence Definition: Professional development relates to courses or interventions aimed to enhance the beliefs, actions, impact of knowledge of teachers and school leaders. Evidence Number of meta-analyses: 21 Number of studies: 1,151 Number of students: 2,321,242 Number of effects: 2,938 Effect size: 0.37},
    keywords = {Program Development},
}

I want to extract the title and part of the abstract (the text after "Influence Definition:") from each entry. I managed to get my desired output for the first entry by using this code:

s = "@article{noauthor_collective_nodate, title = {Collective teacher efficacy}, abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},}@article{noauthor_collective_nodate, title = {Collective teacher efficacy}, abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},}"


start = s.find("title = {") + len("title = {")
end = s.find("}, abstract")

start2 = s.find("Influence Definition: ") + len("Influence Definition: ")
end2 = s.find("Evidence Number of meta-analyses:")

substring = s[start:end]
substring2 = s[start2:end2]
print(substring+' - '+substring2+";")

Output:

Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. ;

There are two problems:

  • This only extracts the first search result.
  • I want to be able to run it on the original text file instead of copying the text in as "s".

Can someone please lend a helping hand?

  1. str.find takes an optional start parameter. You can use it to skip past your previous search result and find only the next occurrence.
  2. You can use open to read the text from a file (pay attention to the example code in the documentation, i.e. use with open("filename") ...).
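The two hints above can be combined roughly as follows. This is only a sketch: it assumes the whole file fits in memory and that every entry has a title followed by an abstract containing both markers. The names extract_entries and extract_from_file are made up for this example and do not come from the question.

```python
def extract_entries(text):
    """Return a list of 'title - definition' strings found in text."""
    title_marker = "title = {"
    def_marker = "Influence Definition: "
    end_marker = "Evidence Number of meta-analyses:"

    results = []
    pos = 0  # index where the next search begins
    while True:
        # The second argument of str.find is the start position.
        start = text.find(title_marker, pos)
        if start == -1:
            break  # no further titles
        start += len(title_marker)
        end = text.find("}", start)
        if end == -1:
            break  # unterminated title, stop rather than guess

        start2 = text.find(def_marker, end)
        if start2 == -1:
            break
        start2 += len(def_marker)
        end2 = text.find(end_marker, start2)
        if end2 == -1:
            break

        results.append(text[start:end] + " - " + text[start2:end2].strip())
        pos = end2  # continue searching after this entry
    return results


def extract_from_file(path):
    """Read the whole file at path (hypothetical helper) and extract its entries."""
    with open(path, "r") as f:
        return extract_entries(f.read())
```

Calling extract_from_file("myfile.txt") and printing each returned string gives one line per entry. Because pos moves past each match, the loop no longer stops at the first result.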

This should do it:

with open("myfile.txt", "r") as f:
    substring = ""
    # Iterate over the file line by line; the with block closes it automatically.
    for line in f:
        if "title = {" in line:
            start = line.find("title = {") + len("title = {")
            end = line.find("}", start)
            # Start a new result with the title of the current entry.
            substring = line[start:end] + " - "
        if "Influence Definition: " in line:
            start = line.find("Influence Definition: ") + len("Influence Definition: ")
            end = line.find("Evidence Number of meta-analyses:")
            # Append the definition and print the finished result.
            substring += line[start:end]
            print(substring)
            print()

For example, if the sample data above is saved as myfile.txt, this will print the following:

Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes.

Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes.

Initial teacher education programs - Initial teacher education or {ITEs} (sometimes at the undergraduate level and sometimes at the post-graduate level) is the entry-level qualification for teaching in numerous countries, including the United States. More recently, there are school-based {ITEs}, non-accredited {ITEs}, and many online {ITE} programs.

Professional development programs - Professional development relates to courses or interventions aimed to enhance the beliefs, actions, impact of knowledge of teachers and school leaders.
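If you would rather collect the results as data than print them, a regular expression over the whole file text is one possible alternative. The following is a sketch only, not part of the answer above: it assumes every entry has title = {...} before an abstract that contains both markers, and the names ENTRY_RE and parse_entries are invented here. If an entry lacked the "Influence Definition:" marker, the lazy match would run on into the next entry, so the pattern may need tightening for messier data.

```python
import re

# One match per entry: the title, then the text between the two markers.
ENTRY_RE = re.compile(
    r"title = \{(?P<title>[^}]*)\}"           # title up to the first closing brace
    r".*?Influence Definition: "              # skip lazily to the definition marker
    r"(?P<definition>.*?)"                    # the definition itself, as short as possible
    r"\s*Evidence Number of meta-analyses:",  # stop at the evidence marker
    re.DOTALL,                                # let '.' cross line breaks
)


def parse_entries(text):
    """Return a list of (title, definition) tuples found in text."""
    return [
        (m.group("title"), m.group("definition"))
        for m in ENTRY_RE.finditer(text)
    ]
```

The text passed to parse_entries would be the result of f.read() inside a with open(...) block, as in the code above.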
