Read in a .txt file as desired dataframe format

I have a txt file that looks like this:

    Alabama[edit]
    Auburn (Auburn University, Edward Via College of Osteopathic Medicine)
    Birmingham (University of Alabama at Birmingham, Birmingham School of 
    Alaska[edit]
    Anchorage[21] (University of Alaska Anchorage)
    Fairbanks (University of Alaska Fairbanks)[16]

I want to read in the txt file as a data frame that looks like this:

state     county
Alabama   Auburn
Alabama   Birmingham
Alaska    Anchorage
Alaska    Fairbanks

What I have so far is:

import re
import pandas as pd

university_towns = open('university_towns.txt', 'r')
df_university_towns = pd.DataFrame(columns=['State', 'RegionName'])
# compile the patterns once, outside the loop
state_pattern = re.compile(r'\[edit\]')
county_pattern = re.compile(r'\(')  # a literal '(' must be escaped
# loop over each line of the file object
# determine if each line is state or county.
# if the line has [edit], it's state
for line in university_towns:
    state_pattern_m = state_pattern.search(line)
    county_pattern_m = county_pattern.search(line)
    if state_pattern_m:
        # extract everything before [edit]
        print(state_pattern_m.start())
        end_position = state_pattern_m.start()
        print(line[0:end_position])
        state_name = line[0:end_position]
    if county_pattern_m:
        # extract everything before (
        county_name = line[0:county_pattern_m.start()].strip()

This code will only give me something like this:

State  County
Alabama Auburn
        Birmingham
.
.
.

This should do it:

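key = None

# t is the open file object
with open('university_towns.txt', 'r') as t:
    for line in t:
        if '[edit]' in line:
            # a state line: remember the state name, without the marker or newline
            key = line.replace('[edit]', '').strip()
            continue
        if key:
            # Use a regex to extract what you need
            print(key, line.split(' ')[0])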

I'm not sure what your data looks like, so change the extraction to strip bracketed markers such as [21] from the title (I'm guessing it's a title), and possibly use a regex in place of the '[edit]' in line test.
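
For completeness, here is a minimal sketch of how the same idea could be carried through to the data frame from the question. It is only one way to do it: the function name parse_university_towns, the FOOTNOTE pattern, and the in-memory SAMPLE_LINES list are my own assumptions for illustration; with the real file you would pass the open file object instead of the list.

```python
import re

import pandas as pd

# Sample input copied from the question (the Birmingham line is truncated
# there too). In practice, pass open('university_towns.txt') instead.
SAMPLE_LINES = [
    "Alabama[edit]",
    "Auburn (Auburn University, Edward Via College of Osteopathic Medicine)",
    "Birmingham (University of Alabama at Birmingham, Birmingham School of ",
    "Alaska[edit]",
    "Anchorage[21] (University of Alaska Anchorage)",
    "Fairbanks (University of Alaska Fairbanks)[16]",
]

# Assumed footnote format: a number in square brackets, e.g. [21] or [16].
FOOTNOTE = re.compile(r"\[\d+\]")


def parse_university_towns(lines):
    """Return a DataFrame with one (state, county) row per town line."""
    rows = []
    state = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.endswith("[edit]"):
            # A state header: remember it for the town lines that follow.
            state = line[: -len("[edit]")].strip()
            continue
        if state is None:
            continue  # a town line before any state header; skip it
        # Keep everything before the first '(' and drop footnote markers.
        town = line.split("(", 1)[0]
        town = FOOTNOTE.sub("", town).strip()
        rows.append((state, town))
    return pd.DataFrame(rows, columns=["state", "county"])


df = parse_university_towns(SAMPLE_LINES)
print(df)
```

Collecting the rows in a plain list and building the data frame once at the end keeps the state attached to every town row, which avoids the problem of the state appearing only on the first row of each group.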
