
How to split a pandas dataframe based on regex string

I have a CSV of questions and results, and I built a simple bit of code to turn it into a list of dataframes for analysis.

But the last question refused to split out, I think because a plain startswith/endswith check can't cope when a question row does not begin exactly with "<Q".

def start_and_finish_points(df):
    df_indices_start = []
    df_indices_end = []
    rows = df.iloc[:, 0].to_list()
    for i, row in enumerate(rows):
        if str(row).startswith('<Q'):
            df_indices_start.append(i)
        if str(row).endswith('++'):
            df_indices_end.append(i)    
    return df_indices_start, df_indices_end
start, finish = start_and_finish_points(df) 

The problem is that the code can't handle " <Q" (with a leading space):

    question
    698 <Q8> To what extent are you concerned about of the following.................Climate change
    ... Some data
    700  <Q11e> How often d...

Can I generalise the startswith to cope with a space at the start of the string? I'm sure it can be done with regex, but I can't see how.

UPDATE:

The dataframe column that I want to extract from is this:

698    <Q8> To what extent are you concerned about of the following.................Climate change
699                                                                                               
700                                                                                All respondents
704                                                                                 Unweighted row
705                                                                          Effective sample size
706                                                                                          Total
707                                                                        1: Not at all concerned
710                                                                                              2
713                                                                                              3
716                                                                                              4
719                                                                              5: Very concerned
722                                                                                 Not applicable
725                                                                               Total: Concerned
728                                                                           Total: Not Concerned
731                                                                                Net % concerned
733                                                      95% lower case or +, 99% UPPER CASE or ++
735             <Q11e> How often do you access local greenspaces  (e.g. parks, community gardens)?
736                                                                                               
737                                                                                All respondents
741                                                                                 Unweighted row
742                                                                          Effective sample size
743                                                                                          Total
744                                                                           Hardly ever or never
747                                                                               Some of the time
750                                                                                          Often
753                                                                            (Prefer not to say)
Name: nan, dtype: object


One way to avoid regex here is to use the built-in strip method, which removes any whitespace at the start and end of the string:

if str(row).strip().startswith('<Q'):
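
If you do want the regex version the question asks about, a minimal sketch could look like the one below. The patterns START_RE and END_RE and the sample rows are my own illustrative assumptions, not taken from the real CSV: `\s*<Q` allows any amount of leading whitespace before "<Q", and `\+\+\s*$` allows trailing whitespace after "++".

```python
import re

# Assumed patterns (not from the original post):
# optional leading whitespace before "<Q", and "++" followed by
# optional trailing whitespace at the end of the row.
START_RE = re.compile(r'\s*<Q')
END_RE = re.compile(r'\+\+\s*$')

def start_and_finish_points(rows):
    """Return the positions of rows that open and close a question block."""
    starts, ends = [], []
    for i, row in enumerate(rows):
        text = str(row)
        if START_RE.match(text):   # match() only matches at the start of the string
            starts.append(i)
        if END_RE.search(text):    # search() with "$" ties the match to the end
            ends.append(i)
    return starts, ends

# Made-up sample rows, shortened from the column shown in the question.
rows = [
    '<Q8> To what extent',
    'All respondents',
    '95% lower case or +, 99% UPPER CASE or ++',
    ' <Q11e> How often',
    'Total',
]
starts, ends = start_and_finish_points(rows)
```

With the dataframe from the question you would pass df.iloc[:, 0].to_list() as rows. Note that this version takes the list rather than the dataframe, which is a small change from the original function.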
