I have a CSV of questions and results. I built a simple bit of code to turn it into a list of dataframes for analysis.
But the last one refused to split out. I think this is because my simple startswith and endswith checks assume that every question starts with "<Q".
def start_and_finish_points(df):
    df_indices_start = []
    df_indices_end = []
    rows = df.iloc[:, 0].to_list()
    for i, row in enumerate(rows):
        if str(row).startswith('<Q'):
            df_indices_start.append(i)
        if str(row).endswith('++'):
            df_indices_end.append(i)
    return df_indices_start, df_indices_end

start, finish = start_and_finish_points(df)
The problem is that the code can't handle a leading space, as in " <Q":
question
698 <Q8> To what extent are you concerned about of the following.................Climate change
... Some data
700 <Q11e> How often d...
Can I generalise the startswith check to cope with a space at the start of the string? I'm sure it can be done with regex, but I can't see how.
UPDATE:
The dataframe column that I want to extract from is this:
698 <Q8> To what extent are you concerned about of the following.................Climate change
699
700 All respondents
704 Unweighted row
705 Effective sample size
706 Total
707 1: Not at all concerned
710 2
713 3
716 4
719 5: Very concerned
722 Not applicable
725 Total: Concerned
728 Total: Not Concerned
731 Net % concerned
733 95% lower case or +, 99% UPPER CASE or ++
735 <Q11e> How often do you access local greenspaces (e.g. parks, community gardens)?
736
737 All respondents
741 Unweighted row
742 Effective sample size
743 Total
744 Hardly ever or never
747 Some of the time
750 Often
753 (Prefer not to say)
Name: nan, dtype: object
One way of avoiding regex here is to use the built-in strip method, which removes any whitespace before and after your string:
if str(row).strip().startswith('<Q'):
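For completeness, here is a minimal sketch of the whole function with that change applied to both checks, plus a regex alternative for the start check, since the question asked about one. The `sample` dataframe and its rows are made up for illustration (they are not the asker's real data), and the pattern `\s*<Q` is just one possible way to allow optional leading whitespace.

```python
import re

import pandas as pd


def start_and_finish_points(df):
    """Return the row positions where question blocks start and end."""
    df_indices_start = []
    df_indices_end = []
    for i, row in enumerate(df.iloc[:, 0].to_list()):
        text = str(row).strip()  # drop leading/trailing whitespace first
        if text.startswith('<Q'):
            df_indices_start.append(i)
        if text.endswith('++'):
            df_indices_end.append(i)
    return df_indices_start, df_indices_end


# Hypothetical sample: the first question has a leading space,
# and the last marker row has a trailing space.
sample = pd.DataFrame({'question': [
    ' <Q8> To what extent are you concerned ... Climate change',
    'All respondents',
    '95% lower case or +, 99% UPPER CASE or ++',
    '<Q11e> How often do you access local greenspaces?',
    'All respondents',
    '95% lower case or +, 99% UPPER CASE or ++ ',
]})

start, finish = start_and_finish_points(sample)

# Regex alternative for the start check: re.match anchors at the
# beginning of the string, and \s* allows any leading whitespace.
pattern = re.compile(r'\s*<Q')
regex_start = [
    i for i, row in enumerate(sample.iloc[:, 0].to_list())
    if pattern.match(str(row))
]

print(start, finish, regex_start)
```

Stripping once and reusing the result also makes the endswith('++') check tolerant of trailing spaces, which the original check would miss in the same way.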