I have these strings that look like this:
'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'
I want to take the state number 01, the county number 001, and the tract 020100 and combine them into a new string, 01001020100. How do I achieve this in Python?
All of these strings are in a pandas DataFrame, so I need to apply this method across all of the rows. They are all of type string, as I said above.
To provide more context, here is all my code:
import pandas as pd
import numpy as np
import re
df = pd.read_csv('all_data.csv')
column_of_interest = df['Location+Type']
column_of_interest.head()
print(type(column_of_interest[0][0]))
<class 'str'>
find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]
column_of_interest['GEOID'] = column_of_interest.apply(lambda x: find_census(x['Location+Type']))
and I am getting this error for the lambda:
1 find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]
----> 2 column_of_interest['GEOID'] = column_of_interest.apply(lambda x: find_census(x['Location+Type']))
TypeError: string indices must be integers
You could use a regular expression to achieve your goal. Since you seem to be a beginner, though, here is a simpler approach based on the split method. Here is the code:
census = 'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'
state = census.split('state:')[1].split('>')[0]
county = census.split('county:')[1].split('>')[0]
tract = census.split('tract:')[1].split('>')[0]
result = state + county + tract
print(result) # 01001020100
Update: using a lambda expression to generate the desired output
find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]
# to use the above lambda expression
print(find_census(census)) # 01001020100
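The TypeError in the question comes from how apply is called rather than from the lambda itself. column_of_interest is already a single column (a pandas Series), so apply hands each string straight to the function, and x['Location+Type'] then tries to index a string with a string. Here is a minimal sketch of applying the function to the column directly and storing the result on the DataFrame. The two-row DataFrame is made up for illustration (the second row is invented) and stands in for all_data.csv:

```python
import pandas as pd

# Hypothetical sample data standing in for all_data.csv (second row is invented)
df = pd.DataFrame({
    'Location+Type': [
        'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100',
        'Census Tract 202, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020200',
    ]
})

def find_census(text):
    # Take whatever sits between each label and the next '>' (or the end of the string)
    state = text.split('state:')[1].split('>')[0]
    county = text.split('county:')[1].split('>')[0]
    tract = text.split('tract:')[1].split('>')[0]
    return state + county + tract

# Series.apply passes each cell (a str) to the function, so no extra indexing is needed
df['GEOID'] = df['Location+Type'].apply(find_census)
print(df['GEOID'].tolist())  # ['01001020100', '01001020200']
```

Assigning to df['GEOID'] rather than column_of_interest['GEOID'] adds a new column to the DataFrame instead of adding a stray entry to the Series.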
Assuming your text follows the pattern you have given, you can use a regular expression to get the result.
Here \d matches a digit (so \d+ matches one or more digits) and \s matches a whitespace character.
s = 'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'
import re
# Raw string (r"...") so that \d and \s reach the regex engine without invalid-escape warnings
m = re.search(r"state:(\d+)>\scounty:(\d+)>\stract:(\d+)", s)
''.join(m.groups())
Output
'01001020100'
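One caveat with the regex approach: re.search returns None when a string does not match, so calling m.groups() on a malformed row would raise an AttributeError. Here is a rough sketch of a guarded version. The names GEOID_PATTERN and extract_geoid are my own, and using \s* instead of \s (to tolerate zero or several spaces) is an assumption about how messy the data might be:

```python
import re

# Compiled once so it can be reused for every row
GEOID_PATTERN = re.compile(r'state:(\d+)>\s*county:(\d+)>\s*tract:(\d+)')

def extract_geoid(text):
    """Return state + county + tract digits, or None if the pattern is absent."""
    m = GEOID_PATTERN.search(text)
    return ''.join(m.groups()) if m else None

s = 'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'
print(extract_geoid(s))                # 01001020100
print(extract_geoid('no codes here'))  # None
```

A function like this can be passed to the column's apply method in the same way as the split-based one, for example df['Location+Type'].apply(extract_geoid).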