
Splitting and appending a string in Python

I have these strings that look like this:

'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'

I want to take the state number 01, the county number 001, and the tract 020100 and combine them into a new string, 01001020100. How do I achieve this in Python?

All of these strings are in a pandas DataFrame, so I need to apply this method across all of the rows. They are all of type string, as I said above.

To provide more context, here is all my code:

import pandas as pd
import numpy as np
import re

df = pd.read_csv('all_data.csv')

column_of_interest = df['Location+Type']

column_of_interest.head()

print(type(column_of_interest[0][0]))

<class 'str'>

find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]
column_of_interest['GEOID'] = column_of_interest.apply(lambda x: find_census(x['Location+Type']))

and I am getting this error for the lambda:

     1 find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]
----> 2 column_of_interest['GEOID'] = column_of_interest.apply(lambda x: find_census(x['Location+Type']))

TypeError: string indices must be integers

You could use a regular expression for this, but since you seem to be new to Python, here is a simpler approach based on the split method:

census = 'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'

state = census.split('state:')[1].split('>')[0]
county = census.split('county:')[1].split('>')[0]
tract = census.split('tract:')[1].split('>')[0]
result = state + county + tract

print(result) # 01001020100

Update: the same logic written as a lambda expression, so it can be reused:

find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]

# to use the above lambda expression
print(find_census(census)) # 01001020100
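
As for the TypeError in the question: column_of_interest is already a single column (a Series), so apply hands each string directly to the function. Writing x['Location+Type'] then indexes a string with a string, which raises "string indices must be integers". Assigning to column_of_interest['GEOID'] would also add an entry to the Series rather than a column to the DataFrame. A minimal sketch of one way to wire it up, assuming the column is named 'Location+Type' as in the question (the one-row DataFrame below is only a stand-in for all_data.csv):

```python
import pandas as pd

# Stand-in for the asker's all_data.csv; the single row is the sample
# string from the question.
df = pd.DataFrame({
    'Location+Type': [
        'Census Tract 201, Autauga County, Alabama: Summary level: 140, '
        'state:01> county:001> tract:020100',
    ]
})

def find_census(text):
    """Concatenate the state, county and tract codes found in text."""
    state = text.split('state:')[1].split('>')[0]
    county = text.split('county:')[1].split('>')[0]
    tract = text.split('tract:')[1].split('>')[0]
    return state + county + tract

# Apply to the column itself: each element passed in is already a string,
# so no further indexing is needed. Assign the result back to the DataFrame.
df['GEOID'] = df['Location+Type'].apply(find_census)

print(df['GEOID'].tolist())  # ['01001020100']
```

Note that this version raises an IndexError on any row that lacks one of the three labels, so it relies on every row following the format shown.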

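Assuming your text always follows the pattern you have given, you can use a regular expression to get the result.

Here \d matches a single digit (so \d+ matches a run of one or more digits) and \s matches a whitespace character. Each pair of parentheses captures one of the three codes.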

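import re

s = 'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'
# Raw string (r"...") so that \d and \s reach the regex engine unchanged
m = re.search(r"state:(\d+)>\scounty:(\d+)>\stract:(\d+)", s)
# m.groups() holds the three captured codes; join them into one string
''.join(m.groups())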

Output

'01001020100'
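
Since the strings live in a pandas DataFrame, the same pattern can probably be applied to the whole column at once with Series.str.extract, which returns one column per capture group. This is a sketch, assuming the column is named 'Location+Type' as in the question; the one-row DataFrame is only a stand-in for the real data, and \s* is used instead of \s to tolerate missing or extra spaces:

```python
import pandas as pd

# Stand-in for the real data; the single row is the sample string
# from the question.
df = pd.DataFrame({
    'Location+Type': [
        'Census Tract 201, Autauga County, Alabama: Summary level: 140, '
        'state:01> county:001> tract:020100',
    ]
})

# Three capture groups: state, county and tract codes.
pattern = r'state:(\d+)>\s*county:(\d+)>\s*tract:(\d+)'

# str.extract returns a DataFrame with one column per group (0, 1, 2).
parts = df['Location+Type'].str.extract(pattern)

# Concatenate the three string columns element-wise.
df['GEOID'] = parts[0] + parts[1] + parts[2]

print(df['GEOID'].tolist())  # ['01001020100']
```

Rows where the pattern does not match come back as missing values instead of raising an error, which makes malformed entries easier to spot afterwards.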

