Splitting and appending a string in Python
I have strings that look like this:
'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'
I want to take the state number 01, the county number 001, and the tract 020100 and make a new string, 01001020100. How do I achieve this in Python?
All of these strings are in a pandas DataFrame, so I need to apply this method across all of the rows. They are all of type string, as I said above.
To provide more context, here is all my code:
import pandas as pd
import numpy as np
import re
df = pd.read_csv('all_data.csv')
column_of_interest = df['Location+Type']
column_of_interest.head()
print(type(column_of_interest[0][0]))
<class 'str'>
find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]
column_of_interest['GEOID'] = column_of_interest.apply(lambda x: find_census(x['Location+Type']))
and I am getting this error for the lambda:
1 find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]
----> 2 column_of_interest['GEOID'] = column_of_interest.apply(lambda x: find_census(x['Location+Type']))
TypeError: string indices must be integers
To achieve your goal, you can use regular expressions. But since it seems you are a beginner, here is some basic logic based on the split method. Here is the code:
census = 'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'
state = census.split('state:')[1].split('>')[0]
county = census.split('county:')[1].split('>')[0]
tract = census.split('tract:')[1].split('>')[0]
result = state + county + tract
print(result) # 01001020100
Update: using a lambda expression to generate the desired output
find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]
# to use the above lambda expression
print(find_census(census)) # 01001020100
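The TypeError in the question most likely arises because column_of_interest is already a Series, so apply hands each cell to the lambda as a plain string, and x['Location+Type'] then tries to index that string with another string. A minimal sketch of applying the same split logic to the whole column follows, assuming the column is named 'Location+Type' as in the question. The small DataFrame is a made-up stand-in for all_data.csv, and its second row is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for all_data.csv; the second row is made up for illustration.
df = pd.DataFrame({
    'Location+Type': [
        'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100',
        'Census Tract 202, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020200',
    ]
})

def find_census(text):
    """Concatenate the state, county and tract codes found in one string."""
    state = text.split('state:')[1].split('>')[0]
    county = text.split('county:')[1].split('>')[0]
    tract = text.split('tract:')[1].split('>')[0]
    return state + county + tract

# Series.apply passes each cell (a plain str) to the function,
# so the function receives the text directly and needs no column lookup.
# Assigning to df (not to the Series) adds GEOID as a new column.
df['GEOID'] = df['Location+Type'].apply(find_census)
print(df['GEOID'].tolist())  # ['01001020100', '01001020200']
```

Note that the result is assigned to df['GEOID'] rather than column_of_interest['GEOID']; the latter would add an entry labelled 'GEOID' to the Series instead of a new column.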
Assuming your text follows the pattern you have given, you can use a regular expression to get the result.
Here \d matches a digit and \s matches a whitespace character.
s = 'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'
import re
m = re.search(r"state:(\d+)>\scounty:(\d+)>\stract:(\d+)", s)
''.join(m.groups())
Output:
'01001020100'
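Since the question asks for this across every row of a DataFrame, the same pattern can also be applied column-wide with pandas' Series.str.extract, which returns one column per capture group. This is only a sketch under assumptions: the column name 'Location+Type' is taken from the question, the sample DataFrame is made up, and \s* is used instead of \s so that a missing space is tolerated.

```python
import pandas as pd

# Hypothetical sample data; the second row deliberately does not match the pattern.
df = pd.DataFrame({
    'Location+Type': [
        'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100',
        'no codes in this row',
    ]
})

# One capture group each for state, county and tract.
pattern = r'state:(\d+)>\s*county:(\d+)>\s*tract:(\d+)'

# str.extract returns a DataFrame with columns 0, 1, 2 (one per group);
# rows that do not match are filled with missing values.
parts = df['Location+Type'].str.extract(pattern)

# Element-wise string concatenation; a missing part leaves GEOID missing.
df['GEOID'] = parts[0] + parts[1] + parts[2]
print(df['GEOID'].iloc[0])  # 01001020100
```

Unlike the plain re.search version, which would raise an AttributeError on m.groups() when m is None, rows that do not match simply end up with a missing GEOID here.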