
Splitting and appending a string in Python

I have strings that look like this:

'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'

I want to take the state number 01, the county number 001 and the tract 020100 and make a new string, 01001020100. How do I achieve this in Python?

All of these strings are in a pandas DataFrame, so I need to apply this method across all of the rows. As I said above, they are all of type string.

To provide more context, here is all my code:

import pandas as pd
import numpy as np
import re

df = pd.read_csv('all_data.csv')

column_of_interest = df['Location+Type']

column_of_interest.head()

print(type(column_of_interest[0][0]))

<class 'str'>

find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]
column_of_interest['GEOID'] = column_of_interest.apply(lambda x: find_census(x['Location+Type']))

And I am getting this error from the lambda:

     1 find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]
----> 2 column_of_interest['GEOID'] = column_of_interest.apply(lambda x: find_census(x['Location+Type']))

TypeError: string indices must be integers

To achieve your goal, you could use a regular expression. But since you seem to be a beginner, here is a basic approach based on the split method. Here is the code:

census = 'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'

state = census.split('state:')[1].split('>')[0]
county = census.split('county:')[1].split('>')[0]
tract = census.split('tract:')[1].split('>')[0]
result = state + county + tract

print(result) # 01001020100

Update: using a lambda expression to generate the desired output

find_census = lambda text: text.split('state:')[1].split('>')[0] + text.split('county:')[1].split('>')[0] + text.split('tract:')[1].split('>')[0]

# to use the above lambda expression
print(find_census(census)) # 01001020100
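Applied to the DataFrame from the question, the `TypeError` comes from `column_of_interest` already being a Series: `Series.apply` hands each cell (a plain `str`) to the function, so `x['Location+Type']` tries to index a string with another string. A minimal sketch of the fix, assuming the `Location+Type` column name from the question and using a one-row stand-in DataFrame instead of `all_data.csv` (whose contents are not shown), passes the lambda straight to `apply` and stores the result on `df`:

```python
import pandas as pd

# Same split-based lambda as above: take the digits after "state:", "county:"
# and "tract:" (each ending at ">" or at the end of the string) and join them.
find_census = lambda text: (
    text.split('state:')[1].split('>')[0]
    + text.split('county:')[1].split('>')[0]
    + text.split('tract:')[1].split('>')[0]
)

# Stand-in for pd.read_csv('all_data.csv'): one row built from the example
# string in the question.
df = pd.DataFrame({
    'Location+Type': [
        'Census Tract 201, Autauga County, Alabama: Summary level: 140, '
        'state:01> county:001> tract:020100',
    ]
})

# Series.apply passes each cell (a str) to the function, so call find_census
# on it directly instead of indexing it with x['Location+Type'].
# Assign to df (not to the extracted Series) so a new column is created.
df['GEOID'] = df['Location+Type'].apply(find_census)

print(df['GEOID'].tolist())  # ['01001020100']
```

Note that assigning to `column_of_interest['GEOID']`, as in the question, targets the Series `column_of_interest` rather than `df`, so even without the error it would not create a new column in the DataFrame.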

Assuming your text follows the pattern you have given, you can use a regular expression to get the result.

Here \d matches a digit (so \d+ matches one or more digits) and \s matches a whitespace character.

import re

s = 'Census Tract 201, Autauga County, Alabama: Summary level: 140, state:01> county:001> tract:020100'
# raw string, so \d and \s reach the regex engine instead of being treated as string escapes
m = re.search(r"state:(\d+)>\scounty:(\d+)>\stract:(\d+)", s)
''.join(m.groups())

Output

'01001020100'
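For the pandas use case, the same pattern can be applied to the whole column at once with `Series.str.extract`, which returns one string column per capture group, so leading zeros are preserved. A sketch, assuming the `Location+Type` column from the question; the second row is a made-up value for illustration only, and the group names `state`, `county` and `tract` are arbitrary:

```python
import pandas as pd

# Stand-in data: the first row is the question's example string; the second
# row is a hypothetical value used only to show several rows being processed.
df = pd.DataFrame({
    'Location+Type': [
        'Census Tract 201, Autauga County, Alabama: Summary level: 140, '
        'state:01> county:001> tract:020100',
        'Census Tract 202, Autauga County, Alabama: Summary level: 140, '
        'state:01> county:001> tract:020200',
    ]
})

# Same regex as the answer above, with named groups; str.extract returns a
# DataFrame whose columns are named after the groups.
parts = df['Location+Type'].str.extract(
    r'state:(?P<state>\d+)>\scounty:(?P<county>\d+)>\stract:(?P<tract>\d+)'
)

# Concatenate the three string columns row by row to build the GEOID.
df['GEOID'] = parts['state'] + parts['county'] + parts['tract']

print(df['GEOID'].tolist())  # ['01001020100', '01001020200']
```

Rows that do not match the pattern get `NaN` in all three groups, and therefore `NaN` in `GEOID`, instead of raising an `IndexError` as the split-based lambda would.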

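Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.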

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM