python pandas remove character

Question

I am working on a project and need to remove the leftmost and rightmost characters of a data result. The data comes from a scrape of craigslist, and the neighborhood results come back as '(####)', but what I need is ####. I am using pandas and trying to use lstrip and rstrip. When I try it in the Python shell it works, but when I use it on my data it does not.

post_results['neighborhood'] = post_results['neighborhood'].str.lstrip('(')
post_results['neighborhood'] = post_results['neighborhood'].str.rstrip(')')

For some reason the rstrip works and removes the ')', but the lstrip does not.

The full code is:

from bs4 import BeautifulSoup
import json
from requests import get
import numpy as np
import pandas as pd
import csv


print('hello world')
#get the initial page for the listings, to get the total count
response = get('https://washingtondc.craigslist.org/search/hhh?query=rent&availabilityMode=0&sale_date=all+dates')
html_result = BeautifulSoup(response.text, 'html.parser')
results = html_result.find('div', class_='search-legend')
total = int(results.find('span',class_='totalcount').text)
pages = np.arange(0,total+1,120)

neighborhood = []
bedroom_count =[]
sqft = []
price = []
link = []

for page in pages:
    #print(page)

    response = get('https://washingtondc.craigslist.org/search/hhh?s='+str(page)+'&query=rent&availabilityMode=0&sale_date=all+dates')
    html_result = BeautifulSoup(response.text, 'html.parser')

    posts = html_result.find_all('li', class_='result-row')
    for post in posts:
        if post.find('span',class_='result-hood') is not None:
            post_url = post.find('a',class_='result-title hdrlnk')
            post_link = post_url['href']
            link.append(post_link)
            post_neighborhood = post.find('span',class_='result-hood').text
            post_price = int(post.find('span',class_='result-price').text.strip().replace('$',''))
            neighborhood.append(post_neighborhood)
            price.append(post_price)
            if post.find('span',class_='housing') is not None:
                if 'ft2' in post.find('span',class_='housing').text.split()[0]:
                    post_bedroom = np.nan
                    post_footage = post.find('span',class_='housing').text.split()[0][:-3]
                    bedroom_count.append(post_bedroom)
                    sqft.append(post_footage)
                elif len(post.find('span',class_='housing').text.split())>2:
                    post_bedroom = post.find('span',class_='housing').text.replace("br","").split()[0]
                    post_footage = post.find('span',class_='housing').text.split()[2][:-3]
                    bedroom_count.append(post_bedroom)
                    sqft.append(post_footage)
                elif len(post.find('span',class_='housing').text.split())==2:
                    post_bedroom = post.find('span',class_='housing').text.replace("br","").split()[0]
                    post_footage = np.nan
                    bedroom_count.append(post_bedroom)
                    sqft.append(post_footage)
            else:
                post_bedroom = np.nan
                post_footage = np.nan
                bedroom_count.append(post_bedroom)
                sqft.append(post_footage)



#create results data frame
post_results = pd.DataFrame({'neighborhood':neighborhood,'footage':sqft,'bedroom':bedroom_count,'price':price,'link':link})
#clean up results
post_results = post_results.drop_duplicates(subset='link')
post_results['footage'] = post_results['footage'].replace(0,np.nan)
post_results['bedroom'] = post_results['bedroom'].replace(0,np.nan)
post_results['neighborhood'] = post_results['neighborhood'].str.lstrip('(')
post_results['neighborhood'] = post_results['neighborhood'].str.rstrip(')')
post_results = post_results.dropna(subset=['footage','bedroom'],how='all')
post_results.to_csv("rent_clean.csv",index=False)
print(len(post_results.index))

Answer 1

This problem happens when the strings have leading whitespace.

For example:

s=pd.Series([' (xxxx)','(yyyy) '])
s.str.strip('(|)')
0     (xxxx
1    yyyy) 
dtype: object

What we can do is strip twice: first the whitespace, then the parentheses.

s.str.strip().str.strip('(|)')
0    xxxx
1    yyyy
dtype: object
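
A minimal sketch of the same fix applied to a DataFrame column, assuming the scraped values carry leading or trailing whitespace. The sample values below are made up for illustration and are not taken from the actual scrape. Note that the argument to strip is a set of characters rather than a regular expression, so '()' is enough and the '|' is not needed.

```python
import pandas as pd

# Hypothetical sample resembling the scraped 'neighborhood' column
post_results = pd.DataFrame(
    {'neighborhood': [' (Arlington)', '(Dupont Circle) ', '(Capitol Hill)']}
)

# repr() makes hidden leading/trailing whitespace visible
print(post_results['neighborhood'].map(repr).tolist())

# First strip whitespace, then strip the parentheses from both ends
post_results['neighborhood'] = (
    post_results['neighborhood'].str.strip().str.strip('()')
)
print(post_results['neighborhood'].tolist())
```

Printing the repr of a few values before cleaning is a quick way to confirm whether whitespace is really what blocks lstrip.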

Answer 2

From my understanding of your question, you are removing characters from a string. You don't need pandas for this. Strings can be sliced, so you can remove the first and last character like this:

new_word = old_word[1:-1]

This should work for you. Good luck.
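
If the values are already in a pandas column, the same slice can be applied to every element through the .str accessor. This is only a sketch with made-up values. It assumes each value is wrapped in exactly one character on each side, so surrounding whitespace has to be stripped first; otherwise the slice removes the space instead of the parenthesis.

```python
import pandas as pd

# Plain string: drop the first and last character
old_word = '(1234)'
new_word = old_word[1:-1]
print(new_word)

# Series: strip whitespace, then slice each element the same way
s = pd.Series(['(xxxx)', ' (yyyy) '])
sliced = s.str.strip().str[1:-1]
print(sliced.tolist())
```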
