Remove all punctuation from a string
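I'm currently working on a pandas dataframe and trying to extract values from a column made up of strings in a list, but I'm a bit stuck on how to keep only the text I want.
This is what one of the lists looks like: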
["{'BusinessAcceptsCreditCards': 'True'",
"'RestaurantsPriceRange2': '2'",
"'ByAppointmentOnly': 'False'",
"'BikeParking': 'False'",
'\'BusinessParking\': "{\'garage\': False',
"'street': True",
"'validated': False",
"'lot': False",
'\'valet\': False}"}']
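The text to the left of each colon is the attribute, and the text to the right is its value. Is there a way to go over this list, get rid of all the punctuation in each string, and keep only the text of each attribute and its corresponding value?
So my idea was to first split at the colon, using the following code: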
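txt = df_business['attributes'][2]
y = txt.split(", ")    # split the attributes string into "key: value" pieces
y
y1 = y[0].split(":")   # split the first piece into key and value
y1
y1[1].strip()          # strip() with no argument removes only whitespace, so quotes and braces stay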
But with the code above, I only get the following result:
Attribute = "{'BusinessAcceptsCreditCards'"
Value = "'True'"
The result I want is:
Attribute = "BusinessAcceptsCreditCards"
Value = "True"
Sample of the dataframe:
{'business_id': {0: '6iYb2HFDywm3zjuRg0shjw',
1: 'tCbdrRPZA0oiIYSmHG3J0w',
2: 'bvN78flM8NLprQ1a1y5dRg',
3: 'oaepsyvc0J17qwi8cfrOWg',
4: 'PE9uqAjdw0E4-8mjGl3wVA',
5: 'D4JtQNTI4X3KcbzacDJsMw',
6: 't35jsh9YnMtttm69UCp7gw',
7: 'jFYIsSb7r1QeESVUnXPHBw',
8: 'N3_Gs3DnX4k9SgpwJxdEfw'},
'name': {0: 'Oskar Blues Taproom',
1: 'Flying Elephants at PDX',
2: 'The Reclaimory',
3: 'Great Clips',
4: 'Crossfit Terminus',
5: 'Bob Likes Thai Food',
6: 'Escott Orthodontics',
7: 'Boxwood Biscuit',
8: 'Lane Wells Jewelry Repair'},
'address': {0: '921 Pearl St',
1: '7000 NE Airport Way',
2: '4720 Hawthorne Ave',
3: '2566 Enterprise Rd',
4: '1046 Memorial Dr SE',
5: '3755 Main St',
6: '2511 Edgewater Dr',
7: '740 S High St',
8: '7801 N Lamar Blvd, Ste A140'},
'city': {0: 'Boulder',
1: 'Portland',
2: 'Portland',
3: 'Orange City',
4: 'Atlanta',
5: 'Vancouver',
6: 'Orlando',
7: 'Columbus',
8: 'Austin'},
'state': {0: 'CO',
1: 'OR',
2: 'OR',
3: 'FL',
4: 'GA',
5: 'BC',
6: 'FL',
7: 'OH',
8: 'TX'},
'postal_code': {0: '80302',
1: '97218',
2: '97214',
3: '32763',
4: '30316',
5: 'V5V',
6: '32804',
7: '43206',
8: '78752'},
'latitude': {0: 40.0175444,
1: 45.5889058992,
2: 45.5119069956,
3: 28.9144823,
4: 33.7470274,
5: 49.2513423,
6: 28.573998,
7: 39.947006523,
8: 30.346169},
'longitude': {0: -105.2833481,
1: -122.5933307507,
2: -122.6136928797,
3: -81.2959787,
4: -84.3534244,
5: -123.101333,
6: -81.3892841,
7: -82.997471,
8: -97.711458},
'stars': {0: 4.0,
1: 4.0,
2: 4.5,
3: 3.0,
4: 4.0,
5: 3.5,
6: 4.5,
7: 4.5,
8: 5.0},
'review_count': {0: 86,
1: 126,
2: 13,
3: 8,
4: 14,
5: 169,
6: 7,
7: 11,
8: 30},
'is_open': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1},
'attributes': {0: '{\'RestaurantsTableService\': \'True\', \'WiFi\': "u\'free\'", \'BikeParking\': \'True\', \'BusinessParking\': "{\'garage\': False, \'street\': True, \'validated\': False, \'lot\': False, \'valet\': False}", \'BusinessAcceptsCreditCards\': \'True\', \'RestaurantsReservations\': \'False\', \'WheelchairAccessible\': \'True\', \'Caters\': \'True\', \'OutdoorSeating\': \'True\', \'RestaurantsGoodForGroups\': \'True\', \'HappyHour\': \'True\', \'BusinessAcceptsBitcoin\': \'False\', \'RestaurantsPriceRange2\': \'2\', \'Ambience\': "{\'touristy\': False, \'hipster\': False, \'romantic\': False, \'divey\': False, \'intimate\': False, \'trendy\': False, \'upscale\': False, \'classy\': False, \'casual\': True}", \'HasTV\': \'True\', \'Alcohol\': "\'beer_and_wine\'", \'GoodForMeal\': "{\'dessert\': False, \'latenight\': False, \'lunch\': False, \'dinner\': False, \'brunch\': False, \'breakfast\': False}", \'DogsAllowed\': \'False\', \'RestaurantsTakeOut\': \'True\', \'NoiseLevel\': "u\'average\'", \'RestaurantsAttire\': "\'casual\'", \'RestaurantsDelivery\': \'None\'}',
1: '{\'RestaurantsTakeOut\': \'True\', \'RestaurantsAttire\': "u\'casual\'", \'GoodForKids\': \'True\', \'BikeParking\': \'False\', \'OutdoorSeating\': \'False\', \'Ambience\': "{\'romantic\': False, \'intimate\': False, \'touristy\': False, \'hipster\': False, \'divey\': False, \'classy\': False, \'trendy\': False, \'upscale\': False, \'casual\': True}", \'Caters\': \'True\', \'RestaurantsReservations\': \'False\', \'RestaurantsDelivery\': \'False\', \'HasTV\': \'False\', \'RestaurantsGoodForGroups\': \'False\', \'BusinessAcceptsCreditCards\': \'True\', \'NoiseLevel\': "u\'average\'", \'ByAppointmentOnly\': \'False\', \'RestaurantsPriceRange2\': \'2\', \'WiFi\': "u\'free\'", \'BusinessParking\': "{\'garage\': True, \'street\': False, \'validated\': False, \'lot\': False, \'valet\': False}", \'Alcohol\': "u\'beer_and_wine\'", \'GoodForMeal\': "{\'dessert\': False, \'latenight\': False, \'lunch\': True, \'dinner\': False, \'brunch\': False, \'breakfast\': True}"}',
2: '{\'BusinessAcceptsCreditCards\': \'True\', \'RestaurantsPriceRange2\': \'2\', \'ByAppointmentOnly\': \'False\', \'BikeParking\': \'False\', \'BusinessParking\': "{\'garage\': False, \'street\': True, \'validated\': False, \'lot\': False, \'valet\': False}"}',
3: "{'RestaurantsPriceRange2': '1', 'BusinessAcceptsCreditCards': 'True', 'GoodForKids': 'True', 'ByAppointmentOnly': 'False'}",
4: '{\'GoodForKids\': \'False\', \'BusinessParking\': "{\'garage\': False, \'street\': False, \'validated\': False, \'lot\': False, \'valet\': False}", \'BusinessAcceptsCreditCards\': \'True\'}',
5: '{\'GoodForKids\': \'True\', \'Alcohol\': "u\'none\'", \'RestaurantsGoodForGroups\': \'True\', \'RestaurantsReservations\': \'True\', \'BusinessParking\': "{\'garage\': False, \'street\': True, \'validated\': False, \'lot\': False, \'valet\': False}", \'RestaurantsAttire\': "u\'casual\'", \'BikeParking\': \'True\', \'RestaurantsPriceRange2\': \'2\', \'HasTV\': \'False\', \'NoiseLevel\': "u\'average\'", \'WiFi\': "u\'no\'", \'RestaurantsTakeOut\': \'True\', \'Caters\': \'False\', \'OutdoorSeating\': \'False\', \'Ambience\': "{\'romantic\': False, \'intimate\': False, \'classy\': False, \'hipster\': False, \'divey\': False, \'touristy\': False, \'trendy\': False, \'upscale\': False, \'casual\': True}", \'GoodForMeal\': "{\'dessert\': False, \'latenight\': False, \'lunch\': True, \'dinner\': True, \'brunch\': False, \'breakfast\': False}", \'DogsAllowed\': \'False\', \'RestaurantsDelivery\': \'True\'}',
6: "{'AcceptsInsurance': 'True', 'BusinessAcceptsCreditCards': 'True', 'ByAppointmentOnly': 'True'}",
7: nan,
8: '{\'RestaurantsPriceRange2\': \'1\', \'ByAppointmentOnly\': \'False\', \'BusinessParking\': "{\'garage\': False, \'street\': False, \'validated\': False, \'lot\': True, \'valet\': False}", \'BusinessAcceptsCreditCards\': \'True\', \'DogsAllowed\': \'True\', \'RestaurantsDelivery\': \'None\', \'BusinessAcceptsBitcoin\': \'False\', \'BikeParking\': \'True\', \'RestaurantsTakeOut\': \'None\', \'WheelchairAccessible\': \'True\'}'},
'categories': {0: 'Gastropubs, Food, Beer Gardens, Restaurants, Bars, American (Traditional), Beer Bar, Nightlife, Breweries',
1: 'Salad, Soup, Sandwiches, Delis, Restaurants, Cafes, Vegetarian',
2: 'Antiques, Fashion, Used, Vintage & Consignment, Shopping, Furniture Stores, Home & Garden',
3: 'Beauty & Spas, Hair Salons',
4: 'Gyms, Active Life, Interval Training Gyms, Fitness & Instruction',
5: 'Restaurants, Thai',
6: 'Dentists, Health & Medical, Orthodontists',
7: 'Breakfast & Brunch, Restaurants',
8: 'Shopping, Jewelry Repair, Appraisal Services, Local Services, Jewelry, Engraving, Gold Buyers'},
'hours': {0: "{'Monday': '11:0-23:0', 'Tuesday': '11:0-23:0', 'Wednesday': '11:0-23:0', 'Thursday': '11:0-23:0', 'Friday': '11:0-23:0', 'Saturday': '11:0-23:0', 'Sunday': '11:0-23:0'}",
1: "{'Monday': '5:0-18:0', 'Tuesday': '5:0-17:0', 'Wednesday': '5:0-18:0', 'Thursday': '5:0-18:0', 'Friday': '5:0-18:0', 'Saturday': '5:0-18:0', 'Sunday': '5:0-18:0'}",
2: "{'Thursday': '11:0-18:0', 'Friday': '11:0-18:0', 'Saturday': '11:0-18:0', 'Sunday': '11:0-18:0'}",
3: nan,
4: "{'Monday': '16:0-19:0', 'Tuesday': '16:0-19:0', 'Wednesday': '16:0-19:0', 'Thursday': '16:0-19:0', 'Friday': '16:0-19:0', 'Saturday': '9:0-11:0'}",
5: "{'Monday': '17:0-21:0', 'Tuesday': '17:0-21:0', 'Wednesday': '17:0-21:0', 'Thursday': '17:0-21:0', 'Friday': '17:0-21:0', 'Saturday': '17:0-21:0', 'Sunday': '17:0-21:0'}",
6: "{'Monday': '0:0-0:0', 'Tuesday': '8:0-17:30', 'Wednesday': '8:0-17:30', 'Thursday': '8:0-17:30', 'Friday': '8:0-17:30'}",
7: "{'Saturday': '8:0-14:0', 'Sunday': '8:0-14:0'}",
8: "{'Monday': '12:15-17:0', 'Tuesday': '12:15-17:0', 'Wednesday': '12:15-17:0', 'Thursday': '12:15-17:0', 'Friday': '12:15-17:0'}"}}
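Since each attributes cell in the sample is the text form of a Python dict (with missing cells as nan), one option, swapped in here rather than taken from the question, is to parse the cell with ast.literal_eval instead of stripping punctuation. The sketch below assumes exactly that format; parse_attributes is a hypothetical helper that parses the outer dict, parses each inner value a second time (values such as 'True', "u'free'" and the nested BusinessParking dict are stored as text), and flattens nested dicts into keys like BusinessParking.garage.

```python
import ast
import math


def parse_attributes(raw):
    """Turn one 'attributes' cell into a flat {attribute: value} dict.

    Assumes the cell is either missing (None/NaN) or the repr of a Python dict,
    as in the sample dataframe. Nested dicts are flattened to 'Parent.child' keys.
    """
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return {}  # missing cell -> no attributes
    outer = ast.literal_eval(raw)  # evaluates literals only, never arbitrary code
    flat = {}
    for key, value in outer.items():
        # Inner values are literals stored as text: 'True', '2', "u'free'", "{...}"
        if isinstance(value, str):
            try:
                value = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                pass  # not a valid literal: keep the raw text
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# Row 2 of the sample 'attributes' column
row_2 = '{\'BusinessAcceptsCreditCards\': \'True\', \'RestaurantsPriceRange2\': \'2\', \'ByAppointmentOnly\': \'False\', \'BikeParking\': \'False\', \'BusinessParking\': "{\'garage\': False, \'street\': True, \'validated\': False, \'lot\': False, \'valet\': False}"}'

parsed = parse_attributes(row_2)
for attribute, value in parsed.items():
    print(attribute, "=", value)
```

With pandas, this could be applied per row with df_business['attributes'].apply(parse_attributes), and pd.DataFrame(...) built from the resulting list of dicts would spread them into columns. Values come back as real True/False/None/int objects rather than the text "True", which may or may not be what you want downstream.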
I want to count how many times True and False appear in each restaurant's attributes.
You can join all the elements you listed and search for the '\bTrue\b' / '\bFalse\b' patterns (\b denotes a word boundary):
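# missing cells become ''; list cells are joined into a single string
s = df['attributes'].fillna('').apply(''.join)
# raw strings (r'...') so that \b is a regex word boundary, not a backspace character
df['nb_True'] = s.str.count(r'\bTrue\b')
df['nb_False'] = s.str.count(r'\bFalse\b')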
output:
>>> df[['nb_True', 'nb_False']]
nb_True nb_False
0 12 21
1 8 23
2 2 6
3 2 1
4 1 6
5 10 20
6 3 0
7 0 0
8 5 6
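These patterns need to be raw strings (r'\bTrue\b'): in an ordinary Python string literal, '\b' is the backspace character rather than the regex word-boundary token, so '\bTrue\b' would match nothing in this data. A stdlib-only sketch of the same whole-word count, using re.findall and a hypothetical count_flags helper, checked against rows 2 and 3 of the sample:

```python
import re

# Raw strings keep \b as the regex word boundary; in a plain literal
# "\b" would be the backspace character "\x08".
TRUE_RE = re.compile(r"\bTrue\b")
FALSE_RE = re.compile(r"\bFalse\b")


def count_flags(raw):
    """Return (number of True, number of False) as whole words in one cell."""
    if not isinstance(raw, str):
        return 0, 0  # missing cell (None/NaN) counts as nothing
    return len(TRUE_RE.findall(raw)), len(FALSE_RE.findall(raw))


row_2 = '{\'BusinessAcceptsCreditCards\': \'True\', \'RestaurantsPriceRange2\': \'2\', \'ByAppointmentOnly\': \'False\', \'BikeParking\': \'False\', \'BusinessParking\': "{\'garage\': False, \'street\': True, \'validated\': False, \'lot\': False, \'valet\': False}"}'
row_3 = "{'RestaurantsPriceRange2': '1', 'BusinessAcceptsCreditCards': 'True', 'GoodForKids': 'True', 'ByAppointmentOnly': 'False'}"

print(count_flags(row_2))         # (2, 6)
print(count_flags(row_3))         # (2, 1)
print(count_flags(float("nan")))  # (0, 0)
print("\bTrue\b" == r"\bTrue\b")  # False: the plain literal contains backspaces
```

Because this counts text matches, it assumes True and False only ever appear as values; values stored as 'None' are not counted either way.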
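Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.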