How do I remove everything after a certain character in a value in a dictionary for all dictionaries in a group of dictionaries?
My goal is to remove all characters after a certain character in a value from a set of dictionaries.
I have imported a CSV file from my local machine and printed it using the following code:
import csv

# Raw string so the backslashes in the Windows path are not read as escapes.
with open(r'C:\Users\xxxxx\Desktop\Aug_raw_Page.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
I get a set of dictionaries that look like:
{'Pageviews_Aug':'145', 'URL':'http://www.domain.com/#fbid=12345'}
For any dictionary that includes a value with #fbid, I am trying to remove #fbid and any characters that come after it, for all dictionaries where this is true.
I have tried:
for key, value in row.items():
    if key == 'URL' and '#' in value or 'fbid' in value:
        value.split('#')[0]
    print(row)
That didn't work. I don't think rsplit will work, as it only removes whitespace.
The quickest way I can think of is to use rsplit():
out = text.rsplit('#fbid')[0]
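As a rough sketch of how that one-liner behaves, assuming the URL is held in a string: the first sample value is the one from the question, the second is a made-up URL without the marker. When given a separator, `rsplit` splits on that separator rather than on whitespace, and a string that does not contain `#fbid` comes back unchanged. `str.partition` is shown next to it as an alternative that splits only at the first occurrence.

```python
# The first URL is from the question; the second is an assumed example
# without the marker, included to show the no-match case.
urls = [
    'http://www.domain.com/#fbid=12345',
    'http://www.domain.com/page',
]

# rsplit(sep)[0] is the text before the first occurrence of sep,
# or the whole string when sep is absent.
cleaned_rsplit = [text.rsplit('#fbid')[0] for text in urls]

# partition(sep) returns (before, sep, after); 'before' is the whole
# string when sep is absent.
cleaned_partition = [text.partition('#fbid')[0] for text in urls]

print(cleaned_rsplit)
print(cleaned_partition)
```

Both lists come out the same here; the two calls differ only in how much extra splitting work they do when the marker appears more than once.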
Okay, so I'm guessing your problem isn't removing the text that comes after the #, but getting to that string.
What is 'row'? I'm guessing it's a dictionary with a single 'URL' key, am I wrong?
for key, value in row.items():
    if key == 'URL' and '#fbid' in value:
        print(value.split('#')[0])
I don't quite understand the overall format of your data. If you want to edit a single value in your dictionary, you don't have to iterate through all the items:
if 'URL' in row:
    if '#fbid' in row['URL']:
        row['URL'] = row['URL'].rsplit('#fbid')[0]
That should work. But I really think you should post an example of your whole data (three items would suffice).
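A minimal sketch of applying the same check to every dictionary in a group, assuming the rows have already been collected into a list. The helper name `strip_fbid`, its parameters, and all sample rows except the first are hypothetical and only serve the illustration.

```python
def strip_fbid(rows, key='URL', marker='#fbid'):
    """Cut marker and everything after it from row[key], in place."""
    for row in rows:
        value = row.get(key)  # None if this row has no such key
        if value is not None and marker in value:
            # Split once and keep the part before the marker.
            row[key] = value.split(marker, 1)[0]
    return rows


rows = [
    {'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/#fbid=12345'},
    {'Pageviews_Aug': '12', 'URL': 'http://www.domain.com/about'},
    {'Pageviews_Aug': '7'},  # no URL key at all
]

strip_fbid(rows)
for row in rows:
    print(row)
```

Rows without the key or without the marker are left as they were, so the helper can be run over the whole group without checking each row first.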
Use a regular expression:
>>> import re
>>> value = 'http://www.domain.com/#fbid=12345'
>>> re.sub(r'#fbid.*', '', value)
'http://www.domain.com/'
>>> value = 'http://www.domain.com/'
>>> re.sub(r'#fbid.*', '', value)
'http://www.domain.com/'
For your code, you could do something like this to get the answer in the same format as before:
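import csv
import re

# Raw string so the backslashes in the Windows path are not read as escapes.
with open(r'C:\Users\xxxxx\Desktop\Aug_raw_Page.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Delete '#fbid' and everything after it; other URLs are unchanged.
        row['URL'] = re.sub(r'#fbid.*', '', row['URL'])
        print(row)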
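To try the regex approach without the CSV file on disk, here is a self-contained sketch that reads the same kind of data from an in-memory string. The column names follow the question; the `io.StringIO` stand-in, the second data row, and the `cleaned` list are assumptions made for the example.

```python
import csv
import io
import re

# Stand-in for the CSV file. The header matches the question's columns;
# the second data row is made up for illustration.
sample = io.StringIO(
    "Pageviews_Aug,URL\n"
    "145,http://www.domain.com/#fbid=12345\n"
    "12,http://www.domain.com/about\n"
)

cleaned = []
for row in csv.DictReader(sample):
    # Delete '#fbid' and everything after it; rows without it are unchanged.
    row['URL'] = re.sub(r'#fbid.*', '', row['URL'])
    cleaned.append(dict(row))  # plain dict, whatever mapping type the reader yields

print(cleaned)
```

Collecting the rows in a list keeps the cleaned dictionaries available after the loop, which printing alone does not.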
Given your sample code, it looks like it doesn't work because you don't save the result of value.split('#')[0]. Do something like this:
for key, value in row.items():
    # Parentheses make the 'or' apply only to the URL value.
    if key == 'URL' and ('#' in value or 'fbid' in value):
        new_value = value.split('#')[0]  # save the result of split in new_value
        row[key] = new_value             # update the dict row
print(row)  # print once at the end instead of on every iteration
This can be simplified to:
if '#fbid' in row['URL']:
    row['URL'] = row['URL'].split('#fbid')[0]
because it only checks one key.
Example:
>>> row = {'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/#fbid=12345'}
>>> if "#fbid" in row["URL"]:
...     row["URL"] = row["URL"].split("#fbid")[0]
...
>>> row
{'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/'}
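For anyone wondering why the attempt in the question left the dictionary untouched, the following sketch isolates the two issues. The `Referrer` key and its value are hypothetical, chosen only to show how the unparenthesized condition is evaluated.

```python
row = {'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/#fbid=12345'}

# Strings are immutable: split() returns a new list and leaves both the
# original string and the dict untouched unless the result is assigned.
row['URL'].split('#')[0]
unchanged = row['URL']
print(unchanged)

# 'and' binds tighter than 'or', so the condition from the question is read as
# (key == 'URL' and '#' in value) or ('fbid' in value).
key, value = 'Referrer', 'fbid-campaign'  # hypothetical non-URL column
without_parens = key == 'URL' and '#' in value or 'fbid' in value
with_parens = key == 'URL' and ('#' in value or 'fbid' in value)
print(without_parens, with_parens)
```

The unparenthesized test is true for any column whose value contains 'fbid', which is why checking for the full '#fbid' marker on the 'URL' key alone is the safer form.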