How do I remove everything after a certain character in a dictionary value, for every dictionary in a group of dictionaries?

My goal is to remove all characters after a certain character in a value, across a whole set of dictionaries.

I have imported a CSV file from my local machine and printed each row using the following code:

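import csv

# Raw string so the backslashes in the Windows path are not read as escape sequences
with open(r'C:\Users\xxxxx\Desktop\Aug_raw_Page.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)  # each row is a dict keyed by the CSV header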

I get a set of dictionaries that look like:

{'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/#fbid=12345'}
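The row shown above can be reproduced without the original file. As a minimal sketch, assuming the CSV has just the two columns Pageviews_Aug and URL (the second data row is invented for illustration), csv.DictReader can read an in-memory file wrapped in io.StringIO:

```python
import csv
import io

# Hypothetical CSV content: the header and first row mirror the sample above,
# the second row is invented to show a URL without '#fbid'
SAMPLE_CSV = """Pageviews_Aug,URL
145,http://www.domain.com/#fbid=12345
20,http://www.domain.com/about
"""

# DictReader yields one dict per data row, keyed by the header line
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row)
```

Note that every value comes back as a string, including the page-view count.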

For every dictionary that has a value containing #fbid, I am trying to remove #fbid and all characters that come after it.

I have tried:

for key, value in row.items():
    if key == 'URL' and '#' in value or 'fbid' in value:
        value.split('#')[0]
        print(row)

Didn't work.

I don't think rsplit will work, as it removes only whitespace.

The fastest way I can think of is using rsplit():

out = text.rsplit('#fbid')[0]
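Regarding the worry that rsplit only deals with whitespace: splitting on whitespace is only the default when no separator is passed (stripping whitespace is what rstrip does). With an explicit separator, split, rsplit and partition all split on that string. A small comparison using the URL from the question:

```python
text = 'http://www.domain.com/#fbid=12345'

# With an explicit separator, rsplit splits on that string, not on whitespace
print(text.rsplit('#fbid'))  # ['http://www.domain.com/', '=12345']

# Index 0 is everything before the first '#fbid' for both split and rsplit
before_split = text.split('#fbid', 1)[0]
before_rsplit = text.rsplit('#fbid')[0]

# partition splits once and always returns a 3-tuple, even when '#fbid' is absent
before_partition = text.partition('#fbid')[0]

print(before_split, before_rsplit, before_partition)

# A URL without '#fbid' comes back unchanged
print('http://www.domain.com/'.partition('#fbid')[0])  # http://www.domain.com/
```

partition is a convenient choice when you only want the part before the first match, since it never raises and never needs an index check.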

Okay, so I'm guessing your problem isn't removing the text that comes after the #, but getting to that string.

What is 'row'? I'm guessing it's a dictionary with a single 'URL' key, am I wrong?

for key, value in row.items():
    if key == 'URL' and '#fbid' in value:
        print(value.split('#')[0])

I don't quite understand the overall format of your data. If you want to edit a single value in your dictionary, you don't have to iterate through all the items:

if 'URL' in row:
    if '#fbid' in row['URL']:
        row['URL'] = row['URL'].rsplit('#fbid')[0]
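To apply the same edit to every row rather than a single one, the check can be wrapped in a small helper. This is a sketch: the function name strip_fbid and the sample rows are hypothetical, and it uses str.partition in place of rsplit (same result here, since only the text before the first #fbid is kept).

```python
def strip_fbid(row, key='URL', marker='#fbid'):
    """Cut row[key] at the first occurrence of marker, in place (hypothetical helper)."""
    if key in row and marker in row[key]:
        row[key] = row[key].partition(marker)[0]
    return row

# Hypothetical rows in the shape csv.DictReader produces
rows = [
    {'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/#fbid=12345'},
    {'Pageviews_Aug': '20', 'URL': 'http://www.domain.com/about'},
    {'Pageviews_Aug': '7'},  # no 'URL' key: left untouched
]

for row in rows:
    strip_fbid(row)

print(rows)
```

Because the helper mutates the dict it is given, calling it inside the "for row in reader" loop changes each row before it is printed or written out.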

That should work. But I really think you should post a sample of your whole data (three items would suffice).

Use a regular expression:

>>> import re
>>> value = 'http://www.domain.com/#fbid=12345'
>>> re.sub(r'#fbid.*', '', value)
'http://www.domain.com/'
>>> value = 'http://www.domain.com/'
>>> re.sub(r'#fbid.*', '', value)
'http://www.domain.com/'
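When many rows need cleaning, the pattern can be compiled once with re.compile and reused through the compiled pattern's sub() method. A minimal sketch with hypothetical sample rows:

```python
import re

# Compile once: '#fbid' followed by everything after it (URLs contain no newlines)
FBID_RE = re.compile(r'#fbid.*')

# Hypothetical rows in the shape csv.DictReader produces
rows = [
    {'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/#fbid=12345'},
    {'Pageviews_Aug': '20', 'URL': 'http://www.domain.com/about'},
]

for row in rows:
    # sub() returns a new string; URLs without '#fbid' come back unchanged
    row['URL'] = FBID_RE.sub('', row['URL'])

print([row['URL'] for row in rows])
```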

In your code, you could do something like this to get the result in the same format as before:

import csv
import re

with open(r'C:\Users\xxxxx\Desktop\Aug_raw_Page.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Drop '#fbid' and everything after it; URLs without it are unchanged
        row['URL'] = re.sub(r'#fbid.*', '', row['URL'])
        print(row)

Given your sample code, it looks like it doesn't work because you don't save the result of value.split('#')[0]. Do something like:

for key, value in row.items():
    if key == 'URL' and ('#' in value or 'fbid' in value):
        new_value = value.split('#')[0]  # <-- save the result of split in new_value
        row[key] = new_value             # <-- update the dict row
print(row)                               # instead of printing each time, print once at the end
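Separately, note how Python groups the condition from the question, key == 'URL' and '#' in value or 'fbid' in value. Because "and" binds more tightly than "or", it is read as (key == 'URL' and '#' in value) or ('fbid' in value), which is true for any key whose value contains fbid. A short check with a made-up column name and value:

```python
def as_written(key, value):
    # Same expression as in the question; parsed as (A and B) or C
    return key == 'URL' and '#' in value or 'fbid' in value

def intended(key, value):
    # Parentheses restrict both substring tests to the 'URL' key
    return key == 'URL' and ('#' in value or 'fbid' in value)

# Hypothetical non-URL column whose value happens to contain 'fbid'
print(as_written('Notes', 'fbid campaign'))                   # True
print(intended('Notes', 'fbid campaign'))                     # False
print(intended('URL', 'http://www.domain.com/#fbid=12345'))   # True
```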

This can be simplified to:

if '#fbid' in row['URL']:
    row['URL'] = row['URL'].split('#fbid')[0]

because it only needs to check one key.

Example:

>>> row = {'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/#fbid=12345'}
>>> if "#fbid" in row["URL"]:
...     row["URL"] = row['URL'].split("#fbid")[0]
...
>>> row
{'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/'}
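If the real goal is to drop the whole URL fragment (everything from # on), the standard library's urllib.parse.urldefrag does that directly. This is a swapped-in alternative and broader than the answers above: it removes any fragment, not only ones starting with #fbid, so it only fits if no other fragments need to be kept.

```python
from urllib.parse import urldefrag

row = {'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/#fbid=12345'}

# urldefrag returns a named tuple (url, fragment); url has the fragment removed
result = urldefrag(row['URL'])
row['URL'] = result.url

print(result.fragment)  # fbid=12345
print(row)
```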
