Python: Remove a portion of a string from a list of strings

I used xlrd to extract a column from an Excel sheet and turn it into a list.

import xlrd

# Open the first worksheet of the workbook
sheet = xlrd.open_workbook("HEENT.xlsx").sheet_by_index(0)
med_name = []
for row in sheet.col(2):
    med_name.append(row)
med_school = []
for row in sheet.col(3):
    med_school.append(row)
print(med_school)

Below is a snippet of the list med_school:

[text:'University of San Francisco', 
text: 'Harvard University', 
text:'Class of 2016, University of Maryland School of Medicine', 
text:'Class of 2015, Johns Hopkins University School of Medicine', 
text:'Class of 2014, Raymond and Ruth Perelman School of Medicine at the
University of Pennsylvania']

I want to remove "text:'Class of 2014" from each string in the list. I tried a list comprehension, but I got an attribute error: 'Cell' object has no attribute 'strip'. Does anyone know of a way to create a list that contains just the medical school names, without the class year and the word "text"?

xlrd does not return strings; it returns instances of a class called Cell. Each Cell has a value property that contains the string you are seeing.
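To see why the list comprehension in the question failed, here is a minimal sketch. It uses a hypothetical stand-in class, FakeCell, in place of xlrd's Cell so that it runs without a workbook; only the value attribute is modelled, which is an assumption made for illustration.

```python
class FakeCell:
    """Hypothetical stand-in for xlrd's Cell; only .value is modelled."""

    def __init__(self, value):
        self.value = value


cell = FakeCell("  Class of 2016, University of Maryland School of Medicine ")

try:
    cell.strip()          # the cell object itself is not a string
    failed = False
except AttributeError:
    failed = True         # same kind of error as reported in the question

text = cell.value.strip()  # string methods work on the .value attribute
print(failed)
print(text)
```

The string methods only become available once the text is read out of value.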

To modify these in place:

for cell in med_school:
    cell.value = cell.value[15:]

This removes the first 15 characters ("Class of 2014, ") from every entry, whether or not it has the prefix. Alternatively, you could use other approaches, such as splitting the string (on ",") or a regex.
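The splitting approach mentioned above might look like the sketch below. The helper name strip_class_year and the sample strings are assumptions for illustration; the strings stand for text already read from each cell's value. The startswith guard is a design choice so that a school name that itself contains a comma is not cut.

```python
def strip_class_year(text):
    """Drop a leading "Class of YYYY, " by splitting on the first ", "."""
    # Only split when the string really starts with the class prefix
    if text.startswith("Class of ") and ", " in text:
        return text.split(", ", 1)[1]
    return text


samples = [
    "Harvard University",
    "Class of 2016, University of Maryland School of Medicine",
    "Class of 2015, Johns Hopkins University School of Medicine",
]
cleaned = [strip_class_year(s) for s in samples]
print(cleaned)
```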

The point here is that you shouldn't work directly on the items in the med_school list, but on their .value property. Alternatively, extract the values into a separate list and work on that.

For example, to get all of the text values, dropping the prefix:

values = [cell.value[15:] for cell in med_school]

Or use a regex to replace the prefix only in the entries that actually contain it:

import re
values = [re.sub(r"^Class of \d{4}, ", "", cell.value) for cell in med_school]
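The difference between the fixed-width slice and the regex shows up on entries that have no class year. A small sketch, using plain strings in place of the cell values (an assumption so that it runs without a workbook):

```python
import re

values = [
    "Harvard University",
    "Class of 2016, University of Maryland School of Medicine",
]

# Fixed-width slice: also eats the first 15 characters of unprefixed names
sliced = [v[15:] for v in values]

# Anchored regex: only touches strings that start with "Class of YYYY, "
subbed = [re.sub(r"^Class of \d{4}, ", "", v) for v in values]

print(sliced)
print(subbed)
```

The slice damages "Harvard University", while the regex leaves it untouched.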

Use the given separator to cut off the head of each string. Check first that the comma-space is there, since it marks the end of a "Class of ..." prefix.

med_school = ["text:'Class of 2016, University of Maryland School of Medicine'",  
              "text:'Class of 2015, Johns Hopkins University School of Medicine'", 
              "text:'Class of 2014, Raymond and Ruth Perelman School of Medicine at the University of Pennsylvania'",
              "text:'Class of 1989, Rush Medical School / Knox College'",
              "text:'Bernie\'s Back-Alley School of Black-Market Techniques'"
             ]

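school_name = []
for name in med_school:
    # Each entry here is already a plain string, so there is no .value to read
    if ", " in name:
        # Drop everything up to and including the first comma-space
        cut = name.index(", ")
        name = name[cut+2:]
    else:
        # No class year: strip the leading text:' and the trailing quote
        name = name[6:-1]
    school_name.append(name)

print(school_name)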

Output (with extra line feeds to improve readability):

["University of Maryland School of Medicine'",
 "Johns Hopkins University School of Medicine'",
 "Raymond and Ruth Perelman School of Medicine at the University of Pennsylvania'",
 "Rush Medical School / Knox College'",
 "Bernie's Back-Alley School of Black-Market Techniques"]

You could also wrap the loop into a list comprehension:

school_name = [name[name.index(", ")+2:]
                   if ", " in name
                   else name[6:-1]
               for name in med_school]
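Note that the loop and comprehension above leave the trailing quote on entries that contain ", ". A variant sketch that strips the wrapper first is shown below; the helper name clean is hypothetical, and it assumes every entry is a plain string, normally wrapped as text:'...' like the sample list.

```python
def clean(entry):
    """Remove the text:'...' wrapper, then a leading "Class of YYYY, "."""
    # Strip the wrapper only when both ends match the assumed format
    if entry.startswith("text:'") and entry.endswith("'"):
        entry = entry[6:-1]
    head, sep, tail = entry.partition(", ")
    # Cut only when a separator exists and the head is a class-year prefix
    if sep and head.startswith("Class of "):
        return tail
    return entry


med_school = [
    "text:'Class of 2016, University of Maryland School of Medicine'",
    "text:'Class of 1989, Rush Medical School / Knox College'",
    "text:'Bernie's Back-Alley School of Black-Market Techniques'",
]
school_name = [clean(entry) for entry in med_school]
print(school_name)
```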

Inside the loop for row in sheet.col(2), read row.value instead of using row itself. sheet.col(2) returns a list of Cell objects, so .value has to be read from each cell, not from the list. This drops the text: type prefix and gives you the actual value. Do this:

results = []
for row in sheet.col(2):
    results.append(row.value)
    print(row.value)
