Python: Removing part of a string from a list of strings

Question

I used xlrd to extract a column from an Excel sheet and turn it into a list.

import xlrd
# Open the workbook and take its first sheet
sheet = xlrd.open_workbook("HEENT.xlsx").sheet_by_index(0)
med_name = []
for row in sheet.col(2):
    med_name.append(row)
med_school = []
for row in sheet.col(3):
    med_school.append(row)
print(med_school)

Below is a snippet of the list med_school:

[text:'University of San Francisco',
 text:'Harvard University',
 text:'Class of 2016, University of Maryland School of Medicine',
 text:'Class of 2015, Johns Hopkins University School of Medicine',
 text:'Class of 2014, Raymond and Ruth Perelman School of Medicine at the University of Pennsylvania']

I want to remove the "text:'Class of 2014" part from each string in the list. I tried a list comprehension, but I got an AttributeError: 'Cell' object has no attribute 'strip'. Does anyone know of a way to create a list that contains just the medical school names, without the class year and the word "text"?

Answer 1

xlrd does not return strings; it returns instances of a class called Cell. Each Cell has a value property that contains the string you are seeing.

To modify these, simply:

for cell in med_school:
    cell.value = cell.value[15:]  # keep everything after the first 15 characters

This will remove the first 15 characters ("Class of 2014, "). Note that it slices every entry, including those that do not start with that prefix. Alternatively, you could use other approaches, such as string splitting (on ",") or a regex.
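The string-splitting alternative could look roughly like this. It is a minimal sketch that works on plain strings (with xlrd you would pass cell.value instead); the helper name strip_class_prefix is made up for illustration, and it assumes the unwanted prefix always starts with "Class of ".

```python
def strip_class_prefix(text):
    """Drop a leading 'Class of YYYY, ' style prefix, if present."""
    if text.startswith("Class of "):
        # partition splits on the first ", " only, so commas that appear
        # later in the school name are left alone.
        _, sep, rest = text.partition(", ")
        if sep:
            return rest
    # No prefix (or no separator): return the text unchanged.
    return text


names = [
    "Harvard University",
    "Class of 2016, University of Maryland School of Medicine",
]
cleaned = [strip_class_prefix(name) for name in names]
print(cleaned)
```

Entries without the prefix pass through untouched, which a fixed-length slice would not guarantee.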

The point here is that you shouldn't be working directly on the items in the med_school list, but on their .value property. Alternatively, extract the values into a separate list and work on that.

For example, to get all of the text values with the prefix dropped:

values = [cell.value[15:] for cell in med_school]

Or use a regex to strip the prefix only from those entries that actually contain the offending data:

import re
values = [re.sub(r"^Class of \d{4}, ", "", cell.value) for cell in med_school]
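For a version that can be tried without a workbook, here is a small sketch of the regex approach. FakeCell is a made-up stand-in for xlrd's Cell (only its value attribute is imitated), and CLASS_PREFIX is just an illustrative name.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for xlrd's Cell, so the example runs without a file.
FakeCell = namedtuple("FakeCell", ["value"])

med_school = [
    FakeCell("University of San Francisco"),
    FakeCell("Class of 2015, Johns Hopkins University School of Medicine"),
]

# Compile once; ^ anchors the pattern to the start of the string.
CLASS_PREFIX = re.compile(r"^Class of \d{4}, ")

# Entries that do not match the pattern are returned unchanged by sub().
values = [CLASS_PREFIX.sub("", cell.value) for cell in med_school]
print(values)
```

The first entry comes back unchanged here, whereas slicing with [15:] would cut into the school name.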

Answer 2

Use the given separator to cut off the head of each string. Check first that the comma-space separator is there, as it is in the "Class of ..." entries.

med_school = ["text:'Class of 2016, University of Maryland School of Medicine'",  
              "text:'Class of 2015, Johns Hopkins University School of Medicine'", 
              "text:'Class of 2014, Raymond and Ruth Perelman School of Medicine at the University of Pennsylvania'",
              "text:'Class of 1989, Rush Medical School / Knox College'",
              "text:'Bernie\'s Back-Alley School of Black-Market Techniques'"
             ]

school_name = []
for first in med_school:
    name = first  # the items in this list are plain strings, so there is no .value
    if ", " in name:
        cut  = name.index(", ")
        name = name[cut+2:]
    else:
        name = name[6:-1]
    school_name.append(name)

print(school_name)

Output (with extra line feeds to improve readability):

["University of Maryland School of Medicine'",
 "Johns Hopkins University School of Medicine'",
 "Raymond and Ruth Perelman School of Medicine at the University of Pennsylvania'",
 "Rush Medical School / Knox College'",
 "Bernie's Back-Alley School of Black-Market Techniques"]

You could also wrap the loop into a list comprehension:

school_name = [name[name.index(", ")+2:]
                   if ", " in name
                   else name[6:-1]
               for name in med_school]
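Note that the loop above leaves the closing quote on entries that contain ", ", as the sample output shows. If that is not wanted, one possible variant strips the text:' wrapper first and the class prefix second. This is only a sketch: school_from_repr is a made-up helper name, and it assumes every unwanted prefix starts with "Class of ".

```python
def school_from_repr(raw):
    """Return the school name from a string like "text:'Class of 2016, X'"."""
    name = raw
    # Drop the 6-character text:' wrapper and the closing quote, if both are present.
    if name.startswith("text:'") and name.endswith("'"):
        name = name[6:-1]
    # Drop everything up to and including the first ", " of a class prefix.
    if name.startswith("Class of ") and ", " in name:
        name = name[name.index(", ") + 2:]
    return name


samples = [
    "text:'Class of 1989, Rush Medical School / Knox College'",
    "text:'Bernie\'s Back-Alley School of Black-Market Techniques'",
]
school_name = [school_from_repr(s) for s in samples]
print(school_name)
```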

Answer 3

Change the loop over sheet.col(2) so that it reads row.value from each cell (sheet.col(2) itself is a list of Cell objects). You will get rid of the type prefix (text:) and get the actual value. Do this:

results = []
for row in sheet.col(2):
    results.append(row.value)
    print(row.value)
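The same idea can be wrapped in a small helper. This is a sketch under assumptions: column_values is a made-up name, and FakeCell and FakeSheet are hypothetical stand-ins that only imitate the col() method and value attribute used in this thread, so the example runs without a workbook.

```python
def column_values(sheet, colx):
    """Return the .value of every cell in column colx of an xlrd-style sheet."""
    return [cell.value for cell in sheet.col(colx)]


class FakeCell:
    """Stand-in for a cell: only the value attribute is imitated."""
    def __init__(self, value):
        self.value = value


class FakeSheet:
    """Stand-in for a sheet: col(colx) returns a list of cells."""
    def __init__(self, columns):
        self._columns = columns

    def col(self, colx):
        return [FakeCell(v) for v in self._columns[colx]]


# Made-up sample data; with xlrd you would pass the real sheet object.
sheet = FakeSheet({2: ["first name", "second name"]})
results = column_values(sheet, 2)
print(results)
```

With a real workbook, the sheet returned by sheet_by_index(0) would take the place of FakeSheet.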

3 answers

Answer 1
4 2016-10-10 22:02:40

Answer 2
1 2016-10-10 22:00:46

Answer 3
1 2018-01-10 13:44:30
