
Issue with *.ics splitting strings with more than one line (Python)

I have tried every method I could find and always get the same result, but there must be a fix for this?

I am downloading an ICS file from a website, where one of the lines, SUMMARY, is split in two. When I load this into a string, these two lines get automatically joined into one string, unless there are "\n".

So I have tried to replace both "\n" and "\r", but it makes no difference to my issue.

Code

from icalendar import Calendar, Event
from datetime import datetime
import icalendar
import urllib.request
import re
from clear import clear_screen

cal = Calendar()

def download_ics():
    url = "https://www.pogdesign.co.uk/cat/download_ics/7d903a054695a48977d46683f29384de"
    file_name = "pogdesign.ics"
    urllib.request.urlretrieve(url, file_name)

def get_start_time(time):
    time = datetime.strftime(time, "%A - %H:%M")
    return time

def get_time(time):
    time = datetime.strftime(time, "%H:%M")
    return time

def check_Summary(text):
    #newline = re.sub('[\r\n]', '', text)
    newline = text.translate(str.maketrans("", "", "\r\n"))
    return newline

def main():
    download_ics()
    clear_screen()
    e = open('pogdesign.ics', 'rb')
    ecal = icalendar.Calendar.from_ical(e.read())
    for component in ecal.walk():
        if component.name == "VEVENT":
            summary = check_Summary(component.get("SUMMARY"))
            print(summary)
            print("\t Start : " + get_start_time(component.decoded("DTSTART")) + " - " + get_time(component.decoded("DTEND")))

            print()
    e.close()

if __name__ == "__main__":
    main()

Output

Young Sheldon S06E11 - Ruthless, Toothless, and a Week ofBed Rest Start: Friday - 02:00 - 02:30

The Good Doctor S06E11 - The Good Boy Start: Tuesday - 04:00 - 05:00

National Treasure: Edge of History S01E08 - Family Tree Start: Thursday - 05:59 - 06:59

National Treasure: Edge of History S01E09 - A Meeting withSalazar Start: Thursday - 05:59 - 06:59

The Last of Us S01E03 - Long Long Time Start: Monday - 03:00 - 04:00

The Last of Us S01E04 - Please Hold My Hand Start: Monday - 03:00 - 04:00

Anne Rice's Mayfair Witches S01E04 - Curiouser and Curiouser Start: Monday - 03:00 - 04:00

Anne Rice's Mayfair Witches S01E05 - The Thrall Start: Monday - 03:00 - 04:00

The Ark S01E01 - Everyone Wanted to Be on This Ship Start: Thursday - 04:00 - 05:00

I have looked through all kinds of solutions, like converting the text to "utf-8" and "ISO-8859-8". I have tried some functions I found in icalendar. I have even asked ChatGPT for help.

As you can see in the output, two of the summaries have words run together: "Young Sheldon S06E11 - Ruthless, Toothless, and a Week ofBed Rest" and "National Treasure: Edge of History S01E09 - A Meeting withSalazar".

In the downloaded ICS file, each of these summaries is on two separate lines, and I cannot manage to keep them split, or to stop them being joined at all...

As far as the icalendar.Calendar class is concerned, that ical is incorrectly formatted.

icalendar.Calendar.from_ical() calls icalendar.parser.Contentlines.from_ical(), which is

    def from_ical(cls, ical, strict=False):
        """Unfold the content lines in an iCalendar into long content lines.
        """
        ical = to_unicode(ical)
        # a fold is carriage return followed by either a space or a tab
        return cls(uFOLD.sub('', ical), strict=strict)

where uFOLD is re.compile('(\r?\n)+[ \t]').

That means it removes each run of line breaks that is followed by a single space or tab character, together with that character; it does not replace the fold with a space. The ical file you're retrieving has, e.g.,

SUMMARY:Young Sheldon S06E11 - \\nRuthless\\, Toothless\\, and a Week of\r\n Bed Rest\r\n

so when of\r\n Bed is matched it becomes ofBed.
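The effect can be reproduced with the standard library alone. This is a minimal sketch that assumes the pattern quoted above; the names `FOLD`, `folded` and `unfolded` and the simplified sample string are illustrative and not taken from the icalendar source.

```python
import re

# Same pattern as the uFOLD quoted above: one or more line breaks
# followed by a single space or tab.
FOLD = re.compile(r'(\r?\n)+[ \t]')

# Simplified version of the folded SUMMARY line from the downloaded file.
folded = (
    "SUMMARY:Young Sheldon S06E11 - Ruthless, Toothless, and a Week of\r\n"
    " Bed Rest\r\n"
)

# Unfolding deletes the line break and the one whitespace character after it.
unfolded = FOLD.sub('', folded)
print(repr(unfolded))
```

The final \r\n is not followed by a space or tab, so it is left in place; only the fold in the middle of the value is removed.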

This line-folding format is defined in RFC 2445 (now superseded by RFC 5545, with the same folding rule), which gives the example

For example the line:

 DESCRIPTION:This is a long description that exists on a long line.

Can be represented as:

 DESCRIPTION:This is a lo
  ng description
   that exists on a long line.

which makes clear that the implementation in from_ical() is correct.

If you're quite sure that the source ical will always fold lines between words, you could adjust for that by adding a space after each line fold, like:
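As a quick check, the RFC example can be written out with explicit line breaks (in the RFC the folded form spans three physical lines) and run through the same pattern. This is only a sketch: `FOLD`, `original` and `folded` are names chosen here for illustration.

```python
import re

# Pattern equivalent to the uFOLD quoted earlier.
FOLD = re.compile(r'(\r?\n)+[ \t]')

original = "DESCRIPTION:This is a long description that exists on a long line."

# The RFC's folded form: each continuation line starts with ONE extra
# whitespace character. Any further space belongs to the value itself.
folded = (
    "DESCRIPTION:This is a lo\r\n"
    " ng description\r\n"
    "  that exists on a long line."
)

restored = FOLD.sub('', folded)
print(restored == original)
```

Note that the first fold sits in the middle of the word "long", and the third line starts with two spaces: one marks the fold and the other is the real space before "that". A producer that folds between words must therefore keep the word-separating space in addition to the fold character.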

    ecal = icalendar.Calendar.from_ical(e.read().replace(b'\r\n ', b'\r\n  '))
